Fresh arXiv work on separating LLM streams and recasting transformers as GEMM programs points to incremental efficiency gains that engineers