- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 16
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 72
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 22
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2
Aneta Melisa Stal (melisa)
AI & ML interests: NLP