2409.01482
Masked Mixers for Language Generation and Retrieval
2 September 2024
Benjamin L. Badger
Github (5★)
Papers citing "Masked Mixers for Language Generation and Retrieval"
28 / 28 papers shown
Know Your Limits: Entropy Estimation Modeling for Compression and Generalization
Benjamin L. Badger
Matthew Neligeorge
170
1
0
13 Nov 2025
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Loubna Ben Allal
Anton Lozhkov
Elie Bakouch
Gabriel Martín Blázquez
Guilherme Penedo
...
Cyril Zakka
Mathieu Morlon
Colin Raffel
Leandro von Werra
Thomas Wolf
MoE
613
208
0
04 Feb 2025
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Guilherme Penedo
Hynek Kydlíček
Loubna Ben Allal
Anton Lozhkov
Margaret Mitchell
Colin Raffel
Leandro von Werra
Thomas Wolf
513
744
0
25 Jun 2024
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle
Badr Youbi Idrissi
Baptiste Rozière
David Lopez-Paz
Gabriele Synnaeve
349
262
0
30 Apr 2024
Improving Text Embeddings with Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Liang Wang
Nan Yang
Xiaolong Huang
Linjun Yang
Rangan Majumder
Furu Wei
SyDa
607
337
0
31 Dec 2023
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
809
6,333
0
01 Dec 2023
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
Neural Information Processing Systems (NeurIPS), 2023
Daniel Y. Fu
Simran Arora
Jessica Grogan
Isys Johnson
Sabri Eyuboglu
Armin W. Thomas
Benjamin Spector
Michael Poli
Atri Rudra
Christopher Ré
MoE
202
70
0
18 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
12.2K
16,310
0
18 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
International Conference on Learning Representations (ICLR), 2023
Tri Dao
LRM
620
2,426
0
17 Jul 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan
Yuanzhi Li
SyDa
LRM
463
435
0
12 May 2023
Hyena Hierarchy: Towards Larger Convolutional Language Models
International Conference on Machine Learning (ICML), 2023
Michael Poli
Stefano Massaroli
Eric Q. Nguyen
Daniel Y. Fu
Tri Dao
S. Baccus
Yoshua Bengio
Stefano Ermon
Christopher Ré
VLM
652
466
0
21 Feb 2023
Why Deep Learning Generalizes
Benjamin L. Badger
TDI
AI4CE
168
4
0
17 Nov 2022
Depth and Representation in Vision Models
Benjamin L. Badger
SSL
VLM
FAtt
175
3
0
11 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
1.0K
2,869
0
09 Nov 2022
Small Language Models for Tabular Data
Benjamin L. Badger
LMTD
231
2
0
05 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
731
789
0
24 Sep 2022
pNLP-Mixer: an Efficient all-MLP Architecture for Language
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Francesco Fusco
Damian Pascual
Peter W. J. Staar
Diego Antognini
243
35
0
09 Feb 2022
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
543
440
0
06 Oct 2021
Invertible Attention
Jiajun Zha
Yiran Zhong
Jing Zhang
Leonid Sigal
Liang Zheng
219
7
0
16 Jun 2021
Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet
Luke Melas-Kyriazi
ViT
195
116
0
06 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
Neural Information Processing Systems (NeurIPS), 2021
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
1.4K
3,468
0
04 May 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
1.2K
4,768
0
20 Apr 2021
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Neural Information Processing Systems (NeurIPS), 2019
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
1.1K
50,986
0
03 Dec 2019
A mathematical theory of semantic development in deep neural networks
Andrew M. Saxe
James L. McClelland
Surya Ganguli
234
320
0
23 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
3.1K
112,756
0
11 Oct 2018
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
8.3K
171,167
0
12 Jun 2017
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Payal Bajaj
Daniel Fernando Campos
Nick Craswell
Li Deng
Jianfeng Gao
...
Mir Rosenberg
Xia Song
Alina Stoica
Saurabh Tiwary
Tong Wang
RALM
971
3,299
0
28 Nov 2016
Understanding Deep Image Representations by Inverting Them
Computer Vision and Pattern Recognition (CVPR), 2014
Aravindh Mahendran
Andrea Vedaldi
FAtt
691
2,060
0
26 Nov 2014