ResearchTrend.AI
© 2025 ResearchTrend.AI. All rights reserved.

Monarch: Expressive Structured Matrices for Efficient and Accurate Training (arXiv:2204.00595)
1 April 2022
Tri Dao
Beidi Chen
N. Sohoni
Arjun D Desai
Michael Poli
Jessica Grogan
Alexander Liu
Aniruddh Rao
Atri Rudra
Christopher Ré

Papers citing "Monarch: Expressive Structured Matrices for Efficient and Accurate Training"

50 / 66 papers shown
Block Circulant Adapter for Large Language Models
Xinyu Ding, Meiqi Wang, Siyu Liao, Zhongfeng Wang
01 May 2025

MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han, Yuan Tang, Jinfeng Xu, Xianzhi Li
24 Mar 2025

Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected
Yingtao Zhang, Jialin Zhao, Wenjing Wu, Ziheng Liao, Umberto Michieli, C. Cannistraci
31 Jan 2025

BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Changwoo Lee, Soo Min Kwon, Qing Qu, Hun-Seok Kim
28 Oct 2024

Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach
Tags: MoE
24 Oct 2024
Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models
Mingxue Xu, Sadia Sharmin, Danilo P. Mandic
03 Oct 2024

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson
03 Oct 2024

Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement
Gaurav Patel, Christopher Sandino, Behrooz Mahasseni, Ellen L. Zippi, Erdrin Azemi, Ali Moin, Juri Minxha
Tags: TTA, AI4TS
03 Oct 2024

Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
Vladimír Boža, Vladimír Macko
27 Sep 2024

Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks
Ashwin Samudre, Mircea Petrache, Brian D. Nord, Shubhendu Trivedi
18 Sep 2024
MoRe Fine-Tuning with 10x Fewer Parameters
Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala
30 Aug 2024

Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Pihe Hu, Shaolong Li, Longbo Huang
21 Aug 2024

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson
25 Jul 2024

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Sukjun Hwang, Aakash Lahoti, Tri Dao, Albert Gu
Tags: Mamba
13 Jul 2024

Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
Xiuying Wei, Skander Moalla, Razvan Pascanu, Çağlar Gülçehre
24 Jun 2024
An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers
Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar
17 Jun 2024

Group and Shuffle: Efficient Structured Orthogonal Parametrization
Mikhail Gorbunov, Nikolay Yudin, Vera Soboleva, Aibek Alanov, Alexey Naumov, Maxim Rakhuba
14 Jun 2024

Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson
10 Jun 2024

Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, A. Menon, Sanjiv Kumar
Tags: UQLM
15 Apr 2024

Adaptive Patching for High-resolution Image Segmentation with Transformers
Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, M. Wahib, M. Munetomo
Tags: MedIm
15 Apr 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz, Michele Santacatterina, Ramin Zabih
29 Mar 2024

Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model
Haoyun Xu, Runzhe Zhan, Derek F. Wong, Lidia S. Chao
18 Mar 2024

Spiking Wavelet Transformer
Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu
17 Mar 2024

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie
Tags: MedIm
01 Mar 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi
Tags: VLM
28 Feb 2024

On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference
Siyu Ren, Kenny Q. Zhu
09 Feb 2024
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim
11 Dec 2023

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
01 Dec 2023

SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar, Yunsheng Li, Nuno Vasconcelos
01 Dec 2023

Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks
Suman Sapkota, Binod Bhattarai
30 Nov 2023

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, ..., Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf
10 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu, Hermann Kumbong, Eric N. D. Nguyen, Christopher Ré
Tags: VLM
10 Nov 2023

Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices
Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen P. Boyd
30 Oct 2023

Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks
Changwoo Lee, Hun-Seok Kim
29 Oct 2023

Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs
Jean Kossaifi, Nikola B. Kovachki, Kamyar Azizzadenesheli, Anima Anandkumar
Tags: AI4CE
29 Sep 2023

InRank: Incremental Low-Rank Learning
Jiawei Zhao, Yifei Zhang, Beidi Chen, F. Schafer, Anima Anandkumar
20 Jun 2023

Does a sparse ReLU network training problem always admit an optimum?
Quoc-Tung Le, E. Riccietti, Rémi Gribonval
05 Jun 2023
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu, Guanchu Wang, Shaochen Zhong, Zhaozhuo Xu, Daochen Zha, ..., Zhimeng Jiang, Kaixiong Zhou, V. Chaudhary, Shuai Xu, Xia Hu
24 May 2023

Cuttlefish: Low-Rank Model Training without All the Tuning
Hongyi Wang, Saurabh Agarwal, Pongsakorn U-chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos
Tags: OffRL
04 May 2023

Sparsity in neural networks can improve their privacy
Antoine Gonon, Léon Zheng, Clément Lalanne, Quoc-Tung Le, Guillaume Lauga, Can Pouliquen
20 Apr 2023

STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler
15 Apr 2023

Can sparsity improve the privacy of neural networks?
Antoine Gonon, Léon Zheng, Clément Lalanne, Quoc-Tung Le, Guillaume Lauga, Can Pouliquen
11 Apr 2023
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
Cong Wei, Brendan Duke, R. Jiang, P. Aarabi, Graham W. Taylor, Florian Shkurti
Tags: ViT
24 Mar 2023

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie
21 Mar 2023

SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, D. DeCoste, Sean Lie, Shreyas Saxena
Tags: MoE, AI4CE
18 Mar 2023

Learning to Grow Pretrained Models for Efficient Transformer Training
Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, P. Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim
02 Mar 2023
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
Tags: VLM
21 Feb 2023

Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu, Elliot L. Epstein, Eric N. D. Nguyen, A. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
13 Feb 2023

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers
Shiwei Liu, Zhangyang Wang
06 Feb 2023

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré
28 Dec 2022