SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
27 February 2024
T. Yasuda, Kyriakos Axiotis, Gang Fu, M. Bateni, Vahab Mirrokni
Papers citing "SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization" (36 papers)
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh
03 Aug 2023

PDP: Parameter-free Differentiable Pruning is All You Need
Neural Information Processing Systems (NeurIPS), 2023
Minsik Cho, Saurabh N. Adya, Devang Naik
18 May 2023
Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
International Conference on Machine Learning (ICML), 2023
Riade Benbaki, Wenyu Chen, X. Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder
28 Feb 2023

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
Elias Frantar, Dan Alistarh
02 Jan 2023

Are Straight-Through Gradients and Soft-Thresholding All You Need for Sparse Training?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
A. Vanderschueren, Christophe De Vleeschouwer
02 Dec 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Conference on Machine Learning and Systems (MLSys), 2022
Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia
29 Nov 2022

CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Neural Information Processing Systems (NeurIPS), 2022
Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh
14 Oct 2022

Sequential Attention for Feature Selection
International Conference on Learning Representations (ICLR), 2022
T. Yasuda, M. Bateni, Lin Chen, Matthew Fahrbach, Gang Fu, Vahab Mirrokni
29 Sep 2022
Hardness and Algorithms for Robust and Sparse Optimization
International Conference on Machine Learning (ICML), 2022
Eric Price, Sandeep Silwal, Samson Zhou
29 Jun 2022

Sparse-Group Log-Sum Penalized Graphical Model Learning for Time Series
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Jitendra Tugnait
29 Apr 2022

Iterative Hard Thresholding with Adaptive Regularization: Sparser Solutions Without Sacrificing Runtime
International Conference on Machine Learning (ICML), 2022
Kyriakos Axiotis, M. Sviridenko
11 Apr 2022

Data-Efficient Structured Pruning via Submodular Optimization
Neural Information Processing Systems (NeurIPS), 2022
Marwa El Halabi, Suraj Srinivas, Damien Scieur
09 Mar 2022
OptG: Optimizing Gradient-driven Criteria in Network Sparsity
Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Jiayi Ji, Rongrong Ji
30 Jan 2022

Powerpropagation: A Sparsity Inducing Weight Reparameterisation
Jonathan Richard Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, P. Latham, Yee Whye Teh
01 Oct 2021

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration
International Symposium on High-Performance Computer Architecture (HPCA), 2021
Zhi-Gang Liu, P. Whatmough, Yuhao Zhu, Matthew Mattina
16 Jul 2021

AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks
Alexandra Peste, Eugenia Iofinova, Adrian Vladu, Dan Alistarh
23 Jun 2021
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
Neural Information Processing Systems (NeurIPS), 2021
Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zinan Lin, Decebal Constantin Mocanu
19 Jun 2021

The Fine-Grained Hardness of Sparse Linear Regression
A. Gupte, Vinod Vaikuntanathan
06 Jun 2021

Operation-Aware Soft Channel Pruning using Differentiable Masks
International Conference on Machine Learning (ICML), 2020
Minsoo Kang, Bohyung Han
08 Jul 2020

Sparse Convex Optimization via Adaptively Regularized Hard Thresholding
Kyriakos Axiotis, M. Sviridenko
25 Jun 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh, Thomas Wolf, Alexander M. Rush
15 May 2020

Winning the Lottery with Continuous Sparsification
Neural Information Processing Systems (NeurIPS), 2019
Pedro H. P. Savarese, Hugo Silva, Michael Maire
10 Dec 2019

Rigging the Lottery: Making All Tickets Winners
International Conference on Machine Learning (ICML), 2019
Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen
25 Nov 2019

Implicit Regularization for Optimal Sparse Recovery
Neural Information Processing Systems (NeurIPS), 2019
Tomas Vaskevicius, Varun Kanade, Patrick Rebeschini
11 Sep 2019

Differentiable Mask for Pruning Convolutional and Recurrent Networks
Canadian Conference on Computer and Robot Vision (CRV), 2019
R. Ramakrishnan, Eyyub Sari, V. Nia
10 Sep 2019
Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, ..., Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, M. Smelyanskiy
31 May 2019

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
23 May 2019

DARTS: Differentiable Architecture Search
Hanxiao Liu, Karen Simonyan, Yiming Yang
24 Jun 2018

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carbin
09 Mar 2018
Attribution Modeling Increases Efficiency of Bidding in Display Advertising
Eustache Diemert, Julien Meynet, Pierre Galland, Damien Lefortier
20 Jul 2017

Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017

Restricted Strong Convexity Implies Weak Submodularity
Ethan R. Elenberg, Rajiv Khanna, A. Dimakis, S. Negahban
02 Dec 2016

Lasso, Fractional Norm and Structured Sparse Estimation Using a Hadamard Product Parametrization
P. Hoff
31 Oct 2016

Learning Structured Sparsity in Deep Neural Networks
W. Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
12 Aug 2016

Structured Pruning of Deep Convolutional Neural Networks
S. Anwar, Kyuyeon Hwang, Wonyong Sung
29 Dec 2015

Learning both Weights and Connections for Efficient Neural Networks
Neural Information Processing Systems (NeurIPS), 2015
Song Han, Jeff Pool, J. Tran, W. Dally
08 Jun 2015