Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.11525
Cited By
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
21 March 2023
Vithursan Thangarasa
Shreyas Saxena
Abhay Gupta
Sean Lie
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency"
15 / 15 papers shown
Title
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
33
0
0
13 Oct 2024
A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators
M. Emani
Sam Foreman
Varuni K. Sastry
Zhen Xie
Siddhisanket Raskar
William Arnold
R. Thakur
V. Vishwanath
M. Papka
ELM
11
9
0
06 Oct 2023
Dynamic Sparse Training with Structured Sparsity
Mike Lasby
A. Golubeva
Utku Evci
Mihai Nica
Yani Andrew Ioannou
19
7
0
03 May 2023
Sparsity Winning Twice: Better Robust Generalization from More Efficient Training
Tianlong Chen
Zhenyu (Allen) Zhang
Pengju Wang
Santosh Balachandra
Haoyu Ma
Zehao Wang
Zhangyang Wang
OOD
AAML
74
46
0
20 Feb 2022
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
184
1,148
0
05 Oct 2021
Initialization and Regularization of Factorized Neural Layers
M. Khodak
Neil A. Tenenholtz
Lester W. Mackey
Nicolò Fusi
63
56
0
03 May 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Itay Hubara
Brian Chmiel
Moshe Island
Ron Banner
S. Naor
Daniel Soudry
41
89
0
16 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
262
955
0
27 Jan 2021
RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding
X. Zhang
Ningning Ma
Jungong Han
Guiguang Ding
Jian-jun Sun
114
1,484
0
11 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen
Jonathan Frankle
Shiyu Chang
Sijia Liu
Yang Zhang
Zhangyang Wang
Michael Carbin
145
345
0
23 Jul 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang
Ke Sun
Tianheng Cheng
Borui Jiang
Chaorui Deng
...
Yadong Mu
Mingkui Tan
Xinggang Wang
Wenyu Liu
Bin Xiao
182
3,480
0
20 Aug 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi-Li Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
204
1,275
0
04 Dec 2018
1