arXiv:1806.06950
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
18 June 2018
Patrick H. Chen
Si Si
Yang Li
Ciprian Chelba
Cho-Jui Hsieh
Papers citing "GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking"
41 / 41 papers shown
RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices
Wonkyo Choe
Yangfeng Ji
F. Lin
77
0
0
14 Dec 2024
MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers
Zebin Yang
Renze Chen
Taiqiang Wu
Ngai Wong
Yun Liang
Runsheng Wang
R. Huang
Meng Li
MQ
31
1
0
23 Oct 2024
Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models
Mingxue Xu
Sadia Sharmin
Danilo Mandic
35
2
0
03 Oct 2024
TropNNC: Structured Neural Network Compression Using Tropical Geometry
Konstantinos Fotopoulos
Petros Maragos
Panagiotis Misiakos
38
0
0
05 Sep 2024
Reweighted Solutions for Weighted Low Rank Approximation
David P. Woodruff
T. Yasuda
42
1
0
04 Jun 2024
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
Yang Li
Changsheng Zhao
Hyungtak Lee
Ernie Chang
Yangyang Shi
Vikas Chandra
40
0
0
24 May 2024
Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models
Chakshu Moar
Michael Pellauer
Hyoukjun Kwon
38
1
0
10 May 2024
DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization
Rahul Chand
Yashoteja Prabhu
Pratyush Kumar
20
3
0
20 Dec 2023
Experimental Analysis of Large-scale Learnable Vector Storage Compression
Hailin Zhang
Penghao Zhao
Xupeng Miao
Yingxia Shao
Zirui Liu
Tong Yang
Bin Cui
34
11
0
27 Nov 2023
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo
Guangzhi Wang
Mohan S. Kankanhalli
21
3
0
16 Oct 2023
TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition
Mingxue Xu
Y. Xu
Danilo Mandic
10
2
0
02 Jul 2023
Low-Rank Prune-And-Factorize for Language Model Compression
Siyu Ren
Kenny Q. Zhu
14
9
0
25 Jun 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
Vanessa Mehlin
Sigurd Schacht
Carsten Lanquillon
HAI
MedIm
33
19
0
05 Feb 2023
HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression
Jiaqi Gu
Ben Keller
Jean Kossaifi
Anima Anandkumar
Brucek Khailany
David Z. Pan
ViT
35
8
0
30 Nov 2022
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Ting Hua
Yen-Chang Hsu
Felicity Wang
Qiang Lou
Yilin Shen
Hongxia Jin
27
13
0
02 Nov 2022
MorphTE: Injecting Morphology in Tensorized Embeddings
Guobing Gan
Peng Zhang
Sunzhu Li
Xiuqing Lu
Benyou Wang
36
5
0
27 Oct 2022
Language model compression with weighted low-rank factorization
Yen-Chang Hsu
Ting Hua
Sung-En Chang
Qiang Lou
Yilin Shen
Hongxia Jin
16
93
0
30 Jun 2022
Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding
Pu Wang
Hugo Van hamme
VLM
29
4
0
28 Jun 2022
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park
Baeseong Park
Minsub Kim
Sungjae Lee
Jeonghoon Kim
Beomseok Kwon
S. Kwon
Byeongwook Kim
Youngjoo Lee
Dongsoo Lee
MQ
21
74
0
20 Jun 2022
Rank Diminishing in Deep Neural Networks
Ruili Feng
Kecheng Zheng
Yukun Huang
Deli Zhao
Michael I. Jordan
Zhengjun Zha
34
28
0
13 Jun 2022
Efficient Mixed Dimension Embeddings for Matrix Factorization
D. Beloborodov
Andrei Zimovnov
Petr Molodyk
Dmitrii Kirillov
20
2
0
18 May 2022
A Survey on Green Deep Learning
Jingjing Xu
Wangchunshu Zhou
Zhiyi Fu
Hao Zhou
Lei Li
VLM
81
83
0
08 Nov 2021
Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition
Lucas Liebenwein
Alaa Maalouf
O. Gal
Dan Feldman
Daniela Rus
40
46
0
23 Jul 2021
From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables
Krtin Kumar
Peyman Passban
Mehdi Rezagholizadeh
Yiu Sing Lau
Qun Liu
11
2
0
18 Apr 2021
Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Insoo Chung
Byeongwook Kim
Yoonjung Choi
S. Kwon
Yongkweon Jeon
Baeseong Park
Sangha Kim
Dongsoo Lee
MQ
29
27
0
16 Sep 2020
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
Benlin Liu
Yongming Rao
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
6
37
0
27 Aug 2020
DeLighT: Deep and Light-weight Transformer
Sachin Mehta
Marjan Ghazvininejad
Srini Iyer
Luke Zettlemoyer
Hannaneh Hajishirzi
VLM
33
32
0
03 Aug 2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao
H. Trivedi
A. Balasubramanian
Niranjan Balasubramanian
32
66
0
02 May 2020
A Generic Network Compression Framework for Sequential Recommender Systems
Yang Sun
Fajie Yuan
Ming Yang
Guoao Wei
Zhou Zhao
Duo Liu
26
54
0
21 Apr 2020
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
Yihuan Mao
Yujing Wang
Chufan Wu
Chen Zhang
Yang-Feng Wang
Yaming Yang
Quanlu Zhang
Yunhai Tong
Jing Bai
22
72
0
08 Apr 2020
One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
Matthew Shunshi Zhang
Bradly C. Stadie
8
32
0
30 Nov 2019
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
AI4TS
38
23
0
27 Nov 2019
Fully Quantized Transformer for Machine Translation
Gabriele Prato
Ella Charlaix
Mehdi Rezagholizadeh
MQ
13
68
0
17 Oct 2019
Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition
Vasileios Lioutas
Ahmad Rashid
Krtin Kumar
Md. Akmal Haidar
Mehdi Rezagholizadeh
29
9
0
02 Oct 2019
A Tensorized Transformer for Language Modeling
Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
D. Song
M. Zhou
18
163
0
24 Jun 2019
Learning Low-Rank Approximation for CNNs
Dongsoo Lee
S. Kwon
Byeongwook Kim
Gu-Yeon Wei
32
19
0
24 May 2019
Network Pruning for Low-Rank Binary Indexing
Dongsoo Lee
S. Kwon
Byeongwook Kim
Parichay Kapoor
Gu-Yeon Wei
22
6
0
14 May 2019
Tensorized Embedding Layers for Efficient Model Compression
Oleksii Hrinchuk
Valentin Khrulkov
L. Mirvakhabova
Elena Orlova
Ivan Oseledets
32
73
0
30 Jan 2019
WEST: Word Encoded Sequence Transducers
Ehsan Variani
A. Suresh
M. Weintraub
17
9
0
20 Nov 2018
Universal Deep Neural Network Compression
Yoojin Choi
Mostafa El-Khamy
Jungwon Lee
MQ
86
85
0
07 Feb 2018
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
273
1,896
0
10 Jan 2017