arXiv: 2403.07221
LookupFFN: Making Transformers Compute-lite for CPU inference
12 March 2024
Zhanpeng Zeng
Michael Davies
Pranav Pulijala
Karthikeyan Sankaralingam
Vikas Singh
Papers citing "LookupFFN: Making Transformers Compute-lite for CPU inference"
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
Yaxiong Wu
Sheng Liang
Chen Zhang
Y. Wang
Yuhang Zhang
Huifeng Guo
Ruiming Tang
Y. Liu
KELM
42
1
0
22 Apr 2025
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
Ning Ding
Yehui Tang
Haochen Qin
Zhenli Zhou
Chao Xu
Lin Li
Kai Han
Heng Liao
Yunhe Wang
62
0
0
20 Nov 2024
MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
Jitai Hao
Weiwei Sun
Xin Xin
Qi Meng
Zhumin Chen
Pengjie Ren
Zhaochun Ren
MoE
42
2
0
07 Jun 2024
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu
Zhanpeng Zeng
Li Zhang
Vikas Singh
MQ
37
5
0
10 Mar 2024
HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Yashas Samaga
Varun Yerram
Chong You
Srinadh Bhojanapalli
Sanjiv Kumar
Prateek Jain
Praneeth Netrapalli
51
4
0
14 Feb 2024
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
280
2,015
0
28 Jul 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,567
0
17 Apr 2017