SOFT: Softmax-free Transformer with Linear Complexity (arXiv:2110.11945)

22 October 2021
Jiachen Lu, Jinghan Yao, Junge Zhang, Martin Danelljan, Hang Xu, Weiguo Gao, Chunjing Xu, Thomas B. Schön, Li Zhang

Papers citing "SOFT: Softmax-free Transformer with Linear Complexity"

21 citing papers shown.

• Encryption-Friendly LLM Architecture
  Donghwan Rho, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Jung Hee Cheon, Ernest K. Ryu (24 Feb 2025)

• Parallel Sequence Modeling via Generalized Spatial Propagation Network
  Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu (21 Jan 2025)

• Streamlining Prediction in Bayesian Deep Learning [UQCV, BDL]
  Rui Li, Marcus Klasson, Arno Solin, Martin Trapp (27 Nov 2024)

• Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
  Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, A. Shafi, D. Panda (30 Aug 2024)

• Hierarchical Separable Video Transformer for Snapshot Compressive Imaging [ViT]
  Ping Wang, Yulun Zhang, Lishun Wang, Xin Yuan (16 Jul 2024)

• MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
  Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos (11 Jun 2024)

• DiJiang: Efficient Large Language Models through Compact Kernelization [VLM]
  Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang (29 Mar 2024)

• CascadedGaze: Efficiency in Global Context Extraction for Image Restoration
  Amirhosein Ghasemabadi, Muhammad Kamran Janjua, Mohammad Salameh, Chunhua Zhou, Fengyu Sun, Di Niu (26 Jan 2024)

• AMD-HookNet for Glacier Front Segmentation
  Fei Wu, Nora Gourmelon, T. Seehaus, Jianlin Zhang, M. Braun, Andreas K. Maier, Vincent Christlein (06 Feb 2023)

• Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection [ViT]
  Shan Zhang, Naila Murray, Lei Wang, Piotr Koniusz (30 Oct 2022)

• CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling [3DV]
  Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong (14 Oct 2022)

• Dynamic Graph Message Passing Networks for Visual Recognition [GNN]
  Li Zhang, Mohan Chen, Anurag Arnab, Xiangyang Xue, Philip H. S. Torr (20 Sep 2022)

• Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence [3DV]
  Sunghwan Hong, Seokju Cho, Seung Wook Kim, Stephen Lin (19 Sep 2022)

• SimA: Simple Softmax-free Attention for Vision Transformers
  Soroush Abbasi Koohpayegani, Hamed Pirsiavash (17 Jun 2022)

• EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers [ViT]
  Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, L. Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martínez (06 May 2022)

• Attention Mechanism in Neural Networks: Where it Comes and Where it Goes [3DV]
  Derya Soydaner (27 Apr 2022)

• Dynamic N:M Fine-grained Structured Sparse Attention Mechanism
  Zhaodong Chen, Yuying Quan, Zheng Qu, L. Liu, Yufei Ding, Yuan Xie (28 Feb 2022)

• Is Attention Better Than Matrix Decomposition?
  Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin (09 Sep 2021)

• Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions [ViT]
  Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao (24 Feb 2021)

• LambdaNetworks: Modeling Long-Range Interactions Without Attention
  Irwan Bello (17 Feb 2021)

• Bottleneck Transformers for Visual Recognition [SLR]
  A. Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani (27 Jan 2021)