Sinkformers: Transformers with Doubly Stochastic Attention


22 October 2021
Michael E. Sander
Pierre Ablin
Mathieu Blondel
Gabriel Peyré

Papers citing "Sinkformers: Transformers with Doubly Stochastic Attention"

50 / 60 papers shown
Langevin Diffusion Approximation to Same Marginal Schrödinger Bridge
Medha Agarwal
Zaïd Harchaoui
Garrett Mulcahy
Soumik Pal
OT
26
0
0
12 May 2025
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures
Heng-Sheng Chang
P. Mehta
34
0
0
01 May 2025
Quantum Doubly Stochastic Transformers
Jannis Born
Filip Skogh
Kahn Rhrissorrakrai
Filippo Utro
Nico Wagner
Aleksandros Sobczyk
27
0
0
22 Apr 2025
Quantitative Clustering in Mean-Field Transformer Models
Shi Chen
Zhengjiang Lin
Yury Polyanskiy
Philippe Rigollet
31
0
0
20 Apr 2025
Hessian stability and convergence rates for entropic and Sinkhorn potentials via semiconcavity
Giacomo Greco
Luca Tamanini
29
1
0
15 Apr 2025
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
33
0
0
03 Mar 2025
Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev
Tan M. Nguyen
39
2
0
02 Mar 2025
Exact Sequence Classification with Hardmax Transformers
Albert Alcalde
Giovanni Fantuzzi
Enrique Zuazua
72
1
0
04 Feb 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization
Kelvin Kan
Xingjian Li
Stanley Osher
91
2
0
30 Jan 2025
A semiconcavity approach to stability of entropic plans and exponential convergence of Sinkhorn's algorithm
Alberto Chiarini
Giovanni Conforti
Giacomo Greco
Luca Tamanini
82
4
0
12 Dec 2024
Fused Gromov-Wasserstein Variance Decomposition with Linear Optimal Transport
Michael Wilson
Tom Needham
A. Srivastava
OT
29
0
0
15 Nov 2024
Clustering in Causal Attention Masking
Nikita Karagodin
Yury Polyanskiy
Philippe Rigollet
60
5
0
07 Nov 2024
Provable optimal transport with transformers: The essence of depth and prompt engineering
Hadi Daneshmand
OT
29
0
0
25 Oct 2024
Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric
Baiyuan Chen
MLT
18
0
0
23 Oct 2024
Towards Better Multi-head Attention via Channel-wise Sample Permutation
Shen Yuan
Hongteng Xu
17
1
0
14 Oct 2024
Towards Understanding the Universality of Transformers for Next-Token Prediction
Michael E. Sander
Gabriel Peyré
CML
39
0
0
03 Oct 2024
Doubly Stochastic Adaptive Neighbors Clustering via the Marcus Mapping
Jinghui Yuan
Chusheng Zeng
Fangyuan Xie
Zhe Cao
Mulin Chen
Rong Wang
Feiping Nie
Yuan Yuan
20
3
0
06 Aug 2024
Transformers are Universal In-context Learners
Takashi Furuya
Maarten V. de Hoop
Gabriel Peyré
37
6
0
02 Aug 2024
A Survey on LoRA of Large Language Models
Yuren Mao
Yuhang Ge
Yijiang Fan
Wenyi Xu
Yu Mi
Zhonghao Hu
Yunjun Gao
ALM
52
24
0
08 Jul 2024
Attention Normalization Impacts Cardinality Generalization in Slot Attention
Markus Krimmel
Jan Achterhold
Joerg Stueckler
OCL
37
0
0
04 Jul 2024
Clustering in pure-attention hardmax transformers and its role in sentiment analysis
Albert Alcalde
Giovanni Fantuzzi
Enrique Zuazua
27
3
0
26 Jun 2024
The Balanced-Pairwise-Affinities Feature Transform
Daniel Shalam
Simon Korman
33
0
0
25 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
21
13
0
19 Jun 2024
Elliptical Attention
Stefan K. Nielsen
Laziz U. Abdullaev
R. Teo
Tan M. Nguyen
23
3
0
19 Jun 2024
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
R. Teo
Tan M. Nguyen
43
4
0
19 Jun 2024
Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows
Medha Agarwal
Zaïd Harchaoui
Garrett Mulcahy
Soumik Pal
30
0
0
16 Jun 2024
Synchronization on circles and spheres with nonlinear interactions
Christopher Criscitiello
Quentin Rebjock
Andrew D. McRae
Nicolas Boumal
23
4
0
28 May 2024
Deep Learning as Ricci Flow
Anthony Baptista
Alessandro Barp
Tapabrata Chakraborti
Chris Harbron
Ben D. MacArthur
Christopher R. S. Banerji
AI4CE
41
0
0
22 Apr 2024
Neural McKean-Vlasov Processes: Distributional Dependence in Diffusion Processes
Haoming Yang
Ali Hasan
Yuting Ng
Vahid Tarokh
DiffM
37
4
0
15 Apr 2024
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
Kwanyoung Kim
Y. Oh
Jong Chul Ye
VLM
43
7
0
21 Mar 2024
The Impact of LoRA on the Emergence of Clusters in Transformers
Hugo Koubbi
Matthieu Boussard
Louis Hernandez
21
1
0
23 Feb 2024
Quantum Theory and Application of Contextual Optimal Transport
Nicola Mariella
A. Akhriev
F. Tacchino
Christa Zoufal
J. C. González-Espitia
...
I. Tavernelli
Stefan Woerner
Marianna Rapsomaniki
Sergiy Zhuk
Jannis Born
OT
37
3
0
22 Feb 2024
How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander
Raja Giryes
Taiji Suzuki
Mathieu Blondel
Gabriel Peyré
32
7
0
08 Feb 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
20
5
0
09 Jan 2024
How Smooth Is Attention?
Valérie Castin
Pierre Ablin
Gabriel Peyré
AAML
40
9
0
22 Dec 2023
A mathematical perspective on Transformers
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
EDL
AI4CE
40
36
0
17 Dec 2023
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Franciskus Xaverius Erick
Mina Rezaei
Johanna P. Müller
Bernhard Kainz
11
0
0
30 Nov 2023
Sliceformer: Make Multi-head Attention as Simple as Sorting in Discriminative Tasks
Shen Yuan
Hongteng Xu
16
0
0
26 Oct 2023
Stochastic Latent Transformer: Efficient Modelling of Stochastically Forced Zonal Jets
Ira J. S. Shokar
R. Kerswell
Peter H. Haynes
10
3
0
25 Oct 2023
Adversarial Robustness in Graph Neural Networks: A Hamiltonian Approach
Kai Zhao
Qiyu Kang
Yang Song
Rui She
Sijie Wang
Wee Peng Tay
AAML
30
22
0
10 Oct 2023
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR
Ambar Pal
Jeremias Sulam
Yu Tsao
René Vidal
18
2
0
28 Sep 2023
Implicit regularization of deep residual networks towards neural ODEs
P. Marion
Yu-Han Wu
Michael E. Sander
Gérard Biau
27
14
0
03 Sep 2023
Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models
Tianzhe Chu
Shengbang Tong
Tianjiao Ding
Xili Dai
B. Haeffele
René Vidal
Y. Ma
SSL
VLM
15
13
0
08 Jun 2023
Doubly Stochastic Graph-based Non-autoregressive Reaction Prediction
Ziqiao Meng
Peilin Zhao
Yang Yu
Irwin King
23
7
0
05 Jun 2023
Unbalanced Low-rank Optimal Transport Solvers
M. Scetbon
Michal Klein
Giovanni Palla
Marco Cuturi
OT
32
4
0
31 May 2023
SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities
Hugues van Assel
Titouan Vayer
Rémi Flamary
Nicolas Courty
25
9
0
23 May 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
22
46
0
09 May 2023
Scalable Optimal Transport Methods in Machine Learning: A Contemporary Survey
Abdelwahed Khamis
Russell Tsuchida
Mohamed Tarek
V. Rolland
Lars Petersson
OT
43
12
0
08 May 2023
Sampled Transformer for Point Sets
Shidi Li
Christian J. Walder
Alexander Soen
Lexing Xie
Miaomiao Liu
3DPC
23
1
0
28 Feb 2023
Unlocking Slot Attention by Changing Optimal Transport Costs
Yan Zhang
David W. Zhang
Simon Lacoste-Julien
Gertjan J. Burghouts
Cees G. M. Snoek
OCL
29
11
0
30 Jan 2023