Minimalist Softmax Attention Provably Learns Constrained Boolean Functions

26 May 2025
Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu
MLT
arXiv: 2505.19531 (abs · PDF · HTML)

Papers citing "Minimalist Softmax Attention Provably Learns Constrained Boolean Functions"

32 / 32 papers shown

Subquadratic Algorithms and Hardness for Attention with Any Temperature
Shreya Gupta, Boyang Huang, Barna Saha, Yinzhan Xu, Christopher Ye
20 May 2025

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman, Zhao Song
17 May 2025

T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation
Xuyang Guo, Jiayan Huo, Zhenmei Shi, Zhao Song, Jiahao Zhang, Jiale Zhao
EGVM, VGen, PINN
01 May 2025

Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent
Bo Chen, Zhenmei Shi, Zhao Song, Jiahao Zhang
NAI, LRM, AI4CE
07 Apr 2025

Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models
Xuyang Guo, Zekai Huang, Jiayan Huo, Yingyu Liang, Zhenmei Shi, Zhao Song, Jiahao Zhang
ALM, VGen
05 Apr 2025

On the Computational Capability of Graph Neural Networks: A Circuit Complexity Bound Perspective
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Wei Wang, Jiahao Zhang
GNN
11 Jan 2025

Theoretical Constraints on the Expressive Power of RoPE-based Tensor Attention Transformers
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
23 Dec 2024

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
Bofei Gao, Feifan Song, Zhiyong Yang, Zefan Cai, Yibo Miao, ..., Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang
ELM, LRM
10 Oct 2024

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang
MoE, LRM
07 Oct 2024

Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao, Yuzhou Gu, Zhao Song
09 May 2024

Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Vatsal Sharan
11 Mar 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
LRM, AI4CE
20 Feb 2024

Why are Sensitive Functions Hard for Transformers?
Michael Hahn, Mark Rofin
15 Feb 2024

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim, Taiji Suzuki
02 Feb 2024

In-Context Convergence of Transformers
Yu Huang, Yuan Cheng, Yingbin Liang
MLT
08 Oct 2023

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman, Zhao Song
06 Oct 2023

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato
26 Jul 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
AI4MH, ALM
18 Jul 2023

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma
MLT
07 Jul 2023

DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, R. Davuluri, Han Liu
26 Jun 2023

Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
ALM, PILM
27 Feb 2023

Fast Attention Requires Bounded Entries
Josh Alman, Zhao Song
26 Feb 2023

Large Language Models Encode Clinical Knowledge
K. Singhal, Shekoofeh Azizi, T. Tu, S. S. Mahdavi, Jason W. Wei, ..., A. Rajkomar, Joelle Barral, Christopher Semturs, Alan Karthikesalingam, Vivek Natarajan
LM&MA, ELM, AI4MH
26 Dec 2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
VGen
22 Dec 2022

Transformers Learn Shortcuts to Automata
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang
OffRL, LRM
19 Oct 2022

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, ..., Raphael Gontijo-Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi
VLM
23 May 2022

Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
ReLM, BDL, LRM, AI4CE
21 Mar 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022

Saturated Transformers are Constant-Depth Threshold Circuits
William Merrill, Ashish Sabharwal, Noah A. Smith
30 Jun 2021

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
28 May 2020

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM, SSL, SSeg
11 Oct 2018