ResearchTrend.AI
Attention Scheme Inspired Softmax Regression
arXiv: 2304.10411

20 April 2023
Yichuan Deng
Zhihang Li
Zhao-quan Song

Papers citing "Attention Scheme Inspired Softmax Regression"

41 / 41 papers shown
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen
Xuyang Guo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
56
3
0
03 Mar 2025
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
36
3
0
24 Feb 2025
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao
Zhao-quan Song
Chiwun Yang
VGen
44
2
0
01 Feb 2025
Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen
Jiayan Huo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
57
11
0
03 Jan 2025
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao-quan Song
33
0
0
09 May 2024
Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence
Yichuan Deng
Zhao-quan Song
Chiwun Yang
24
1
0
02 Feb 2024
Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui
Jie Ren
Pengfei He
Jiliang Tang
Yue Xing
34
12
0
30 Jan 2024
One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space
Raghav Addanki
Chenyang Li
Zhao-quan Song
Chiwun Yang
42
3
0
24 Nov 2023
Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training
Lianke Qin
Saayan Mitra
Zhao-quan Song
Yuanyuan Yang
Tianyi Zhou
27
0
0
19 Nov 2023
The Expressibility of Polynomial based Attention Scheme
Zhao-quan Song
Guangyi Xu
Junze Yin
27
5
0
30 Oct 2023
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu
Jue Wang
Tri Dao
Tianyi Zhou
Binhang Yuan
...
Anshumali Shrivastava
Ce Zhang
Yuandong Tian
Christopher Ré
Beidi Chen
BDL
17
189
0
26 Oct 2023
An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
Zhao-quan Song
Chiwun Yang
19
9
0
17 Oct 2023
Fine-tune Language Models to Approximate Unbiased In-context Learning
Timothy Chu
Zhao-quan Song
Chiwun Yang
22
15
0
05 Oct 2023
Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?
Lianke Qin
Zhao-quan Song
Baocheng Sun
10
6
0
14 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao-quan Song
Weixin Wang
Junze Yin
18
25
0
14 Sep 2023
Online Adaptive Mahalanobis Distance Estimation
Lianke Qin
Aravind Reddy
Zhao-quan Song
36
1
0
02 Sep 2023
Solving Attention Kernel Regression Problem via Pre-conditioner
Zhao-quan Song
Junze Yin
Licheng Zhang
28
9
0
28 Aug 2023
How to Protect Copyright Data in Optimization of Large Language Models?
T. Chu
Zhao-quan Song
Chiwun Yang
28
29
0
23 Aug 2023
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
Yeqi Gao
Zhao-quan Song
Junze Yin
21
18
0
21 Aug 2023
Convergence of Two-Layer Regression with Nonlinear Units
Yichuan Deng
Zhao-quan Song
Shenghao Xie
15
7
0
16 Aug 2023
Zero-th Order Algorithm for Softmax Attention Optimization
Yichuan Deng
Zhihang Li
Sridhar Mahadevan
Zhao-quan Song
30
13
0
17 Jul 2023
Fast Quantum Algorithm for Attention Computation
Yeqi Gao
Zhao-quan Song
Xin Yang
Ruizhe Zhang
LRM
23
19
0
16 Jul 2023
Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification
Lianke Qin
Zhao-quan Song
Yuanyuan Yang
20
9
0
13 Jul 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu (Allen) Zhang
Ying Sheng
Tianyi Zhou
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
47
246
0
24 Jun 2023
InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding
Junda Wu
Tong Yu
Rui Wang
Zhao-quan Song
Ruiyi Zhang
Handong Zhao
Chaochao Lu
Shuai Li
Ricardo Henao
VLM
26
22
0
08 Jun 2023
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Zhao-quan Song
Mingquan Ye
Junze Yin
Licheng Zhang
21
7
0
07 Jun 2023
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Xiangyi Chen
Zhao-quan Song
Baochen Sun
Junze Yin
Danyang Zhuo
31
3
0
06 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao-quan Song
Tianyi Zhou
14
23
0
04 Jun 2023
Federated Empirical Risk Minimization via Second-Order Method
S. Bian
Zhao-quan Song
Junze Yin
FedML
25
8
0
27 May 2023
Fast Submodular Function Maximization
Lianke Qin
Zhao-quan Song
Yitan Wang
13
10
0
15 May 2023
Fast and Efficient Matching Algorithm with Deadline Instances
Zhao-quan Song
Weixin Wang
Chenbo Yin
Junze Yin
8
7
0
15 May 2023
Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data
Zhao-quan Song
Mingquan Ye
16
4
0
13 May 2023
Differentially Private Attention Computation
Yeqi Gao
Zhao-quan Song
Xin Yang
42
19
0
08 May 2023
An Iterative Algorithm for Rescaled Hyperbolic Functions Regression
Yeqi Gao
Zhao-quan Song
Junze Yin
23
33
0
01 May 2023
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li
Zhao-quan Song
Yu Xia
Tong Yu
Tianyi Zhou
28
36
0
26 Apr 2023
Solving Tensor Low Cycle Rank Approximation
Yichuan Deng
Yeqi Gao
Zhao-quan Song
26
6
0
13 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
221
2,232
0
22 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuan-Fang Li
Andrej Risteski
107
61
0
07 Mar 2023
Fast Attention Requires Bounded Entries
Josh Alman
Zhao-quan Song
25
78
0
26 Feb 2023
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
Yuzhou Gu
Zhao-quan Song
Junze Yin
Licheng Zhang
16
26
0
21 Feb 2023
Federated Adversarial Learning: A Framework with Convergence Analysis
Xiaoxiao Li
Zhao-quan Song
Jiaming Yang
FedML
16
19
0
07 Aug 2022