Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse

22 May 2025

Josh Alman

Zhao Song

ArXiv (abs) | PDF | HTML

Papers citing "Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse"

18 / 18 papers shown

Fundamental Limits of Crystalline Equivariant Graph Neural Networks: A Circuit Complexity Perspective

150

07 Oct 2025

Your Vision-Language Model Can't Even Count to 20: Exposing the Failures of VLMs in Compositional Counting

287

06 Oct 2025

Towards Sampling Data Structures for Tensor Products in Turnstile Streams

Zhao Song

Shenghao Xie

Samson Zhou

147

04 Oct 2025

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

144

16 Aug 2025

Towards High-Order Mean Flow Generative Models: Feasibility, Expressivity, and Provably Efficient Criteria

177

09 Aug 2025

Subquadratic Algorithms and Hardness for Attention with Any Temperature

265

20 May 2025

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform

Josh Alman

Zhao Song

349

17 May 2025

T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models

1.1K

08 May 2025

Always Skip Attention

Yiping Ji

Hemanth Saratchandran

Peyman Moghaddam

Simon Lucey

1.1K

04 May 2025

T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation

484

01 May 2025

Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models

503

05 Apr 2025

When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?

International Conference on Artificial Intelligence and Statistics (AISTATS), 2025

245

24 Feb 2025

Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies

275

30 Dec 2024

RoPE Attention Can Be Trained in Almost Linear Time

354

23 Dec 2024

One Step Diffusion via Shortcut Models

International Conference on Learning Representations (ICLR), 2024

650

175

16 Oct 2024

HSR-Enhanced Sparse Attention Acceleration

818

14 Oct 2024

Binary Hypothesis Testing for Softmax Models and Leverage Score Models

Yeqi Gao

Yuzhou Gu

Zhao Song

413

09 May 2024

The Expressibility of Polynomial based Attention Scheme

Zhao Song

Guangyi Xu

Junze Yin

320

30 Oct 2023