Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.18508
Cited By
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
28 February 2024
Mahdi Karami
Ali Ghodsi
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling"
12 / 12 papers shown
Title
Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
Ali Behrouz
Ali Parviz
Mahdi Karami
Clayton Sanford
Bryan Perozzi
Vahab Mirrokni
79
2
0
23 Nov 2024
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Costin-Andrei Oncescu
Sanket Purandare
Stratos Idreos
Sham Kakade
VLM
AI4TS
3DV
16
0
0
16 Oct 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
37
32
0
29 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
53
116
0
29 Feb 2024
Simple linear attention language models balance the recall-throughput tradeoff
Simran Arora
Sabri Eyuboglu
Michael Zhang
Aman Timalsina
Silas Alberti
Dylan Zinsley
James Zou
Atri Rudra
Christopher Ré
39
18
0
28 Feb 2024
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
88
258
0
11 Mar 2023
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
M. Bronstein
Joan Bruna
Taco S. Cohen
Petar Velivcković
GNN
166
1,095
0
27 Apr 2021
Fourier Neural Operator for Parametric Partial Differential Equations
Zong-Yi Li
Nikola B. Kovachki
Kamyar Azizzadenesheli
Burigede Liu
K. Bhattacharya
Andrew M. Stuart
Anima Anandkumar
AI4CE
203
2,254
0
18 Oct 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
249
1,982
0
28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
228
578
0
12 Mar 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1