Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

29 February 2024
Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Christian Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, S. Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Çağlar Gülçehre
Topics: Mamba

Papers citing "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"

9 papers shown.

1. Reasoning Capabilities and Invariability of Large Language Models
   Alessandro Raganato, Rafael Peñaloza, Marco Viviani, G. Pasi
   Topics: ReLM, LRM · 01 May 2025

2. Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
   Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Huiyu Zhou, Jinchang Ren, Shiming Xiang, X. Li, Guangliang Cheng
   Topics: Mamba · 01 May 2025

3. MoM: Linear Sequence Modeling with Mixture-of-Memories
   Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu-Xi Cheng
   Topics: KELM · 19 Feb 2025

4. Backpropagation through space, time, and the brain
   B. Ellenberger, Paul Haider, Jakob Jordan, Kevin Max, Ismael Jaras, Laura Kriener, Federico Benitez, Mihai A. Petrovici
   25 Mar 2024

5. Repeat After Me: Transformers are Better than State Space Models at Copying
   Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach
   01 Feb 2024

6. Resurrecting Recurrent Neural Networks for Long Sequences
   Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De
   11 Mar 2023

7. Scaling Laws for Neural Language Models
   Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
   23 Jan 2020

8. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
   M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
   Topics: MoE · 17 Sep 2019

9. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
   Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
   Topics: AIMat · 26 Sep 2016