Gemma 2: Improving Open Language Models at a Practical Size

31 July 2024
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
Surya Bhupatiraju
Léonard Hussenot
Thomas Mesnard
Bobak Shahriari
Alexandre Ramé
Johan Ferret
Peter J. Liu
P. Tafti
Abe Friesen
Michelle Casbon
Sabela Ramos
Ravin Kumar
Charline Le Lan
Sammy Jerome
Anton Tsitsulin
Nino Vieillard
Piotr Stańczyk
Sertan Girgin
Nikola Momchev
Matt Hoffman
S. Thakoor
Jean-Bastien Grill
Behnam Neyshabur
Olivier Bachem
Alanna Walton
Aliaksei Severyn
Alicia Parrish
Aliya Ahmad
Allen Hutchison
Alvin Abdagic
Amanda Carl
Amy Shen
Andy Brock
Andy Coenen
Anthony Laforge
Antonia Paterson
Ben Bastian
Bilal Piot
Boxi Wu
Brandon Royal
Charlie Chen
Chintu Kumar
Chris Perry
Christopher A. Welty
Christopher A. Choquette-Choo
Danila Sinopalnikov
David Weinberger
Dimple Vijaykumar
Dominika Rogozińska
D. Herbison
Elisa Bandy
Emma Wang
Eric Noland
Erica Moreira
Evan Senter
Evgenii Eltyshev
Francesco Visin
Gabriel Rasskin
Gary Wei
Glenn Cameron
Gus Martins
Hadi Hashemi
Hanna Klimczak-Plucińska
Harleen Batra
H. Dhand
Ivan Nardini
Jacinda Mein
Jack Zhou
James Svensson
Jeff Stanway
Jetha Chan
Jin Zhou
Joana Carrasqueira
Joana Iljazi
Jocelyn Becker
Joe Fernandez
Joost R. van Amersfoort
Josh Gordon
Josh Lipschultz
Joshua Newlan
Junsong Ji
Kareem Mohamed
Kartikeya Badola
Kat Black
Katie Millican
Keelin McDonell
Kelvin Nguyen
Kiranbir Sodhia
Kish Greene
Lars Lowe Sjoesund
Lauren Usui
Laurent Sifre
L. Heuermann
Leticia Lago
Lilly McNealus
Livio Baldini Soares
Logan Kilpatrick
Lucas Dixon
Luciano Martins
Machel Reid
Manvinder Singh
Mark Iverson
Martin Gorner
Mat Velloso
Mateo Wirth
Matt Davidow
Matt Miller
Matthew Rahtz
Matthew Watson
Meg Risdal
Mehran Kazemi
Michael Moynihan
Ming Zhang
Minsuk Kahng
Minwoo Park
Mofi Rahman
Mohit Khatwani
Natalie Dao
Nenshad Bardoliwalla
Nesh Devanathan
Neta Dumai
Nilay Chauhan
O. Wahltinez
Pankil Botarda
Parker Barnes
P. Barham
Paul Michel
Pengchong Jin
Petko Georgiev
Phil Culliton
Pradeep Kuppala
Ramona Comanescu
Ramona Merhej
Reena Jana
R. Rokni
Rishabh Agarwal
Ryan Mullins
Samaneh Saadat
Sara Mc Carthy
Sarah Perrin
Sébastien Arnold
Sebastian Krause
Shengyang Dai
S. Garg
Shruti Sheth
S. Ronstrom
Susan Chan
Timothy Jordan
Ting-To Yu
Tom Eccles
Tom Hennigan
Tomás Kociský
Tulsee Doshi
Vihan Jain
Vikas Yadav
Vilobh Meshram
Vishal Dharmadhikari
Warren Barkley
Wei Wei
Wenming Ye
Woohyun Han
Woosuk Kwon
Xiang Xu
Zhe Shen
Zhitao Gong
Zichuan Wei
Victor Cotruta
Phoebe Kirk
Anand Rao
Minh Giang
Ludovic Peran
T. Warkentin
Eli Collins
Joelle Barral
Zoubin Ghahramani
R. Hadsell
D. Sculley
Jeanine Banks
Anca Dragan
Slav Petrov
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clement Farabet
Elena Buchatskaya
Sebastian Borgeaud
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM, MoE, OSLM

Papers citing "Gemma 2: Improving Open Language Models at a Practical Size"

50 / 657 papers shown
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE, AI4CE
412
7
0
13 Feb 2025
Enhancing LLM Character-Level Manipulation via Divide and Conquer
Zhen Xiong
Yujun Cai
Bryan Hooi
Nanyun Peng
Kai-Wei Chang
Zhecheng Li
326
0
0
12 Feb 2025
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
Jusheng Zhang
Zimeng Huang
Yijia Fan
Ningyuan Liu
Mingyan Li
Zhuojie Yang
Jiawei Yao
Jian Wang
Keze Wang
177
10
0
11 Feb 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang
Wenhao Zhu
Hanxu Hu
Bin Wang
Lei Li
Shujian Huang
Fei Yuan
ELM
411
8
0
11 Feb 2025
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Chaoqun Liu
Wenxuan Zhang
Jiahao Ying
Mahani Aljunied
Anh Tuan Luu
Lidong Bing
ELM
466
9
0
10 Feb 2025
Task-driven Layerwise Additive Activation Intervention
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Hieu Trung Nguyen
Bao Nguyen
Binh Nguyen
V. Nguyen
KELM
214
3
0
10 Feb 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
...
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
AAML, LRM
544
49
0
10 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
252
5
0
08 Feb 2025
Incongruence Identification in Eyewitness Testimony
Akshara Nair
Zeba Afroz
Md Shad Akhtar
288
1
0
08 Feb 2025
Can Large Language Models Understand Intermediate Representations in Compilers?
Hailong Jiang
Jianfeng Zhu
Yao Wan
B. Fang
Hongyu Zhang
Hailong Jiang
Qiang Guan
296
1
0
07 Feb 2025
Safety Reasoning with Guidelines
Haoyu Wang
Zeyu Qin
Li Shen
Xueqian Wang
Minhao Cheng
Dacheng Tao
353
4
0
06 Feb 2025
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Mardhiyah Sanni
Tassallah Abdullahi
Devendra D. Kayande
Emmanuel Ayodele
Naome A. Etori
...
Chibuzor Okocha
L. Ismaila
Folafunmi Omofoye
Boluwatife A. Adewale
Tobi Olatunji
335
6
0
06 Feb 2025
COSMosFL: Ensemble of Small Language Models for Fault Localisation
Hyunjoon Cho
Sungmin Kang
Gabin An
S. Yoo
221
2
0
05 Feb 2025
AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement
IEEE International Conference on Robotics and Automation (ICRA), 2025
Shivam Singh
Karthik Swaminathan
Nabanita Dash
Ramandeep Singh
Snehasis Banerjee
Mohan Sridharan
Madhava Krishna
LLMAG, LM&Ro
322
4
0
04 Feb 2025
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
Zongxi Li
Jian Wang
Haoran Xie
S. J. Qin
359
3
0
03 Feb 2025
Scaling Embedding Layers in Language Models
Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang
416
6
0
03 Feb 2025
OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
Chengfeng Zhou
Ji Wang
Juanjuan Qin
Yining Wang
Ling Sun
Weiwei Dai
LM&MA, ELM
387
1
0
03 Feb 2025
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
Ehsaneddin Asgari
Yassine El Kheir
Mohammad Ali Sadraei Javaheri
248
11
0
02 Feb 2025
Ensembles of Low-Rank Expert Adapters
International Conference on Learning Representations (ICLR), 2025
Yinghao Li
Vianne Gao
Chao Zhang
MohamadAli Torkamani
371
5
0
31 Jan 2025
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation
Yun Wang
Tiansheng Huang
Li Shen
Huanjin Yao
Haotian Luo
Rui Liu
Naiqiang Tan
Jiaxing Huang
Dacheng Tao
AAML, MoMe, CLL
355
9
0
30 Jan 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
...
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
Hui Xiong
Bryan Hooi
AI4TS, LRM
502
52
0
30 Jan 2025
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min
Tianyu Pang
Chao Du
Qian Liu
Minhao Cheng
Min Lin
AAML
319
10
0
29 Jan 2025
Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models
Minghan Li
Eric Gaussier
Guodong Zhou
RALM
226
0
0
28 Jan 2025
Learning to Summarize from LLM-generated Feedback
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
324
15
0
28 Jan 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
International Conference on Learning Representations (ICLR), 2024
Mohammad Mozaffari
Amir Yazdanbakhsh
Zhao Zhang
M. Dehnavi
331
12
0
28 Jan 2025
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Ran Xu
Hui Liu
Jiapeng Liu
Zhenwei Dai
Yaochen Xie
...
Chen Luo
Yang Li
Joyce C. Ho
Carl Yang
Qi He
RALM
426
21
0
28 Jan 2025
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Satyapriya Krishna
Kalpesh Krishna
Anhad Mohananey
Steven Schwarcz
Adam Stambler
Shyam Upadhyay
Manaal Faruqui
ReLM, 3DV, LRM, RALM
258
80
0
28 Jan 2025
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
International Conference on Learning Representations (ICLR), 2025
Makoto Shing
Yuichi Inoue
Han Bao
Sho Yokoi
Takuya Akiba
VLM
467
11
0
28 Jan 2025
FDLLM: A Dedicated Detector for Black-Box LLMs Fingerprinting
Zhiyuan Fu
Junfan Chen
Lan Zhang
Ting Yang
Jun Niu
...
Ruidong Li
Peng Liu
Yuqing Zhang
Fannv He
Yuqing Zhang
DeLMO
285
0
0
27 Jan 2025
The Last Dependency Crusade: Solving Python Dependency Conflicts with LLMs
Antony Bartlett
Cynthia C. S. Liem
Annibale Panichella
82
0
0
27 Jan 2025
HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor
Zihui Wu
Haichang Gao
Jiacheng Luo
Zhaoxiang Liu
376
1
0
23 Jan 2025
Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Shrinidhi Kumbhar
Venkatesh Mishra
Kevin Coutinho
Divij Handa
Ashif Iquebal
Chitta Baral
285
16
0
23 Jan 2025
Revealing emergent human-like conceptual representations from language prediction
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2025
Ningyu Xu
Tao Gui
Chao Du
Qiang Luo
Jiaqi Leng
Qi Zhang
Menghan Zhang
462
1
0
21 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Neural Information Processing Systems (NeurIPS), 2024
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jing Zhang
Lu Lu
Longji Xu
Haizhou Li
Zhikai Wu
AuLLM
328
48
0
17 Jan 2025
AudioBERT: Audio Knowledge Augmented Language Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Hyunjong Ok
Suho Yoo
Jaeho Lee
AuLLM, RALM, VLM
195
1
0
17 Jan 2025
Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Shaona Ghosh
Prasoon Varshney
Makesh Narsimhan Sreedhar
Aishwarya Padmakumar
Traian Rebedea
Jibin Rajan Varghese
Christopher Parisien
297
54
0
15 Jan 2025
Language Fusion for Parameter-Efficient Cross-lingual Transfer
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Philipp Borchert
Ivan Vulić
Marie-Francine Moens
Jochen De Weerdt
310
2
0
12 Jan 2025
CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback
BigData Congress [Services Society] (BSS), 2024
En-Qi Tseng
Pei-Cing Huang
Chan Hsu
Peng-Yi Wu
Chan-Tung Ku
Yihuang Kang
201
5
0
10 Jan 2025
UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Lihe Yang
Zhen Zhao
Hengshuang Zhao
VLM
133
6
0
10 Jan 2025
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Pegah Khayatan
Mustafa Shukor
Jayneel Parekh
Arnaud Dapogny
Matthieu Cord
LLMSV
371
6
0
06 Jan 2025
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu
Cunchun Li
Yingjie Zhou
Xing Fan
Kun Wang
Shirui Pan
Qingsong Wen
AAML
342
6
0
03 Jan 2025
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Mahir Labib Dihan
Md Tanvir Hassan
Md Tanvir Parvez
Md Hasebul Hasan
Md Almash Alam
Muhammad Aamir Cheema
Mohammed Eunus Ali
Md. Rizwan Parvez
ELM, LRM
421
12
0
31 Dec 2024
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
276
19
0
31 Dec 2024
Large-scale moral machine experiment on large language models
PLoS ONE, 2024
Muhammad Shahrul Zaim bin Ahmad
Kazuhiro Takemoto
ELM, AILaw
324
6
1
31 Dec 2024
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
1.0K
0
0
30 Dec 2024
MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish
Xin Huang
Tarun K. Vangani
Minh Duc Pham
Xunlong Zou
Bin Wang
Zhengyuan Liu
Ai Ti Aw
LRM
320
2
0
21 Dec 2024
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
Steven Feng
Shrimai Prabhumoye
John Kamalu
Jane Polak Scowcroft
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
263
14
0
18 Dec 2024
Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque
Ander Corral
Ixak Sarasua
Xabier Saralegi
155
2
0
18 Dec 2024
Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation
International Conference on Computational Linguistics (COLING), 2024
Samin Mahdizadeh Sani
Pouya Sadeghi
Thuy-Trang Vu
Yadollah Yaghoobzadeh
Gholamreza Haffari
335
5
0
17 Dec 2024
Evaluating Zero-Shot Multilingual Aspect-Based Sentiment Analysis with Large Language Models
International Journal of Machine Learning and Cybernetics (IJMLC), 2024
Chengyan Wu
Bolei Ma
Zheyu Zhang
Ningyuan Deng
Yanqing He
Yun Xue
LRM
328
8
0
17 Dec 2024