2404.12387
Cited By
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
18 April 2024
Aitor Ormazabal
Che Zheng
Cyprien de Masson d'Autume
Dani Yogatama
Deyu Fu
Donovan Ong
Eric Z. Chen
Eugenie Lamprecht
Hai Pham
Isaac Ong
Kaloyan Aleksiev
Lei Li
Matthew Henderson
Max Bain
Mikel Artetxe
Nishant Relan
Piotr Padlewski
Qi Liu
Ren Chen
Samuel Phua
Yazheng Yang
Yi Tay
Yuqi Wang
Zhongkai Zhu
Zhihui Xie
LRM
VLM
ReLM
HuggingFace (40 upvotes)
Papers citing
"Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models"
44 / 44 papers shown
Title
Multimodal LLMs Do Not Compose Skills Optimally Across Modalities
Paula Ontalvilla
Aitor Ormazabal
Gorka Azkune
113
0
0
11 Nov 2025
Personalizing Retrieval using Joint Embeddings or "the Return of Fluffy"
Bruno Korbar
Andrew Zisserman
102
0
0
06 Oct 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
169
3
0
10 Aug 2025
MMCircuitEval: A Comprehensive Multimodal Circuit-Focused Benchmark for Evaluating LLMs
Chenchen Zhao
Z. Shi
Xiangyu Wen
Chengjie Liu
Yi Liu
...
Yibo Lin
Jun Yang
Ning Xu
Xi Wang
Qiang Xu
107
3
0
20 Jul 2025
Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures
Vijay Prakash Dwivedi
Charilaos I. Kanatsoulis
Shenyang Huang
Jure Leskovec
GNN
3DV
225
5
0
19 Jun 2025
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
Ziwei Zhou
Rui Wang
Zuxuan Wu
AuLLM
VGen
164
20
0
23 May 2025
Multimodal Conversation Structure Understanding
Kent K. Chang
Mackenzie Cramer
Anna Ho
Ti Ti Nguyen
Yilin Yuan
David Bamman
275
1
0
23 May 2025
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang
Monica Dou
Linkai Peng
Hongyi Pan
Ulas Bagci
Boqing Gong
VLM
261
1
0
12 Apr 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
404
6
0
29 Mar 2025
DomainCQA: Crafting Knowledge-Intensive QA from Domain-Specific Charts
Ling Zhong
Yujing Lu
Jing Yang
Weiming Li
Peng Wei
Yongheng Wang
Manni Duan
Qing Zhang
432
2
0
25 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
246
1
0
22 Mar 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
259
4
0
21 Mar 2025
What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets
Marco Antonio Stranisci
Christian Hardmeier
358
2
0
17 Feb 2025
Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails
Yijun Yang
L. Wang
Xiao Yang
Lanqing Hong
Jun Zhu
AAML
252
3
0
09 Feb 2025
GAMEBoT: Transparent Assessment of LLM Reasoning in Games
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wenye Lin
Jonathan Roberts
Yunhan Yang
Samuel Albanie
Zongqing Lu
Kai Han
LRM
ELM
304
4
0
18 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Minlie Huang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
666
48
0
05 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
Yangqiu Song
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Zhiyong Yang
Xiangyu Yue
MLLM
AuLLM
VLM
247
25
0
03 Dec 2024
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
Joseph Heyward
João Carreira
Dima Damen
Andrew Zisserman
Viorica Patraucean
303
3
0
29 Nov 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
International Conference on Learning Representations (ICLR), 2024
Jonathan Roberts
Kai Han
Samuel Albanie
LLMAG
1.0K
7
0
07 Nov 2024
LocateBench: Evaluating the Locating Ability of Vision Language Models
Ting-Rui Chiang
Joshua Robinson
Xinyan Velocity Yu
Dani Yogatama
VLM
ELM
212
0
0
17 Oct 2024
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng
Yun Xing
Zesen Cheng
Yang Zhou
Hang Zhang
Xin Li
Deli Zhao
Shijian Lu
Chunyan Miao
Lidong Bing
296
25
0
16 Oct 2024
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Botian Jiang
Lei Li
Xiaonan Li
Zhaowei Li
Xiachong Feng
Dianbo Sui
Qiang Liu
Xipeng Qiu
181
4
0
16 Oct 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Y. Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
559
52
0
23 Sep 2024
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts
Kai Han
Samuel Albanie
202
3
0
21 Aug 2024
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
Neural Information Processing Systems (NeurIPS), 2024
Pierre Colombo
T. Pires
Malik Boudiaf
Rui Melo
Dominic Culver
Sofia Morgado
Etienne Malaboeuf
Gabriel Hautreux
Johanne Charpentier
Michael Desa
ELM
AILaw
ALM
226
34
0
28 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
244
6
0
17 Jul 2024
The 2024 Foundation Model Transparency Index
Rishi Bommasani
Kevin Klyman
Sayash Kapoor
Shayne Longpre
Betty Xiong
Nestor Maslej
Abigail Z. Jacobs
ELM
271
18
0
17 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
440
191
0
17 Jul 2024
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
Jupinder Parmar
Sanjev Satheesh
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
252
51
0
09 Jul 2024
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Jupinder Parmar
Shrimai Prabhumoye
Pritam Gundecha
Bo Liu
Aastha Jhunjhunwala
Zhilin Wang
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
262
11
0
08 Jul 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
LRM
166
15
0
28 Jun 2024
ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos
Jr-Jen Chen
Yu-Chien Liao
Hsi-Che Lin
Yu-Chu Yu
Yen-Chun Chen
Yu-Chiang Frank Wang
320
38
0
27 Jun 2024
Long Context Transfer from Language to Vision
Peiyuan Zhang
Kaichen Zhang
Bo Li
Guangtao Zeng
Jingkang Yang
Yuanhan Zhang
Ziyue Wang
Haoran Tan
Chunyuan Li
Ziwei Liu
VLM
283
336
0
24 Jun 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Yuxuan Qiao
Haodong Duan
Xinyu Fang
Junming Yang
Lin Chen
Songyang Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LRM
206
28
0
20 Jun 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Xinyu Fang
Kangrui Mao
Haodong Duan
Xiangyu Zhao
Yining Li
Dahua Lin
Kai Chen
VLM
185
143
0
20 Jun 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
International Conference on Learning Representations (ICLR), 2024
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
232
10
0
10 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
VLM
MLLM
567
811
0
31 May 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
388
92
0
28 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
295
0
0
13 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max Ku
Qian Liu
Wenhu Chen
VLM
MLLM
358
178
0
02 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
478
967
0
25 Apr 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
315
95
0
01 Mar 2024
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
Transactions of the Association for Computational Linguistics (TACL), 2020
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
536
684
0
10 Mar 2020
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
560
626
0
06 Nov 2019