ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2311.07463

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

13 November 2023
Sanchit Ahuja
Divyanshu Aggarwal
Varun Gumma
Ishaan Watts
Ashutosh Sathe
Millicent Ochieng
Rishav Hada
Prachi Jain
Maxamed Axmed
Kalika Bali
Sunayana Sitaram
    ELM

Papers citing "MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks"

37 / 37 papers shown
Is LLM the Silver Bullet to Low-Resource Languages Machine Translation?
Yewei Song
Lujun Li
Cedric Lothritz
Saad Ezzini
Lama Sleem
Niccolo Gentile
Radu State
Tegawende F. Bissyande
Jacques Klein
44
1
0
31 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama
Naome A. Etori
Kevin Lu
Randu Karisa
Arturs Kanepajs
LRM
ELM
59
0
0
14 Mar 2025
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
Greta Warren
Irina Shklovski
Isabelle Augenstein
OffRL
67
4
0
13 Feb 2025
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
Sankalp KJ
Ashutosh Kumar
Laxmaan Balaji
Nikunj Kotecha
Vinija Jain
Aman Chadha
S. Bhaduri
ELM
61
1
0
27 Jan 2025
One world, one opinion? The superstar effect in LLM responses
Sofie Goethals
L. Rhue
78
0
0
13 Dec 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
102
5
0
29 Nov 2024
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text
Reshmi Ghosh
Tianyi Yao
Lizzy Chen
Sadid Hasan
Tianwei Chen
Dario Bernal
Huitian Jiao
H M Sajjad Hossain
ELM
72
0
0
25 Nov 2024
Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education
Anand Syamkumar
Nora Tseng
Kaycie Barron
Shanglin Yang
Shamya Karumbaiah
Rheeya Uppal
Junjie Hu
29
1
0
06 Nov 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
VLM
34
5
0
31 Oct 2024
Contamination Report for Multilingual Benchmarks
Sanchit Ahuja
Varun Gumma
Sunayana Sitaram
16
0
0
21 Oct 2024
Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning
Gaurav Arora
Srujana Merugu
Shreya Jain
Vaibhav Saxena
LRM
24
0
0
18 Oct 2024
HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings
Varun Gumma
Anandhita Raghunath
Mohit Jain
Sunayana Sitaram
LM&MA
32
1
0
17 Oct 2024
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
Sumanth Doddapaneni
Mohammed Safi Ur Rahman Khan
Dilip Venkatesh
Raj Dabre
Anoop Kunchukuttan
Mitesh M. Khapra
ELM
35
1
0
17 Oct 2024
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
Amir Hossein Kargaran
Ali Modarressi
Nafiseh Nikeghbal
Jana Diesner
François Yvon
Hinrich Schütze
ELM
44
3
0
08 Oct 2024
Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings
Md. Arid Hasan
Prerona Tarannum
Krishno Dey
Imran Razzak
Usman Naseem
26
4
0
05 Aug 2024
Generalists vs. Specialists: Evaluating Large Language Models for Urdu
Samee Arif
Abdul Hameed Azeemi
Agha Ali Raza
Awais Athar
ALM
LM&MA
ELM
33
4
0
05 Jul 2024
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
Florian Schneider
Sunayana Sitaram
VLM
37
7
0
04 Jul 2024
[WIP] Jailbreak Paradox: The Achilles' Heel of LLMs
Abhinav Rao
Monojit Choudhury
Somak Aditya
14
0
0
18 Jun 2024
Decoding the Diversity: A Review of the Indic AI Research Landscape
Sankalp KJ
Vinija Jain
S. Bhaduri
Tamoghna Roy
Aman Chadha
47
5
0
13 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
ELM
50
6
0
05 Jun 2024
Beyond Metrics: Evaluating LLMs' Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios
Millicent Ochieng
Varun Gumma
Sunayana Sitaram
Jindong Wang
Vishrav Chaudhary
K. Ronen
Kalika Bali
Jacki O'Neill
34
4
0
01 Jun 2024
Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages
David Ifeoluwa Adelani
A. S. Doğruöz
André Coneglian
Atul Kr. Ojha
26
2
0
28 Apr 2024
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
Adrian de Wynter
Ishaan Watts
Nektar Ege Altıntoprak
Tua Wongsangaroonsri
Minghui Zhang
...
Anna Vickers
Stéphanie Visser
Herdyan Widarmanto
A. Zaikin
Si-Qing Chen
LM&MA
44
16
0
22 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Yinghui Li
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
47
35
0
07 Apr 2024
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
Tanmay Rajore
Nishanth Chandran
Sunayana Sitaram
Divya Gupta
Rahul Sharma
Kashish Mittal
Manohar Swaminathan
39
13
0
01 Mar 2024
MaLA-500: Massive Language Adaptation of Large Language Models
Peiqin Lin
Shaoxiong Ji
Jörg Tiedemann
André F. T. Martins
Hinrich Schütze
ELM
23
15
0
24 Jan 2024
Towards Conversational Diagnostic AI
Tao Tu
Anil Palepu
M. Schaekermann
Khaled Saab
Jan Freyberg
...
Katherine Chou
Greg S. Corrado
Yossi Matias
Alan Karthikesalingam
Vivek Natarajan
AI4MH
LM&MA
20
87
0
11 Jan 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
105
136
0
03 Nov 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
206
2,232
0
22 Mar 2023
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
Barun Patra
Saksham Singhal
Shaohan Huang
Zewen Chi
Li Dong
Furu Wei
Vishrav Chaudhary
Xia Song
54
23
0
26 Oct 2022
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
165
320
0
06 Oct 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
67
71
0
25 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
E. Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
92
167
0
28 Sep 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020
RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling
Jun Quan
Shian Zhang
Qian Cao
Zi-pu Li
Deyi Xiong
33
51
0
17 Oct 2020
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
239
489
0
16 Oct 2019