arXiv:2505.16591
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
22 May 2025
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
Junyuan Gao
Jia Yu
Rui Min
Yinfan Wang
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
ELM
Papers citing
"Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering"
10 / 10 papers shown
Title
CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation
Yexing Du
Kaiyuan Liu
Youcheng Pan
Zheng Chu
B. Yang
Xiaocheng Feng
Yang Xiang
Ming Liu
HILM
168
2
0
10 Aug 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
484
28
0
13 Mar 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang
Wenhao Zhu
Hanxu Hu
Bin Wang
Lei Li
Shujian Huang
Fei Yuan
ELM
447
8
0
11 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
1.2K
5,274
0
22 Jan 2025
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
Neural Information Processing Systems (NeurIPS), 2024
Junho Myung
Nayeon Lee
Yi Zhou
Jiho Jin
Rifki Afina Putri
...
Seid Muhie Yimam
Mohammad Taher Pilehvar
N. Ousidhoum
Jose Camacho-Collados
Alice Oh
473
109
0
17 Jan 2025
MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge
Jie He
Nan Hu
Wanqiu Long
Jiaoyan Chen
Jeff Z. Pan
ELM
LRM
500
19
0
22 Dec 2024
Measuring short-form factuality in large language models
Jason W. Wei
Nguyen Karina
Hyung Won Chung
Yunxin Joy Jiao
Spencer Papay
Amelia Glaese
John Schulman
W. Fedus
ELM
KELM
HILM
244
202
0
07 Nov 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
566
2,655
0
25 Oct 2024
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora
Marzena Karpinska
Hung-Ting Chen
Ipsita Bhattacharjee
Mohit Iyyer
Eunsol Choi
HILM
421
22
0
25 Jun 2024
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
Chen Cecilia Liu
Iryna Gurevych
Anna Korhonen
554
14
0
06 Jun 2024