Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering


22 May 2025
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
Junyuan Gao
Jia Yu
Rui Min
Yinfan Wang
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
    ELM

Papers citing "Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering"

10 / 10 papers shown
CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation
Yexing Du
Kaiyuan Liu
Youcheng Pan
Zheng Chu
B. Yang
Xiaocheng Feng
Yang Xiang
Ming Liu
HILM
168
2
0
10 Aug 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
484
28
0
13 Mar 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang
Wenhao Zhu
Hanxu Hu
Bin Wang
Lei Li
Shujian Huang
Fei Yuan
ELM
447
8
0
11 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
1.2K
5,274
0
22 Jan 2025
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
Neural Information Processing Systems (NeurIPS), 2024
Junho Myung
Nayeon Lee
Yi Zhou
Jiho Jin
Rifki Afina Putri
...
Seid Muhie Yimam
Mohammad Taher Pilehvar
N. Ousidhoum
Jose Camacho-Collados
Alice Oh
473
109
0
17 Jan 2025
MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge
Jie He
Nan Hu
Wanqiu Long
Jiaoyan Chen
Jeff Z. Pan
ELM, LRM
500
19
0
22 Dec 2024
Measuring short-form factuality in large language models
Jason W. Wei
Karina Nguyen
Hyung Won Chung
Yunxin Joy Jiao
Spencer Papay
Amelia Glaese
John Schulman
W. Fedus
ELM, KELM, HILM
244
202
0
07 Nov 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
566
2,655
0
25 Oct 2024
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora
Marzena Karpinska
Hung-Ting Chen
Ipsita Bhattacharjee
Mohit Iyyer
Eunsol Choi
HILM
421
22
0
25 Jun 2024
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
Chen Cecilia Liu
Iryna Gurevych
Anna Korhonen
554
14
0
06 Jun 2024