ResearchTrend.AI
Can multiple-choice questions really be useful in detecting the abilities of LLMs?

26 March 2024
Wangyue Li
Liangzhi Li
Tong Xiang
Xiao Liu
Wei Deng
Noa Garcia
    ELM
arXiv: 2403.17752

Papers citing "Can multiple-choice questions really be useful in detecting the abilities of LLMs?"

24 / 24 papers shown
How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension
Hao Li
Liuzhenghao Lv
He Cao
Zijing Liu
Zhiyuan Yan
Yu Wang
Yonghong Tian
Y. Li
Li Yuan
27
0
0
10 Apr 2025
Patience is all you need! An agentic system for performing scientific literature review
David Brett
Anniek Myatt
25
0
0
28 Mar 2025
It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education
Shrutika Singh
Anton Alyakin
Daniel Alber
Jaden Stryker
Ai Phuong S Tong
...
Mathew de la Paz
Miguel Hernandez-Rovira
Ki Yun Park
Eric Leuthardt
E. Oermann
AI4MH
AI4Ed
ELM
58
0
0
13 Mar 2025
Large Language Models Often Say One Thing and Do Another
Ruoxi Xu
Hongyu Lin
Xianpei Han
Jia Zheng
Weixiang Zhou
Le Sun
Yingfei Sun
42
1
0
10 Mar 2025
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
Jiho Jin
Woosung Kang
Junho Myung
Alice H. Oh
41
0
0
10 Mar 2025
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions
Yizhe Zhang
Richard He Bai
Zijin Gu
Ruixiang Zhang
Jiatao Gu
Emmanuel Abbe
Samy Bengio
Navdeep Jaitly
LRM
BDL
53
1
0
25 Feb 2025
Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
Yong Cao
Haijiang Liu
Arnav Arora
Isabelle Augenstein
Paul Röttger
Daniel Hershcovich
49
1
0
20 Feb 2025
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance
Guangxiang Zhao
Saier Hu
Xiaoqi Jian
Jinzhu Wu
Yuhan Wu
Change Jia
Lin Sun
Xiangzheng Zhang
81
0
0
18 Feb 2025
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Anna Arias-Duart
Pablo A. Martin-Torres
Daniel Hinjos
Pablo Bernabeu Perez
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Sergio Álvarez Napagao
Dario Garcia-Gasulla
LM&MA
ELM
100
1
0
10 Feb 2025
OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
Chengfeng Zhou
Ji Wang
Juanjuan Qin
Yining Wang
Ling Sun
Weiwei Dai
LM&MA
ELM
86
0
0
03 Feb 2025
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
Olga Loginova
Oleksandr Bezrukov
Alexey Kravets
18
0
0
18 Oct 2024
In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
Pengrui Han
Peiyang Song
Haofei Yu
Jiaxuan You
ReLM
LRM
21
1
0
23 Sep 2024
Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino
Jann Railey Montalan
Jian Gang Ngui
Wei Qi Leong
Yosephine Susanto
Hamsawardhini Rengarajan
William-Chandra Tjhi
Alham Fikri Aji
33
3
0
20 Sep 2024
Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates
Yusuke Sakai
Adam Nohejl
Jiangnan Hang
Hidetaka Kamigaito
Taro Watanabe
ELM
36
2
0
22 Aug 2024
The Better Angels of Machine Personality: How Personality Relates to LLM Safety
Jie M. Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong-jin Liu
Yu Qiao
Jing Shao
LLMAG
PILM
40
12
0
17 Jul 2024
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?
Leonidas Zotos
H. Rijn
Malvina Nissim
ELM
34
2
0
07 Jul 2024
Social Bias Evaluation for Large Language Models Requires Prompt Variations
Rem Hida
Masahiro Kaneko
Naoaki Okazaki
38
13
0
03 Jul 2024
Autonomous Prompt Engineering in Large Language Models
Daan Kepel
Konstantina Valogianni
LLMAG
35
6
0
25 Jun 2024
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
35
13
0
12 Jun 2024
PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Jiatong Li
Renjun Hu
Kunzhe Huang
Zhuang Yan
Qi Liu
Mengxiao Zhu
Xing Shi
Wei Lin
KELM
36
4
0
30 May 2024
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
138
184
0
22 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
242
1,070
0
05 Oct 2022
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
202
806
0
13 Sep 2019
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
404
2,576
0
03 Sep 2019