ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.00998
  4. Cited By
Predictions from language models for multiple-choice tasks are not
  robust under variation of scoring methods

Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods

1 March 2024
Polina Tsvilodub
Hening Wang
Sharon Grosch
Michael Franke
ArXivPDFHTML

Papers citing "Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods"

14 / 14 papers shown
Title
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikoloas Boulgouris
HILM
59
2
0
03 Jan 2025
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen
Tiffany Knearem
Reshmi Ghosh
Yu-Ju Yang
Tanushree Mitra
Yun Huang
Yun Huang
50
0
0
15 Sep 2024
Compare without Despair: Reliable Preference Evaluation with Generation
  Separability
Compare without Despair: Reliable Preference Evaluation with Generation Separability
Sayan Ghosh
Tejas Srinivasan
Swabha Swayamdipta
35
2
0
02 Jul 2024
Bayesian Statistical Modeling with Predictors from LLMs
Bayesian Statistical Modeling with Predictors from LLMs
Michael Franke
Polina Tsvilodub
Fausto Carcassi
34
4
0
13 Jun 2024
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single
  Process
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
Ermo Hua
Biqing Qi
Kaiyan Zhang
Yue Yu
Ning Ding
Xingtai Lv
Kai Tian
Bowen Zhou
32
3
0
20 May 2024
Look at the Text: Instruction-Tuned Language Models are More Robust
  Multiple Choice Selectors than You Think
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang
Chengzhi Hu
Bolei Ma
Paul Röttger
Barbara Plank
OOD
24
6
0
12 Apr 2024
Auxiliary task demands mask the capabilities of smaller language models
Auxiliary task demands mask the capabilities of smaller language models
Jennifer Hu
Michael C. Frank
ELM
29
25
0
03 Apr 2024
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
251
2,232
0
22 Mar 2023
Syntactic Surprisal From Neural Models Predicts, But Underestimates,
  Human Processing Difficulty From Syntactic Ambiguities
Syntactic Surprisal From Neural Models Predicts, But Underestimates, Human Processing Difficulty From Syntactic Ambiguities
Suhas Arehalli
Brian Dillon
Tal Linzen
26
36
0
21 Oct 2022
Using cognitive psychology to understand GPT-3
Using cognitive psychology to understand GPT-3
Marcel Binz
Eric Schulz
ELM
LLMAG
242
439
0
21 Jun 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,909
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,448
0
28 Jan 2022
Reducing conversational agents' overconfidence through linguistic
  calibration
Reducing conversational agents' overconfidence through linguistic calibration
Sabrina J. Mielke
Arthur Szlam
Emily Dinan
Y-Lan Boureau
209
153
0
30 Dec 2020
Hypothesis Only Baselines in Natural Language Inference
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
190
576
0
02 May 2018
1