2312.16337
Cited By
Task Contamination: Language Models May Not Be Few-Shot Anymore
26 December 2023
Changmao Li
Jeffrey Flanigan
Papers citing "Task Contamination: Language Models May Not Be Few-Shot Anymore"
19 / 19 papers shown
Title
Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models
Matthew Dahl
AILaw
ELM
45
0
0
05 May 2025
Large language models could be rote learners
Yuyang Xu
Renjun Hu
Haochao Ying
J. Wu
Xing Shi
Wei Lin
ELM
39
0
0
11 Apr 2025
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
Qiming Bao
Gael Gendron
A. Peng
Wanjun Zhong
N. Tan
Yang Chen
Michael Witbrock
J. Liu
LRM
ELM
48
2
0
20 Jan 2025
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
Tobias Braun
Mark Rothermel
Marcus Rohrbach
Anna Rohrbach
74
1
0
13 Dec 2024
Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script Languages
Hoang Nguyen
Khyati Mahajan
Vikas Yadav
Philip S. Yu
Masoud Hashemi
Rishabh Maheshwary
31
0
0
04 Nov 2024
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu
Özlem Uzuner
Meliha Yetisgen
Fei Xia
29
3
0
24 Oct 2024
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
31
1
0
04 Oct 2024
Federated Instruction Tuning of LLMs with Domain Coverage Augmentation
Zezhou Wang
Yaxin Du
Zhuzhong Qian
Yugang Jiang
Siheng Chen
FedML
30
0
0
30 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark J. F. Gales
Kate Knill
KELM
34
1
0
14 Sep 2024
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
42
6
1
10 Jul 2024
Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Tianren Zhang
Chujie Zhao
Guanyu Chen
Yizhou Jiang
Feng Chen
OOD
MLT
OODD
44
2
0
05 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAG
LRM
ELM
ReLM
36
24
0
04 Jun 2024
Had enough of experts? Quantitative knowledge retrieval from large language models
David Selby
Kai Spriestersbach
Yuichiro Iwashita
Dennis Bappert
Archana Warrier
Sumantrak Mukherjee
M. Asim
Koichi Kise
Sebastian Vollmer
23
0
0
12 Feb 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
91
89
0
03 Nov 2023
Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Kent K. Chang
Mackenzie Cramer
Sandeep Soni
David Bamman
RALM
132
77
0
28 Apr 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
275
3,784
0
18 Apr 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
242
460
0
06 Jan 2021