SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

16 August 2018
Rowan Zellers
Yonatan Bisk
Roy Schwartz
Yejin Choi

Papers citing "SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference"

50 / 475 papers shown
Title
A New Benchmark Dataset and Mixture-of-Experts Language Models for Adversarial Natural Language Inference in Vietnamese
Tin Van Huynh
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
260
2
0
25 Jun 2024
UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions
Xunzhi Wang
Zhuowei Zhang
Qiongyu Li
Gaonan Chen
Mengting Hu
Zhixin Han
Bitong Luo
Zhiyu Li
Hang Gao
ELM
324
3
0
18 Jun 2024
Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
Yijun Liu
Yuan Meng
Fang Wu
Shenhao Peng
Hang Yao
Chaoyu Guan
Chen Tang
Cheng Wang
Zhi Wang
Wenwu Zhu
MQ
255
9
0
15 Jun 2024
BlockPruner: Fine-grained Pruning for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Longguang Zhong
Fanqi Wan
Ruijun Chen
Xiaojun Quan
Liangzhi Li
257
15
0
15 Jun 2024
mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
188
6
0
06 Jun 2024
Every Answer Matters: Evaluating Commonsense with Probabilistic Measures
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Qi Cheng
Michael Boratko
Pranay Kumar Yelugam
T. O’Gorman
Nalini Singh
Andrew McCallum
X. Li
ELM LRM
214
6
0
06 Jun 2024
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions
ACM Multimedia (MM), 2024
Junzhang Liu
Zhecan Wang
Hammad A. Ayyubi
Haoxuan You
Chris Thomas
Rui Sun
Shih-Fu Chang
Kai-Wei Chang
488
0
0
18 May 2024
AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning
International Workshop on Semantic Evaluation (SemEval), 2024
Mina Ghashami
Soumya Smruti Mishra
LRM
212
1
0
16 May 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM RALM
222
29
0
15 May 2024
Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models
Emre Onal
Klemens Flöge
Emma Caldwell
A. Sheverdin
Vincent Fortuin
UQCV BDL
252
13
0
06 May 2024
Semi-supervised Text-based Person Search
Daming Gao
Yang Bai
Min Cao
Hao Dou
Mang Ye
Min Zhang
174
2
0
28 Apr 2024
How often are errors in natural language reasoning due to paraphrastic variability?
Neha Srikanth
Marine Carpuat
Rachel Rudinger
LRM
191
4
0
17 Apr 2024
Improving Language Model Reasoning with Self-motivated Learning
International Conference on Language Resources and Evaluation (LREC), 2024
Yunlong Feng
Yang Xu
Libo Qin
Yasheng Wang
Wanxiang Che
LRM ReLM
184
8
0
10 Apr 2024
uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers?
International Workshop on Semantic Evaluation (SemEval), 2024
Pouya Sadeghi
Amirhossein Abaskohi
Yadollah Yaghoobzadeh
LRM ReLM
173
2
0
03 Apr 2024
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Janis Goldzycher
Paul Röttger
Gerold Schneider
AAML
168
15
0
28 Mar 2024
Bridging the Sim-to-Real Gap with Bayesian Inference
Jonas Rothfuss
Bhavya Sukhija
Lenart Treven
Florian Dorfler
Stelian Coros
Andreas Krause
AI4CE
278
10
0
25 Mar 2024
PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset
Arda Uzunouglu
Abdalfatah Rashid Safa
Gözde Gül Sahin
LRM
160
3
0
05 Mar 2024
Unsupervised multiple choices question answering via universal corpus
Qin Zhang
Hao Ge
Xiaojun Chen
Menglu Fang
OffRL
198
2
0
27 Feb 2024
Cleaner Pretraining Corpus Curation with Neural Web Scraping
Zhipeng Xu
Zhenghao Liu
Shi Yu
Zhiyuan Liu
Ge Yu
Chenyan Xiong
CLIP OnRL
226
9
0
22 Feb 2024
Rule or Story, Which is a Better Commonsense Expression for Talking with Large Language Models?
Ning Bian
Xianpei Han
Hongyu Lin
Yaojie Lu
Le Sun
217
2
0
22 Feb 2024
EvoGrad: A Dynamic Take on the Winograd Schema Challenge with Human Adversaries
Jing Han Sun
Ali Emami
224
6
0
20 Feb 2024
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
Runlong Zhou
Simon S. Du
Beibin Li
OffRL
195
9
0
20 Feb 2024
TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text
Sayantan Adak
Daivik Agrawal
Animesh Mukherjee
Somak Aditya
267
5
0
20 Feb 2024
Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models
Hao Wang
Sendong Zhao
Zewen Qiang
Nuwa Xi
Bing Qin
Ting Liu
LRM ELM
62
7
0
02 Feb 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
262
14
0
25 Jan 2024
From Text to Multimodal: A Comprehensive Survey of Adversarial Example Generation in Question Answering Systems
Gulsum Yigit
M. Amasyalı
AAML
137
0
0
26 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
312
32
0
01 Dec 2023
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams
Iakes Goenaga
Aitziber Atutxa
Koldo Gojenola
Maite Oronoz
Rodrigo Agerri
ELM
191
9
0
01 Dec 2023
Robot Learning in the Era of Foundation Models: A Survey
Xuan Xiao
Jiahang Liu
Zhipeng Wang
Yanmin Zhou
Yong Qi
Qian Cheng
Bin He
Shuo Jiang
AI4CE LM&Ro
352
44
0
24 Nov 2023
MacGyver: Are Large Language Models Creative Problem Solvers?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yufei Tian
Abhilasha Ravichander
Lianhui Qin
Ronan Le Bras
Raja Marjieh
Nanyun Peng
Yejin Choi
Thomas Griffiths
Faeze Brahman
AI4CE LLMAG
346
26
0
16 Nov 2023
Measuring Adversarial Datasets
Yuanchen Bai
Raoyi Huang
Vijay Viswanathan
Tzu-Sheng Kuo
Tongshuang Wu
222
1
0
06 Nov 2023
Learning to Play Chess from Textbooks (LEAP): a Corpus for Evaluating Chess Moves based on Sentiment Analysis
Haifa Alrdahi
Riza Batista-Navarro
147
2
0
31 Oct 2023
Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks
Aradhana Sinha
Ananth Balashankar
Ahmad Beirami
Thi Avrahami
Jilin Chen
Alex Beutel
AAML
172
6
0
25 Oct 2023
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mete Ismayilzada
Debjit Paul
Syrielle Montariol
Mor Geva
Antoine Bosselut
LRM
184
7
0
23 Oct 2023
TeleQnA: A Benchmark Dataset to Assess Large Language Models Telecommunications Knowledge
Ali Maatouk
Fadhel Ayed
Nicola Piovesan
Antonio De Domenico
Merouane Debbah
Zhi-Quan Luo
158
69
0
23 Oct 2023
QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haochen Shi
Weiqi Wang
Tianqing Fang
Baixuan Xu
Wenxuan Ding
Xin Liu
Yangqiu Song
196
7
0
17 Oct 2023
Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Gyuseong Lee
Wooseok Jang
Jin Hyeon Kim
Jaewoo Jung
Seungryong Kim
MoE OOD
160
9
0
17 Oct 2023
Data Contamination Through the Lens of Time
Manley Roberts
Himanshu Thakur
Christine Herlihy
Colin White
Samuel Dooley
240
37
0
16 Oct 2023
PHALM: Building a Knowledge Graph from Scratch by Prompting Humans and a Language Model
Tatsuya Ide
Eiki Murata
Daisuke Kawahara
T. Yamazaki
Shengzhe Li
K. Shinzato
Toshinori Sato
LRM
217
2
0
11 Oct 2023
NEWTON: Are Large Language Models Capable of Physical Reasoning?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yi Ru Wang
Jiafei Duan
Dieter Fox
S. Srinivasa
ELM LRM AIMat ReLM
228
50
0
10 Oct 2023
Empower Nested Boolean Logic via Self-Supervised Curriculum Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hongqiu Wu
Linfeng Liu
Haizhen Zhao
Min Zhang
LRM AI4CE NAI ELM
216
8
0
09 Oct 2023
Retrieval-Generation Synergy Augmented Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhangyin Feng
Xiaocheng Feng
Dezhi Zhao
Maojin Yang
Bing Qin
LRM RALM
161
43
0
08 Oct 2023
Crystal: Introspective Reasoners Reinforced with Self-Feedback
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hamish Ivison
Ramakanth Pasunuru
Hannaneh Hajishirzi
Yejin Choi
Asli Celikyilmaz
LRM ReLM
176
29
0
07 Oct 2023
Inferring Capabilities from Task Performance with Bayesian Triangulation
John Burden
Konstantinos Voudouris
Ryan Burnell
Danaja Rutar
Lucy G. Cheke
José Hernández-Orallo
124
10
0
21 Sep 2023
Mitigating Shortcuts in Language Models with Soft Label Encoding
International Conference on Language Resources and Evaluation (LREC), 2023
Zirui He
Huiqi Deng
Haiyan Zhao
Ninghao Liu
Jundong Li
120
2
0
17 Sep 2023
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Arda Uzunouglu
Gözde Gül Sahin
162
6
0
13 Sep 2023
AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions
Son Quoc Tran
Gia-Huy Do
Phong Nguyen-Thuan Do
Matt Kretchmar
Xinya Du
234
0
0
10 Sep 2023
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lucas Bandarkar
Davis Liang
Benjamin Muller
Mikel Artetxe
Satya Narayan Shukla
Don Husa
Naman Goyal
Abhinandan Krishnan
Luke Zettlemoyer
Madian Khabsa
264
223
0
31 Aug 2023
A Survey on Out-of-Distribution Evaluation of Neural NLP Models
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Xinzhe Li
Ming Liu
Shang Gao
Wray Buntine
169
24
0
27 Jun 2023
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
Neural Information Processing Systems (NeurIPS), 2023
Cheng-Yu Hsieh
Jieyu Zhang
Zixian Ma
Aniruddha Kembhavi
Ranjay Krishna
CoGe
245
181
0
26 Jun 2023