SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

16 August 2018

Yejin Choi

Papers citing "SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference"


A General Error-Theoretical Analysis Framework for Constructing Compression Strategies Yunquan Zhang Daning Cheng Yunquan Zhang Meiqi Tu Fangmin Liu Jiake Tian 178 2 0 24 Dec 2025
CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing Leonie Bossemeyer Samuel Heinrich Grant Van Horn Oisin Mac Aodha 84 0 0 11 Nov 2025
Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale J. Michaelov Roger P. Levy Benjamin Bergen AI4TS 116 0 0 28 Oct 2025
Tahakom LLM Guidelines and Recipes: From Pre-training Data to an Arabic LLM Areej AlOtaibi Lina Alyahya Raghad Alshabanah Shahad Alfawzan Shuruq Alarefei ... Waad Alahmed Omar Talabay Jalal Alowibdi Salem Alelyani Adel Bibi 173 0 0 15 Oct 2025
The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning Mayank Ravishankara Varindra V. Persad Maharaj ELM 149 1 0 05 Oct 2025
What is a protest anyway? Codebook conceptualization is still a first-order concern in LLM-era classification Andrew Halterman Katherine A. Keith 118 0 0 03 Oct 2025
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts Yuan Zhuang Yi Shen Yuexin Bian Qing Su Shihao Ji Yuanyuan Shi Fei Miao MoE MoMe 204 1 0 30 Sep 2025
Training-free Truthfulness Detection via Value Vectors in LLMs Runheng Liu Heyan Huang Xingchen Xiao Zhijing Wu 88 0 0 22 Sep 2025
LLM-Guided Co-Training for Text Classification Md Mezbaur Rahman Cornelia Caragea 81 0 0 20 Sep 2025
MatQnA: A Benchmark Dataset for Multi-modal Large Language Models in Materials Characterization and Analysis Yonghao Weng Liqiang Gao Linwu Zhu Jian Huang 116 0 0 14 Sep 2025
Ko-PIQA: A Korean Physical Commonsense Reasoning Dataset with Cultural Context Dasol Choi Jungwhan Kim Guijin Son LRM 171 0 0 14 Sep 2025
Simulation Priors for Data-Efficient Deep Learning Lenart Treven Bhavya Sukhija Jonas Rothfuss Stelian Coros Florian Dorfler Andreas Krause 108 0 0 06 Sep 2025
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap Jun Wang Ninglun Gu Kailai Zhang Zijiao Zhang Yelun Bao ... Liwei Liu Yihuan Liu Pengyong Li Gary G. Yen Junchi Yan ALM ELM 216 0 0 26 Aug 2025
HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator Chukwufumnanya Ogbogu Gaurav Narang B. K. Joardar J. Doppa Krishnendu Chakrabarty P. Pande 88 0 0 22 Aug 2025
LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration Jizhou Guo 103 0 0 31 Jul 2025
Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 J. Michaelov Reeka Estacio Zhien Zhang Benjamin Bergen ReLM LRM 186 1 0 07 Jun 2025
Let's CONFER: A Dataset for Evaluating Natural Language Inference Models on CONditional InFERence and Presupposition Tara Azin Daniel Dumitrescu Diana Inkpen Raj Singh 145 1 0 06 Jun 2025
Small-to-Large Generalization: Data Influences Models Consistently Across Scale Alaa Khaddaj Logan Engstrom Aleksander Madry TDI AI4CE 265 0 0 22 May 2025
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Mohammadtaha Bagherifard Sahar Rajabi Ali Edalat Yadollah Yaghoobzadeh KELM 250 0 0 16 May 2025
IterKey: Iterative Keyword Generation with LLMs for Enhanced Retrieval Augmented Generation Kazuki Hayashi Hidetaka Kamigaito Shinya Kouda Taro Watanabe RALM 314 3 0 13 May 2025
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors Nicy Scaria Silvester John Joseph Kennedy Diksha Seth Ananya Thakur Deepak N. Subramani AI4Ed 273 0 0 02 May 2025
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts Pradip Kunwar Minh Vu Maanak Gupta Mahmoud Abdelsalam Manish Bhattarai MoE MoMe 913 1 0 29 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks Yixin Cao Shibo Hong Xuzhao Li Jiahao Ying Yubo Ma ... Juanzi Li Aixin Sun Qi Zhang Tat-Seng Chua Tianwei Zhang ALM ELM 500 21 0 26 Apr 2025
Towards Quantifying Commonsense Reasoning with Mechanistic Insights. North American Chapter of the Association for Computational Linguistics (NAACL), 2025 Abhinav Joshi A. Ahmad Divyaksh Shukla Ashutosh Modi ReLM LRM 235 4 0 14 Apr 2025
Data Caricatures: On the Representation of African American Language in Pretraining Corpora. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Nicholas Deas Blake Vente Amith Ananthram Jessica A. Grieser D. Patton Shana Kleiner James Shepard Kathleen McKeown 253 0 0 13 Mar 2025
MetaXCR: Reinforcement-Based Meta-Transfer Learning for Cross-Lingual Commonsense Reasoning Jie He Yu Fu OffRL LRM 301 2 0 09 Mar 2025
Continual Pre-training of MoEs: How robust is your router? Benjamin Thérien Charles-Étienne Joseph Zain Sarwar Ashwinee Panda Anirban Das Shi-Xiong Zhang Stephen Rawls Siyang Song Eugene Belilovsky Irina Rish MoE 327 3 0 06 Mar 2025
The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation Jie He Tao Wang Deyi Xiong Qun Liu ELM LRM 354 34 0 05 Mar 2025
Token-Level Privacy in Large Language Models Reém Harel Niv Gilboa Yuval Pinter 185 0 0 05 Mar 2025
Slamming: Training a Speech Language Model on One GPU in a Day. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Gallil Maimon Avishai Elmakies Yossi Adi 287 9 0 19 Feb 2025
Swift Cross-Dataset Pruning: Enhancing Fine-Tuning Efficiency in Natural Language Understanding. International Conference on Computational Linguistics (COLING), 2025 Binh-Nguyen Nguyen Yang He 235 2 0 05 Jan 2025
Federated Heavy Hitter Analytics with Local Differential Privacy Yuemin Zhang Qingqing Ye Haibo Hu FedML 434 2 0 03 Jan 2025
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization. AAAI Conference on Artificial Intelligence (AAAI), 2024 Yue Zhang Liqiang Jing Vibhav Gogate 385 12 0 19 Dec 2024
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models Tian Yu Shaolei Zhang Yang Feng RALM 3DV AIFin LRM 309 24 0 29 Nov 2024
What Really is Commonsense Knowledge? Quyet V. Do Junze Li Tung-Duong Vuong Zhaowei Wang Yangqiu Song Xiaojuan Ma 188 2 0 06 Nov 2024
Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning Dong Shu Jundong Li 168 3 0 30 Oct 2024
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation Maohao Shen Shun Zhang Jilong Wu Zhiping Xiu Ehab AlBadawy Yiting Lu M. Seltzer Qing He 169 6 0 27 Oct 2024
Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Christabel Acquaye Haozhe An Rachel Rudinger 217 7 0 21 Oct 2024
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions Shailaja Keyur Sampat Yezhou Yang Chitta Baral LM&Ro 161 1 0 17 Oct 2024
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights Odysseas S. Chlapanis D. Galanis Ion Androutsopoulos AILaw ELM 174 2 0 17 Oct 2024
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models. International Conference on Learning Representations (ICLR), 2024 Zheng Yi Ho Yaning Tan Sen Zhang Yibing Zhan Dacheng Tao 243 5 0 11 Oct 2024
Precise Model Benchmarking with Only a Few Observations. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Riccardo Fogliato Pratik Patil Nil-Jana Akpinar Mathew Monfort 178 1 0 07 Oct 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning Shashank Yadav Rohan Tomar Garvit Jain Chirag Ahooja Shubham Chaudhary Charles Elkan 274 1 0 05 Oct 2024
The Hard Positive Truth about Vision-Language Compositionality. European Conference on Computer Vision (ECCV), 2024 Amita Kamath Cheng-Yu Hsieh Kai-Wei Chang Ranjay Krishna CLIP CoGe VLM 212 13 0 26 Sep 2024
Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Neeladri Bhuiya Viktor Schlegel Stefan Winkler LRM 200 9 0 08 Sep 2024
CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher. Italian National Conference on Sensors (INS), 2024 Derry Pratama Naufal Suryanto Andro Aprila Adiputra Thi-Thu-Huong Le Ahmada Yusril Kadiptya Muhammad Iqbal Howon Kim 178 18 0 21 Aug 2024
SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Sai Vallurupalli Katrin Erk Francis Ferraro 152 3 0 11 Aug 2024
Data Contamination Report from the 2024 CONDA Shared Task Oscar Sainz Iker García-Ferrero Alon Jacovi Jonas Hanselle Yanai Elazar ... Yu-Min Tseng Vishaal Udandarao Zengzhi Wang Ruijie Xu Jinglin Yang 259 13 0 31 Jul 2024
MUSCLE: A Model Update Strategy for Compatible LLM Evolution Jessica Echterhoff Fartash Faghri Raviteja Vemulapalli Ting-Yao Hu Chun-Liang Li Oncel Tuzel Hadi Pouransari KELM 175 7 0 12 Jul 2024
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices Juntao Zhao Borui Wan Size Zheng Haibin Lin Yibo Zhu Chuan Wu 176 3 0 02 Jul 2024