"Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

6 September 2019

Daniel Khashabi

Papers citing "Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding

50 / 134 papers shown

Structured yet Bounded Temporal Understanding in Large Language Models

Damin Zhang

Julia Taylor Rayz

217

19 Oct 2025

Hypothesis-Driven Feature Manifold Analysis in LLMs via Supervised Multi-Dimensional Scaling

211

01 Oct 2025

AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios

204

27 Aug 2025

TComQA: Extracting Temporal Commonsense from Text

135

21 Aug 2025

MobQA: A Benchmark Dataset for Semantic Understanding of Human Mobility Data through Question Answering

213

15 Aug 2025

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

254

01 Jul 2025

Chaining Event Spans for Temporal Relation Grounding

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2025

212

17 Jun 2025

From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents

409

17 Jun 2025

LexTime: A Benchmark for Temporal Ordering of Legal Events

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

392

04 Jun 2025

Around the World in 24 Hours: Probing LLM Knowledge of Time and Place

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

341

04 Jun 2025

CrossICL: Cross-Task In-Context Learning via Unsupervised Demonstration Transfer

272

30 May 2025

Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Output Prefilling

246

21 May 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

...

595

26 Apr 2025

Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

506

07 Apr 2025

WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization

393

31 Mar 2025

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

423

22 Mar 2025

A Study into Investigating Temporal Robustness of LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

312

21 Mar 2025

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Subhash Kantamneni

Joshua Engels

Senthooran Rajamanoharan

Max Tegmark

Neel Nanda

472

23 Feb 2025

Counterfactual-Consistency Prompting for Relative Temporal Understanding in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Jongho Kim

Seung-won Hwang

LRM AI4CE

565

17 Feb 2025

TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

819

03 Feb 2025

Weak-to-Strong Generalization Through the Data-Centric Lens

International Conference on Learning Representations (ICLR), 2024

Changho Shin

John Cooper

Frederic Sala

581

05 Dec 2024

ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions

Shailaja Keyur Sampat

Yezhou Yang

Chitta Baral

LM&Ro

267

17 Oct 2024

MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

ACM Multimedia (MM), 2024

Yang Yang

273

08 Aug 2024

A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

372

16 Jul 2024

UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization

574

03 Jul 2024

Timo: Towards Better Temporal Reasoning for Language Models

Zhaochen Su

Jun Zhang

Tong Zhu

Xiaoye Qu

Juntao Li

Min Zhang

Yu Cheng

LRM

333

20 Jun 2024

Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

Zhi Wang

Wenwu Zhu

392

15 Jun 2024

Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?

Zhaochen Su

Juntao Li

Jun Zhang

Tong Zhu

Xiaoye Qu

Pan Zhou

Yan Bowen

Yu Cheng

Min Zhang

LRM

303

13 Jun 2024

Scaling and evaluating sparse autoencoders

326

387

06 Jun 2024

A Comprehensive Evaluation on Event Reasoning of Large Language Models

279

26 Apr 2024

Continual Learning of Large Language Models: A Comprehensive Survey

475

230

25 Apr 2024

LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

628

23 Apr 2024

EVIT: Event-Oriented Instruction Tuning for Event Reasoning

310

18 Apr 2024

AcTED: Automatic Acquisition of Typical Event Duration for Semi-supervised Temporal Commonsense QA

Sadao Kurohashi

181

27 Mar 2024

Formulation Comparison for Timeline Construction using LLMs

341

01 Mar 2024

When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models

Philip S. Yu

379

16 Feb 2024

Large Language Models Can Learn Temporal Reasoning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Siheng Xiong

Ali Payani

Ramana Rao Kompella

Faramarz Fekri

LRM

636

170

12 Jan 2024

Temporal Validity Change Prediction

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Georg Wenzel

Adam Jatowt

312

01 Jan 2024

CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

251

20 Dec 2023

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dirk Groeneveld

Anas Awadalla

Iz Beltagy

Akshita Bhagia

Ian H. Magnusson

Hao Peng

Oyvind Tafjord

Pete Walsh

Kyle Richardson

Jesse Dodge

291

15 Dec 2023

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

International Conference on Machine Learning (ICML), 2023

...

524

434

14 Dec 2023

TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

495

29 Nov 2023

Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning

Qingyu Tan

Hwee Tou Ng

Lidong Bing

267

16 Nov 2023

Are Large Language Models Temporally Grounded?

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

394

14 Nov 2023

MTGER: Multi-view Temporal Graph Enhanced Temporal Reasoning over Time-Involved Document

277

08 Nov 2023

Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Qiang Zhang

Jason Naradowsky

Yusuke Miyao

182

24 Oct 2023

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

343

23 Oct 2023

How Much Consistency Is Your Accuracy Worth?

Jacob K. Johnson

Ana Marasović

170

20 Oct 2023

Instructive Dialogue Summarization with Query Aggregations

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Bin Wang

Zhengyuan Liu

Nancy F. Chen

439

17 Oct 2023

TRAM: Benchmarking Temporal Reasoning for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Yuqing Wang

Yun Zhao

LRM

346

02 Oct 2023