2306.08952
Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models
15 June 2023
Qingyu Tan
Hwee Tou Ng
Lidong Bing
LRM
Papers citing
"Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models"
19 / 19 papers shown
Title
TRAVELER: A Benchmark for Evaluating Temporal Reasoning across Vague, Implicit and Explicit References
Svenja Kenneweg
J. Deigmöller
Philipp Cimiano
Julian Eggert
44
0
0
02 May 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
54
0
0
21 Jan 2025
Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning
Jiapu Wang
Kai Sun
Linhao Luo
Wei Wei
Yongli Hu
Alan Wee-Chung Liew
Shirui Pan
Baocai Yin
47
5
0
31 Dec 2024
RARe: Retrieval Augmented Retrieval with In-Context Examples
Atula Tejaswi
Yoonsang Lee
Sujay Sanghavi
Eunsol Choi
RALM
LRM
25
1
0
26 Oct 2024
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
Yulong Chen
Yang Liu
Jianhao Yan
X. Bai
Ming Zhong
Yinghao Yang
Ziyi Yang
Chenguang Zhu
Yue Zhang
ALM
ELM
35
6
0
16 Aug 2024
LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models
Weizhi Tang
Vaishak Belle
LRM
42
1
0
07 Jul 2024
Timo: Towards Better Temporal Reasoning for Language Models
Zhaochen Su
Jun Zhang
Tong Zhu
Xiaoye Qu
Juntao Li
Min Zhang
Yu Cheng
LRM
47
17
0
20 Jun 2024
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Bahare Fatemi
Mehran Kazemi
Anton Tsitsulin
Karishma Malkan
Jinyeong Yim
John Palowitch
Sungyong Seo
Jonathan J. Halcrow
Bryan Perozzi
LRM
35
26
0
13 Jun 2024
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
Weiqi Wang
Yangqiu Song
LRM
35
8
0
04 Jun 2024
RAR-b: Reasoning as Retrieval Benchmark
Chenghao Xiao
G. Thomas Hudson
Noura Al Moubayed
LRM
RALM
29
8
0
09 Apr 2024
A Theory for Length Generalization in Learning to Reason
Changnan Xiao
Bing Liu
LRM
34
8
0
31 Mar 2024
Conditions for Length Generalization in Learning Reasoning Skills
Changnan Xiao
Bing Liu
LRM
32
7
0
22 Nov 2023
MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models
Yifan Wei
Yisong Su
Huanhuan Ma
Xiaoyan Yu
Fangyu Lei
Yuanzhe Zhang
Jun Zhao
Kang Liu
LRM
17
9
0
08 Oct 2023
TRAM: Benchmarking Temporal Reasoning for Large Language Models
Yuqing Wang
Yun Zhao
LRM
23
8
0
02 Oct 2023
In-context Interference in Chat-based Large Language Models
Eric Nuertey Coleman
J. Hurtado
Vincenzo Lomonaco
KELM
20
1
0
22 Sep 2023
An Overview Of Temporal Commonsense Reasoning and Acquisition
Georg Wenzel
Adam Jatowt
ReLM
LRM
20
9
0
28 Jul 2023
Unlocking Temporal Question Answering for Large Language Models Using Code Execution
Xingxuan Li
Liying Cheng
Qingyu Tan
Hwee Tou Ng
Shafiq R. Joty
Lidong Bing
LRM
AI4CE
25
0
0
24 May 2023
StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models
Adam Liška
Tomáš Kočiský
E. Gribovskaya
Tayfun Terzi
Eren Sezener
...
Susannah Young
Ellen Gilsenan-McMahon
Sophia Austin
Phil Blunsom
Angeliki Lazaridou
KELM
232
90
0
23 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,909
0
04 Mar 2022