Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.09170
Cited By
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
13 June 2024
Bahare Fatemi
Mehran Kazemi
Anton Tsitsulin
Karishma Malkan
Jinyeong Yim
John Palowitch
Sungyong Seo
Jonathan J. Halcrow
Bryan Perozzi
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning"
19 / 19 papers shown
Title
TRAVELER: A Benchmark for Evaluating Temporal Reasoning across Vague, Implicit and Explicit References
Svenja Kenneweg
J. Deigmöller
Philipp Cimiano
Julian Eggert
40
0
0
02 May 2025
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
Jeffrey Li
Mohammadreza Armandpour
Iman Mirzadeh
Sachin Mehta
Vaishaal Shankar
...
Samy Bengio
Oncel Tuzel
Mehrdad Farajtabar
Hadi Pouransari
Fartash Faghri
CLL
KELM
59
0
0
02 Apr 2025
MastermindEval: A Simple But Scalable Reasoning Benchmark
Jonas Golde
Patrick Haller
Fabio Barth
Alan Akbik
LRM
ReLM
ELM
46
1
0
07 Mar 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
115
4
0
26 Feb 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
106
61
0
25 Nov 2024
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park
Chanwoong Yoon
Jungwoo Park
Donghyeon Lee
Minbyul Jeong
Jaewoo Kang
KELM
45
1
0
13 Oct 2024
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time
David Herel
Vojtech Bartek
Jiri Jirak
Tomáš Mikolov
42
2
0
20 Sep 2024
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data
Jiaming Zhou
Abbas Ghaddar
Ge Zhang
Liheng Ma
Yaochen Hu
Soumyasundar Pal
Mark J. Coates
Bin Wang
Yingxue Zhang
Jianye Hao
ReLM
LRM
35
4
0
19 Sep 2024
LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models
Weizhi Tang
Vaishak Belle
LRM
32
1
0
07 Jul 2024
Premise Order Matters in Reasoning with Large Language Models
Xinyun Chen
Ryan A. Chi
Xuezhi Wang
Denny Zhou
ReLM
LRM
33
26
0
14 Feb 2024
UGSL: A Unified Framework for Benchmarking Graph Structure Learning
Bahare Fatemi
Sami Abu-El-Haija
Anton Tsitsulin
Seyed Mehran Kazemi
Dustin Zelle
Neslihan Bulut
Jonathan J. Halcrow
Bryan Perozzi
46
9
0
21 Aug 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
200
2,232
0
22 Mar 2023
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
116
270
0
03 Oct 2022
StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models
Adam Livska
Tomávs Kovciský
E. Gribovskaya
Tayfun Terzi
Eren Sezener
...
Susannah Young
Ellen Gilsenan-McMahon
Sophia Austin
Phil Blunsom
Angeliki Lazaridou
KELM
220
89
0
23 May 2022
GraphWorld: Fake Graphs Bring Real Insights for GNNs
John Palowitch
Anton Tsitsulin
Brandon Mayer
Bryan Perozzi
GNN
180
68
0
28 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Complex Temporal Question Answering on Knowledge Graphs
Zhen Jia
Soumajit Pramanik
Rishiraj Saha Roy
G. Weikum
216
103
0
18 Sep 2021
Temporal Reasoning on Implicit Events from Distant Supervision
Ben Zhou
Kyle Richardson
Qiang Ning
Tushar Khot
Ashish Sabharwal
Dan Roth
147
73
0
24 Oct 2020
Grale: Designing Networks for Graph Learning
Jonathan J. Halcrow
A. Mosoi
Sam Ruth
Bryan Perozzi
GNN
63
43
0
23 Jul 2020
1