arXiv:2203.00758
There is a Time and Place for Reasoning Beyond the Image
1 March 2022
Xingyu Fu
Ben Zhou
I. Chandratreya
Carl Vondrick
Dan Roth
Papers citing
"There is a Time and Place for Reasoning Beyond the Image"
20 / 20 papers shown
Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Shravan Chaudhari
Trilokya Akula
Yoon Kim
Tom Blake
LRM
45
0
0
16 Apr 2025
LiveVQA: Live Visual Knowledge Seeking
Mingyang Fu
Yuyang Peng
Benlin Liu
Yao Wan
Danny Chen
28
0
0
07 Apr 2025
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving
Xianda Guo
Ruijun Zhang
Yiqun Duan
Yuhang He
Chenming Zhang
Shuai Liu
Long Chen
LRM
91
11
0
20 Nov 2024
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs
Sihang Zhao
Youliang Yuan
Xiaoying Tang
Pinjia He
38
3
0
15 Oct 2024
MiRAGeNews: Multimodal Realistic AI-Generated News Detection
Runsheng Huang
Liam Dugan
Yuqing Yang
Chris Callison-Burch
44
3
0
11 Oct 2024
NL-Eye: Abductive NLI for Images
Mor Ventura
Michael Toker
Nitay Calderon
Zorik Gekhman
Yonatan Bitton
Roi Reichart
28
1
0
03 Oct 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang
Xingyu Fu
James Y. Huang
Zekun Li
Qin Liu
...
Kai-Wei Chang
Dan Roth
Sheng Zhang
Hoifung Poon
Muhao Chen
VLM
50
47
0
13 Jun 2024
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu
Muyu He
Yujie Lu
William Yang Wang
Dan Roth
EGVM
LRM
31
15
0
11 Jun 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu
Yushi Hu
Bangzheng Li
Yu Feng
Haoyu Wang
Xudong Lin
Dan Roth
Noah A. Smith
Wei-Chiu Ma
Ranjay Krishna
VLM
LRM
MLLM
43
110
0
18 Apr 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRM
ELM
36
3
0
28 Feb 2024
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
58
3
0
22 Oct 2023
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
Gengyuan Zhang
Yurui Zhang
Kerui Zhang
Volker Tresp
LRM
27
10
0
12 Jul 2023
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Xingyu Fu
Shenmin Zhang
Gukyeong Kwon
Pramuditha Perera
Henghui Zhu
...
Zhiguo Wang
Vittorio Castelli
Patrick K. L. Ng
Dan Roth
Bing Xiang
29
19
0
30 May 2023
Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering
Xingyu Fu
Ben Zhou
Sihao Chen
Mark Yatskar
Dan Roth
LRM
28
0
0
24 May 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu
Weixi Feng
Tsu-jui Fu
Wenhu Chen
Luu Anh Tuan
VLM
48
9
0
23 May 2023
QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning
Weimin Shi
Mingchen Zhuge
D. Gao
Zhong Zhou
Ming-Ming Cheng
Deng-Ping Fan
LRM
VLM
23
0
0
02 Feb 2023
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Shikhar Singh
Ehsan Qasemi
Muhao Chen
46
6
0
15 Sep 2022
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
E. Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
109
168
0
28 Sep 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
208
310
0
02 Mar 2021
Temporal Reasoning on Implicit Events from Distant Supervision
Ben Zhou
Kyle Richardson
Qiang Ning
Tushar Khot
Ashish Sabharwal
Dan Roth
170
73
0
24 Oct 2020