Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.03929
Cited By
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
8 October 2022
Baoxiong Jia
Ting Lei
Song-Chun Zhu
Siyuan Huang
EgoV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"EgoTaskQA: Understanding Human Tasks in Egocentric Videos"
15 / 15 papers shown
Title
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang
Rishabh Dabral
D. Luvizon
Zhe Cao
Lingjie Liu
Thabo Beeler
Christian Theobalt
EgoV
45
0
0
11 Apr 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
74
3
0
26 Feb 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
52
0
0
30 Jan 2025
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
85
5
0
05 Dec 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
53
25
0
10 Oct 2024
ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models
Lingfeng Zhang
Yuening Wang
Hongjian Gu
Atia Hamidizadeh
Zhanguang Zhang
...
Tongtong Cao
Yuzheng Zhuang
Yingxue Zhang
Jianye Hao
Jianye Hao
LM&Ro
31
0
0
02 Oct 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
60
38
0
23 May 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
31
1
0
01 Apr 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
49
32
0
24 Mar 2024
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
20
38
0
31 Mar 2023
On the Learning Mechanisms in Physical Reasoning
Shiqian Li
Ke Wu
Chi Zhang
Yixin Zhu
AI4CE
34
13
0
05 Oct 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
209
222
0
20 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
218
682
0
13 Oct 2021
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramithu
Gökhan Tür
Dilek Z. Hakkani-Tür
LM&Ro
145
179
0
01 Oct 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
20
14
0
07 Jan 2021
1