Multimodal Datasets and Benchmarks for Reasoning about Dynamic
Spatio-Temporality in Everyday Environments
Main: 2 pages, 1 figure, 1 table
Bibliography: 3 pages
Abstract
We used a 3D simulator to generate synthetic video data with standardized annotations, with the aim of supporting the development of Embodied AI. Our question answering (QA) dataset measures how well a robot can understand human behavior and the environment in a home setting. Preliminary experiments suggest that the dataset is useful for measuring an AI system's comprehension of daily life.
