FMimic: Foundation Models are Fine-grained Action Learners from Human VideosThe international journal of robotics research (IJRR), 2025 |
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video UnderstandingComputer Vision and Pattern Recognition (CVPR), 2025 |
EgoLife: Towards Egocentric Life AssistantComputer Vision and Pattern Recognition (CVPR), 2025 |
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video GroundingInternational Conference on Learning Representations (ICLR), 2025 |