arXiv: 2109.03413

YouRefIt: Embodied Reference Understanding with Language and Gesture

IEEE International Conference on Computer Vision (ICCV), 2021

8 September 2021

Papers citing "YouRefIt: Embodied Reference Understanding with Language and Gesture"

13 / 13 papers shown

A Multimodal Depth-Aware Method For Embodied Reference Understanding

Fevziye Irem Eyiokur

Alexander Waibel

380

0

0

09 Oct 2025

Learning to Generate Pointing Gestures in Situated Embodied Conversational Agents

Frontiers in Robotics and AI (Front. Robot. AI), 2023

Simon Alexanderson

256

15

0

15 Sep 2025

Multimodal Data Storage and Retrieval for Embodied AI: A Survey

157

3

0

19 Aug 2025

CAPE: A CLIP-Aware Pointing Ensemble of Complementary Heatmap Cues for Embodied Reference Understanding

Fevziye Irem Eyiokur

Alexander Waibel

301

0

0

29 Jul 2025

I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Bulat Khaertdinov

Aslı Özyürek

Raquel Fernández

401

0

0

27 Feb 2025

GSVA: Generalized Segmentation via Multimodal Large Language Models

Computer Vision and Pattern Recognition (CVPR), 2023

Gao Huang

682

152

0

15 Dec 2023

Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding

European Conference on Computer Vision (ECCV), 2023

195

14

0

03 Sep 2023

MEWL: Few-shot multimodal word learning with referential uncertainty

International Conference on Machine Learning (ICML), 2023

Guangyuan Jiang

348

29

0

01 Jun 2023

STRAP: Structured Object Affordance Segmentation with Point Supervision

Hao Zhao

287

10

0

17 Apr 2023

ULN: Towards Underspecified Vision-and-Language Navigation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

William Yang Wang

325

5

0

18 Oct 2022

HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes

Neural Information Processing Systems (NeurIPS), 2022

264

178

0

18 Oct 2022

Understanding Embodied Reference with Touch-Line Transformer

International Conference on Learning Representations (ICLR), 2022

Hao Zhao

Federico Rossano

366

21

0

11 Oct 2022

Distance-Aware Occlusion Detection with Focused Attention

IEEE Transactions on Image Processing (IEEE TIP), 2022

Hao Zhao

205

9

0

23 Aug 2022
