Object Referring in Videos with Language and Human Gaze

4 January 2018 · arXiv:1801.01582
A. Vasudevan, Dengxin Dai, Luc Van Gool
VOS

Papers citing "Object Referring in Videos with Language and Human Gaze"

50 papers shown

Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views
Anna Deichler, Jonas Beskow
VGen
151 · 0 · 0
26 Oct 2025

RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba
Kunyu Peng, Di Wen, Jia Fu, Jiamin Wu, Kailun Yang, ..., Yufan Chen, Yuqian Fu, D. Paudel, Luc Van Gool, Rainer Stiefelhagen
133 · 0 · 0
18 Oct 2025

Multimodal Human-Intent Modeling for Contextual Robot-to-Human Handovers of Arbitrary Objects
Lucas Chen, Guna Avula, Hanwen Ren, Zixing Wang, A. H. Qureshi
118 · 1 · 0
05 Aug 2025

RefAV: Towards Planning-Centric Scenario Mining
Cainan Davidson, Deva Ramanan, Neehar Peri
407 · 7 · 0
27 May 2025

ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu, Tian Jin, Guang Chen, Yanfeng Wang, Yujiao Shi
409 · 4 · 0
18 Mar 2025

Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao, Qiong Cao, Yujie Zhong, Xiang Zhang, Tao Wang, Canqun Yang, L. Lan
216 · 3 · 0
17 Oct 2024

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Rasoul Shafipour, David Harrison, Maxwell Horton, Jeffrey Marker, Houman Bedayat, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi, Saman Naderiparizi
MQ
363 · 7 · 0
14 Oct 2024

Look Hear: Gaze Prediction for Speech-directed Human Attention. European Conference on Computer Vision (ECCV), 2024
Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, G. Zelinsky, Minh Hoai
409 · 3 · 0
28 Jul 2024

Bootstrapping Referring Multi-Object Tracking
Yani Zhang, Dongming Wu, Wencheng Han, Xingping Dong
382 · 21 · 0
07 Jun 2024

MLS-Track: Multilevel Semantic Interaction in RMOT
Zeliang Ma, Yang Song, Zhe Cui, Zhicheng Zhao, Fei Su, Delong Liu, Jingyu Wang
211 · 8 · 0
18 Apr 2024

Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation
Swati Jindal, Mohit Yadav, Roberto Manduchi
165 · 12 · 0
08 Apr 2024

Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu
AI4TS
356 · 17 · 0
05 Jan 2024

A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook. IEEE Transactions on Intelligent Vehicles (TIV), 2024
Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, B. L. Žagar, Alois Knoll
472 · 82 · 0
02 Jan 2024

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models. Computer Vision and Pattern Recognition (CVPR), 2024
Xinpeng Ding, Jianhua Han, Hang Xu, Xiaodan Liang, Wei Zhang, Xiaomeng Li
304 · 84 · 0
02 Jan 2024

Voila-A: Aligning Vision-Language Models with User's Gaze Attention
Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma
262 · 22 · 0
22 Dec 2023

Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Jianhua Wu, B. Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, ..., Xun Gong, Yi Chang, H. E. Tseng, Hong Chen, Jie Chen
319 · 18 · 0
08 Dec 2023

Towards Knowledge-driven Autonomous Driving
Xin Li, Yeqi Bai, Pinlong Cai, Licheng Wen, Daocheng Fu, ..., Yikang Li, Ding Wang, Yong-Jin Liu, Xiaoling Wang, Yu Qiao
414 · 36 · 0
07 Dec 2023

Multi-Modal Gaze Following in Conversational Scenarios. IEEE Winter Conference on Applications of Computer Vision (WACV), 2023
Yuqi Hou, Zhongqun Zhang, Nora Horanyi, Jaewon Moon, Yihua Cheng, Hyung Jin Chang
194 · 6 · 0
09 Nov 2023

Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Jiang Ji, Meng Cao, Tengtao Song, Long Chen, Yi Wang, Yuexian Zou
267 · 6 · 0
25 Oct 2023

Vision Language Models in Autonomous Driving: A Survey and Outlook. IEEE Transactions on Intelligent Vehicles (TIV), 2023
Xingcheng Zhou, Mingyu Liu, Ekim Yurtsever, B. L. Žagar, Walter Zimmer, Hu Cao, Alois C. Knoll
VLM
304 · 130 · 0
22 Oct 2023

Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving. IEEE International Conference on Robotics and Automation (ICRA), 2023
Tushar Choudhary, Vikrant Dewangan, Shivam Chandhok, Shubham Priyadarshan, Anushka Jain, A. K. Singh, Siddharth Srivastava, Krishna Murthy Jatavallabhula, K. M. Krishna
290 · 109 · 0
03 Oct 2023

Language Prompt for Autonomous Driving. AAAI Conference on Artificial Intelligence (AAAI), 2023
Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Cheng-zhong Xu, Jianbing Shen
VLM
478 · 127 · 0
08 Sep 2023

Look, Remember and Reason: Grounded reasoning in videos with language models. International Conference on Learning Representations (ICLR), 2023
Apratim Bhattacharyya, Sunny Panchal, Mingu Lee, Reza Pourreza, Pulkit Madan, Roland Memisevic
LRM
470 · 13 · 0
30 Jun 2023

Referring Multi-Object Tracking. Computer Vision and Pattern Recognition (CVPR), 2023
Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen
240 · 118 · 0
06 Mar 2023

Video Referring Expression Comprehension via Transformer with Content-aware Query
Ji Jiang, Meng Cao, Tengtao Song, Yuexian Zou
270 · 5 · 0
06 Oct 2022

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding. Neural Information Processing Systems (NeurIPS), 2022
Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu
245 · 48 · 0
27 Sep 2022

Correspondence Matters for Video Referring Expression Comprehension. ACM Multimedia (ACM MM), 2022
Meng Cao, Ji Jiang, Long Chen, Yuexian Zou
VOS
311 · 21 · 0
21 Jul 2022

Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding. IEEE International Conference on Image Processing (ICIP), 2022
Zeyu Xiong, Daizong Liu
89 · 8 · 0
02 Jul 2022

Where and What: Driver Attention-based Object Detection
Yao Rong, Naemi-Rebecca Kassautzki, Wolfgang Fuhl, Enkelejda Kasneci
228 · 6 · 0
26 Apr 2022

Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze? Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Stephanie Brandl, Oliver Eberle, Jonas Pilot, Anders Søgaard
205 · 43 · 0
25 Apr 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers. Computer Vision and Pattern Recognition (CVPR), 2022
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
ViT
341 · 121 · 0
30 Mar 2022

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding. Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Meng Li, Tianbao Wang, Haoyu Zhang, Shengyu Zhang, Zhou Zhao, ..., Wenming Tan, Jin Wang, Peng Wang, Shi Pu, Leilei Gan
292 · 46 · 0
15 Mar 2022

RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention-guided Disease Classification. European Conference on Computer Vision (ECCV), 2022
Moinak Bhattacharya, Shubham Jain, Prateek Prasanna
ViT, MedIm
200 · 40 · 0
23 Feb 2022

Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data
Yifei Huang, Xiaoxiao Li, Lijin Yang, Lin Gu, Yingying Zhu, Hirofumi Seo, Qiuming Meng, Tatsuya Harada, Yoichi Sato
MedIm
181 · 11 · 0
02 Dec 2021

Neural Variational Learning for Grounded Language Acquisition. IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2021
Nisha Pillai, Cynthia Matuszek, Francis Ferraro
VLM, SSL, GAN, DRL
213 · 2 · 0
20 Jul 2021

Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations? Engineering Applications of Artificial Intelligence (EAAI), 2021
Thierry Deruyttere, Victor Milewski, Marie-Francine Moens
200 · 15 · 0
08 Jun 2021

Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze
Ece Takmaz, Sandro Pezzelle, Lisa Beinborn, Raquel Fernández
214 · 25 · 0
09 Nov 2020

Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Yu Liu, Luc Van Gool, Matthew Blaschko, Tinne Tuytelaars, Marie-Francine Moens
227 · 6 · 0
18 Sep 2020

Towards End-to-end Video-based Eye-Tracking. European Conference on Computer Vision (ECCV), 2020
Seonwook Park, Emre Aksan, Xucong Zhang, Otmar Hilliges
174 · 93 · 0
26 Jul 2020

Visual Relation Grounding in Videos. European Conference on Computer Vision (ECCV), 2020
Junbin Xiao, Xindi Shang, Xun Yang, Sheng Tang, Tat-Seng Chua
262 · 45 · 0
17 Jul 2020

Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences. Computer Vision and Pattern Recognition (CVPR), 2020
Zhu Zhang, Zhou Zhao, Yang Zhao, Qi Wang, Huasheng Liu, Lianli Gao
253 · 148 · 0
19 Jan 2020

Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
A. Vasudevan, Dengxin Dai, Luc Van Gool
LM&Ro
251 · 3 · 0
04 Oct 2019

Talk2Car: Taking Control of Your Self-Driving Car. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Luc Van Gool, Marie-Francine Moens
LM&Ro
194 · 166 · 0
24 Sep 2019

Searching for Ambiguous Objects in Videos using Relational Referring Expressions. British Machine Vision Conference (BMVC), 2019
Hazan Anayurt, Sezai Artun Ozyegin, Ulfet Cetin, Utku Aktaş, Sinan Kalkan
286 · 9 · 0
03 Aug 2019

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods. Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala, M. Kalimuthu, Dietrich Klakow
VLM
416 · 142 · 0
22 Jul 2019

Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video. Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Zhenfang Chen, Lin Ma, Tong Lu, Kwan-Yee K. Wong
268 · 111 · 0
06 Jun 2019

Learning Accurate, Comfortable and Human-like Driving
Simon Hecker, Dengxin Dai, Luc Van Gool
118 · 30 · 0
26 Mar 2019

Generating Easy-to-Understand Referring Expressions for Target Identifications
Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada
220 · 1 · 0
29 Nov 2018

TVQA: Localized, Compositional Video Question Answering
Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg
440 · 720 · 0
05 Sep 2018

Video Object Segmentation with Language Referring Expressions
Anna Khoreva, Anna Rohrbach, Bernt Schiele
VOS
261 · 242 · 0
21 Mar 2018