Object Referring in Videos with Language and Human Gaze

4 January 2018
A. Vasudevan
Dengxin Dai
Luc Van Gool
    VOS

Papers citing "Object Referring in Videos with Language and Human Gaze"

50 / 50 papers shown
Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views
Anna Deichler
Jonas Beskow
VGen
136
0
0
26 Oct 2025
RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba
Kunyu Peng
Di Wen
Jia Fu
Jiamin Wu
Kailun Yang
...
Yufan Chen
Yuqian Fu
D. Paudel
Luc Van Gool
Rainer Stiefelhagen
97
0
0
18 Oct 2025
Multimodal Human-Intent Modeling for Contextual Robot-to-Human Handovers of Arbitrary Objects
Lucas Chen
Guna Avula
Hanwen Ren
Zixing Wang
A. H. Qureshi
95
1
0
05 Aug 2025
RefAV: Towards Planning-Centric Scenario Mining
Cainan Davidson
Deva Ramanan
Neehar Peri
351
6
0
27 May 2025
ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu
Tian Jin
Guang Chen
Yanfeng Wang
Yujiao Shi
319
2
0
18 Mar 2025
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao
Qiong Cao
Yujie Zhong
Xiang Zhang
Tao Wang
Canqun Yang
L. Lan
170
3
0
17 Oct 2024
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Rasoul Shafipour
David Harrison
Maxwell Horton
Jeffrey Marker
Houman Bedayat
Sachin Mehta
Mohammad Rastegari
Mahyar Najibi
Saman Naderiparizi
MQ
323
7
0
14 Oct 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
European Conference on Computer Vision (ECCV), 2024
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
389
3
0
28 Jul 2024
Bootstrapping Referring Multi-Object Tracking
Yani Zhang
Dongming Wu
Wencheng Han
Xingping Dong
303
19
0
07 Jun 2024
MLS-Track: Multilevel Semantic Interaction in RMOT
Zeliang Ma
Yang Song
Zhe Cui
Zhicheng Zhao
Fei Su
Delong Liu
Jingyu Wang
183
7
0
18 Apr 2024
Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation
Swati Jindal
Mohit Yadav
Roberto Manduchi
152
11
0
08 Apr 2024
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
289
17
0
05 Jan 2024
A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook
IEEE Transactions on Intelligent Vehicles (TIV), 2024
Mingyu Liu
Ekim Yurtsever
Jonathan Fossaert
Xingcheng Zhou
Walter Zimmer
Yuning Cui
B. L. Žagar
Alois Knoll
424
79
0
02 Jan 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
Computer Vision and Pattern Recognition (CVPR), 2024
Xinpeng Ding
Jianhua Han
Hang Xu
Xiaodan Liang
Wei Zhang
Xiaomeng Li
269
82
0
02 Jan 2024
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
Kun Yan
Lei Ji
Zeyu Wang
Yuntao Wang
Nan Duan
Shuai Ma
217
20
0
22 Dec 2023
Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Jianhua Wu
B. Gao
Jincheng Gao
Jianhao Yu
Hongqing Chu
...
Xun Gong
Yi Chang
H. E. Tseng
Hong Chen
Jie Chen
293
17
0
08 Dec 2023
Towards Knowledge-driven Autonomous Driving
Xin Li
Yeqi Bai
Pinlong Cai
Licheng Wen
Daocheng Fu
...
Yikang Li
Ding Wang
Yong-Jin Liu
Xiaoling Wang
Yu Qiao
355
35
0
07 Dec 2023
Multi-Modal Gaze Following in Conversational Scenarios
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yuqi Hou
Zhongqun Zhang
Nora Horanyi
Jaewon Moon
Yihua Cheng
Hyung Jin Chang
153
5
0
09 Nov 2023
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Ji Jiang
Meng Cao
Tengtao Song
Long Chen
Yi Wang
Yuexian Zou
235
6
0
25 Oct 2023
Vision Language Models in Autonomous Driving: A Survey and Outlook
IEEE Transactions on Intelligent Vehicles (TIV), 2023
Xingcheng Zhou
Mingyu Liu
Ekim Yurtsever
B. L. Žagar
Walter Zimmer
Hu Cao
Alois C. Knoll
VLM
247
126
0
22 Oct 2023
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving
IEEE International Conference on Robotics and Automation (ICRA), 2023
Tushar Choudhary
Vikrant Dewangan
Shivam Chandhok
Shubham Priyadarshan
Anushka Jain
A. K. Singh
Siddharth Srivastava
Krishna Murthy Jatavallabhula
K. M. Krishna
237
102
0
03 Oct 2023
Language Prompt for Autonomous Driving
AAAI Conference on Artificial Intelligence (AAAI), 2023
Dongming Wu
Wencheng Han
Tiancai Wang
Yingfei Liu
Cheng-zhong Xu
Jianbing Shen
VLM
397
121
0
08 Sep 2023
Look, Remember and Reason: Grounded reasoning in videos with language models
International Conference on Learning Representations (ICLR), 2023
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
409
13
0
30 Jun 2023
Referring Multi-Object Tracking
Computer Vision and Pattern Recognition (CVPR), 2023
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
202
113
0
06 Mar 2023
Video Referring Expression Comprehension via Transformer with Content-aware Query
Ji Jiang
Meng Cao
Tengtao Song
Yuexian Zou
246
5
0
06 Oct 2022
Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
Neural Information Processing Systems (NeurIPS), 2022
Yang Jin
Yongzhi Li
Zehuan Yuan
Yadong Mu
204
47
0
27 Sep 2022
Correspondence Matters for Video Referring Expression Comprehension
ACM Multimedia (ACM MM), 2022
Meng Cao
Ji Jiang
Long Chen
Yuexian Zou
VOS
261
21
0
21 Jul 2022
Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding
IEEE International Conference on Image Processing (ICIP), 2022
Zeyu Xiong
Daizong Liu
Technology
77
8
0
02 Jul 2022
Where and What: Driver Attention-based Object Detection
Yao Rong
Naemi-Rebecca Kassautzki
Wolfgang Fuhl
Enkelejda Kasneci
200
6
0
26 Apr 2022
Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze?
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Stephanie Brandl
Oliver Eberle
Jonas Pilot
Anders Søgaard
178
42
0
25 Apr 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
280
120
0
30 Mar 2022
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Meng Li
Tianbao Wang
Haoyu Zhang
Shengyu Zhang
Zhou Zhao
...
Wenming Tan
Jin Wang
Peng Wang
Shi Pu
Leilei Gan
232
46
0
15 Mar 2022
RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention-guided Disease Classification
European Conference on Computer Vision (ECCV), 2022
Moinak Bhattacharya
Shubham Jain
Prateek Prasanna
ViT
MedIm
143
39
0
23 Feb 2022
Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data
Yifei Huang
Xiaoxiao Li
Lijin Yang
Lin Gu
Yingying Zhu
Hirofumi Seo
Qiuming Meng
Tatsuya Harada
Yoichi Sato
MedIm
161
11
0
02 Dec 2021
Neural Variational Learning for Grounded Language Acquisition
IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2021
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLM
SSL
GAN
DRL
190
2
0
20 Jul 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Engineering Applications of Artificial Intelligence (EAAI), 2021
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
161
15
0
08 Jun 2021
Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze
Ece Takmaz
Sandro Pezzelle
Lisa Beinborn
Raquel Fernández
204
25
0
09 Nov 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Yu Liu
Luc Van Gool
Matthew Blaschko
Tinne Tuytelaars
Marie-Francine Moens
203
6
0
18 Sep 2020
Towards End-to-end Video-based Eye-Tracking
European Conference on Computer Vision (ECCV), 2020
Seonwook Park
Emre Aksan
Xucong Zhang
Otmar Hilliges
147
90
0
26 Jul 2020
Visual Relation Grounding in Videos
European Conference on Computer Vision (ECCV), 2020
Junbin Xiao
Xindi Shang
Xun Yang
Sheng Tang
Tat-Seng Chua
226
45
0
17 Jul 2020
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
Computer Vision and Pattern Recognition (CVPR), 2020
Zhu Zhang
Zhou Zhao
Yang Zhao
Qi Wang
Huasheng Liu
Lianli Gao
228
144
0
19 Jan 2020
Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
A. Vasudevan
Dengxin Dai
Luc Van Gool
LM&Ro
195
3
0
04 Oct 2019
Talk2Car: Taking Control of Your Self-Driving Car
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Luc Van Gool
Marie-Francine Moens
LM&Ro
165
163
0
24 Sep 2019
Searching for Ambiguous Objects in Videos using Relational Referring Expressions
British Machine Vision Conference (BMVC), 2019
Hazan Anayurt
Sezai Artun Ozyegin
Ulfet Cetin
Utku Aktaş
Sinan Kalkan
266
9
0
03 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
376
141
0
22 Jul 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Zhenfang Chen
Lin Ma
Tong Lu
Kwan-Yee K. Wong
226
110
0
06 Jun 2019
Learning Accurate, Comfortable and Human-like Driving
Simon Hecker
Dengxin Dai
Luc Van Gool
105
30
0
26 Mar 2019
Generating Easy-to-Understand Referring Expressions for Target Identifications
Mikihiro Tanaka
Takayuki Itamochi
Kenichi Narioka
Ikuro Sato
Yoshitaka Ushiku
Tatsuya Harada
195
1
0
29 Nov 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
412
714
0
05 Sep 2018
Video Object Segmentation with Language Referring Expressions
Anna Khoreva
Anna Rohrbach
Bernt Schiele
VOS
224
238
0
21 Mar 2018