arXiv:2205.05739
Learning to Retrieve Videos by Asking Questions
11 May 2022
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
Papers citing "Learning to Retrieve Videos by Asking Questions"
Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions
Chang Zong
Bin Li
Shoujun Zhou
Jian Wan
Lei Zhang
43
0
0
22 Apr 2025
LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
Yiding Lu
Mouxing Yang
Dezhong Peng
Peng Hu
Yijie Lin
Xi Peng
41
0
0
14 Apr 2025
Leveraging Lecture Content for Improved Feedback: Explorations with GPT-4 and Retrieval Augmented Generation
Sven Jacobs
Steffen Jaschke
30
3
0
05 May 2024
Simple Baselines for Interactive Video Retrieval with Questions and Answers
Kaiqu Liang
Samuel Albanie
16
0
0
21 Aug 2023
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System
Mauajama Firdaus
Avinash Madasu
Asif Ekbal
20
7
0
27 May 2023
Dialogue-to-Video Retrieval
Chenyang Lyu
Manh-Duy Nguyen
Van-Tu Ninh
Liting Zhou
C. Gurrin
Jennifer Foster
18
0
0
23 Mar 2023
Acquisition Conditioned Oracle for Nongreedy Active Feature Acquisition
M. Valancius
M. Lennon
Junier Oliva
15
0
0
27 Feb 2023
Deep Learning for Video-Text Retrieval: a Review
Cunjuan Zhu
Qi Jia
Wei-Neng Chen
Yanming Guo
Yu Liu
13
14
0
24 Feb 2023
Is Multimodal Vision Supervision Beneficial to Language?
Avinash Madasu
Vasudev Lal
19
4
0
10 Feb 2023
MuMUR: Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
22
3
0
24 Aug 2022
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
303
771
0
18 Apr 2021
Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query
Guanyu Cai
Jun Zhang
Xinyang Jiang
Yifei Gong
Lianghua He
Fufu Yu
Pai Peng
Xiaowei Guo
Feiyue Huang
Xing Sun
13
10
0
02 Mar 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
S. Hoi
38
30
0
20 Oct 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
401
594
0
21 Jul 2020
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
279
39,083
0
01 Sep 2014