Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1812.06587
Cited By
Grounded Video Description
17 December 2018
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grounded Video Description"
19 / 19 papers shown
Title
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
52
0
0
13 Mar 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
121
2
0
14 Jan 2025
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
F. Khan
Salman Khan
MLLM
VGen
VLM
42
6
0
07 Nov 2024
Large Language Models are Few-Shot Health Learners
Xin Liu
Daniel J. McDuff
G. Kovács
I. Galatzer-Levy
Jacob Sunshine
Jiening Zhan
M. Poh
Shun Liao
P. Achille
Shwetak N. Patel
LM&MA
AI4MH
32
96
0
24 May 2023
Event and Entity Extraction from Generated Video Captions
Johannes Scherer
A. Scherp
Deepayan Bhowmik
19
0
0
05 Nov 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
19
46
0
19 Oct 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
22
130
0
18 Jun 2022
Weakly-Supervised Video Object Grounding via Causal Intervention
Wei Wang
Junyu Gao
Changsheng Xu
CML
23
20
0
01 Dec 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
28
4
0
19 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
31
33
0
17 Nov 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation
Yanjun Gao
Lulu Liu
Jason Wang
Xin Chen
Huayan Wang
Rui Zhang
23
1
0
10 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
25
179
0
17 Aug 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen
Jiayuan Mao
Jiajun Wu
Kwan-Yee Kenneth Wong
J. Tenenbaum
Chuang Gan
VGen
31
92
0
30 Mar 2021
Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya
Michael S. Ryoo
AI4TS
33
38
0
01 Mar 2021
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Jie Lei
Liwei Wang
Yelong Shen
Dong Yu
Tamara L. Berg
Mohit Bansal
14
186
0
11 May 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
17
235
0
31 Mar 2020
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
20
227
0
25 Apr 2019
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
189
432
0
27 Mar 2018
1