Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.00754
Cited By
Dense-Captioning Events in Videos
2 May 2017
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dense-Captioning Events in Videos"
50 / 234 papers shown
Title
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
24
64
0
04 Sep 2022
Hierarchical Local-Global Transformer for Temporal Sentence Grounding
Xiang Fang
Daizong Liu
Pan Zhou
Zichuan Xu
Rui Li
19
28
0
31 Aug 2022
Partially Relevant Video Retrieval
Jianfeng Dong
Xianke Chen
Minsong Zhang
Xun Yang
Shujie Chen
Xirong Li
Xun Wang
17
39
0
26 Aug 2022
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
Shuo Liu
Weize Quan
Mingyuan Zhou
Sihong Chen
Jian Kang
Zhenlan Zhao
Chen Chen
Dong-Ming Yan
28
0
0
16 Aug 2022
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
Juncheng Billy Li
Junlin Xie
Linchao Zhu
Long Qian
Siliang Tang
...
Haochen Shi
Shengyu Zhang
Longhui Wei
Qi Tian
Yueting Zhuang
34
12
0
03 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
25
18
0
01 Aug 2022
Reducing the Vision and Language Bias for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Wei Hu
19
49
0
27 Jul 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
39
114
0
16 Jul 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM
CoGe
11
25
0
26 Jun 2022
Video Activity Localisation with Uncertainties in Temporal Boundary
Jiabo Huang
Hailin Jin
S. Gong
Yang Liu
24
23
0
26 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
36
131
0
18 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
31
43
0
17 Jun 2022
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Hammad A. Ayyubi
Christopher Thomas
Lovish Chum
R. Lokesh
Long Chen
...
Xudong Lin
Xuande Feng
Jaywon Koo
Sounak Ray
Shih-Fu Chang
AI4TS
28
0
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
66
527
0
13 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
24
110
0
07 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
29
156
0
03 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM
EgoV
46
189
0
03 Jun 2022
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
24
68
0
19 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
126
62
0
17 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
28
6
0
12 May 2022
Video Moment Retrieval from Text Queries via Single Frame Annotation
Ran Cui
Tianwen Qian
Pai Peng
E. Daskalaki
Jingjing Chen
Xiao-Wei Guo
Huyang Sun
Yu-Gang Jiang
17
35
0
20 Apr 2022
Position-aware Location Regression Network for Temporal Video Grounding
Sunoh Kim
Kimin Yun
J. Choi
22
4
0
12 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
43
39
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
Searching for fingerspelled content in American Sign Language
Bowen Shi
D. Brentari
G. Shakhnarovich
Karen Livescu
24
5
0
24 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Fei Wu
Yi Yang
Yueting Zhuang
Qing Guo
36
73
0
24 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
25
19
0
23 Mar 2022
All in One: Exploring Unified Video-Language Pre-training
Alex Jinpeng Wang
Yixiao Ge
Rui Yan
Yuying Ge
Xudong Lin
Guanyu Cai
Jianping Wu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
33
200
0
14 Mar 2022
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
33
10
0
11 Mar 2022
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach
Xiaohan Lan
Yitian Yuan
Xin Wang
Long Chen
Zhi Wang
Lin Ma
Wenwu Zhu
CML
24
15
0
10 Mar 2022
Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding
Shentong Mo
Daizong Liu
Wei Hu
SSL
21
6
0
08 Mar 2022
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding
Daizong Liu
Xiang Fang
Wei Hu
Pan Zhou
23
37
0
06 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
...
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
90
730
0
28 Jan 2022
An Integrated Approach for Video Captioning and Applications
Soheyla Amirian
T. Taha
Khaled Rasheed
H. Arabnia
26
1
0
23 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
27
84
0
23 Dec 2021
A Survey of Natural Language Generation
Chenhe Dong
Hai-Tao Zheng
Haifan Gong
M. Chen
Junxin Li
Ying Shen
Min Yang
3DV
27
43
0
22 Dec 2021
Tell me what you see: A zero-shot action recognition method based on natural language descriptions
Valter Estevam
Rayson Laroca
David Menotti
Hélio Pedrini
33
13
0
18 Dec 2021
KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation
Xin Liu
Dayiheng Liu
Baosong Yang
Haibo Zhang
Junwei Ding
Wenqing Yao
Weihua Luo
Haiying Zhang
Jinsong Su
LRM
32
8
0
15 Dec 2021
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
28
54
0
14 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
30
23
0
02 Dec 2021
Syntax Customized Video Captioning by Imitating Exemplar Sentences
Yitian Yuan
Lin Ma
Wenwu Zhu
22
6
0
02 Dec 2021
Weakly-Supervised Video Object Grounding via Causal Intervention
Wei Wang
Junyu Gao
Changsheng Xu
CML
30
20
0
01 Dec 2021
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Guohao Li
VGen
44
95
0
01 Dec 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Luu Anh Tuan
Lijuan Wang
Zicheng Liu
VLM
45
216
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
29
189
0
19 Nov 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
40
4
0
19 Nov 2021
Towards Debiasing Temporal Sentence Grounding in Video
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
48
16
0
08 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
16
29
0
01 Nov 2021
Previous
1
2
3
4
5
Next