arXiv:2002.11566
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
26 February 2020
Ziqi Zhang
Yaya Shi
Chunfeng Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
Papers citing
"Object Relational Graph with Teacher-Recommended Learning for Video Captioning"
50 / 116 papers shown
Title
Respecting Transfer Gap in Knowledge Distillation
Neural Information Processing Systems (NeurIPS), 2022
Yulei Niu
Long Chen
Chan Zhou
Hanwang Zhang
146
27
0
23 Oct 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
Zixu Wang
Yujie Zhong
Yishu Miao
Lin Ma
Lucia Specia
154
15
0
10 Oct 2022
Thinking Hallucination for Video Captioning
Asian Conference on Computer Vision (ACCV), 2022
Nasib Ullah
Partha Pratim Mohanta
VLM
115
9
0
28 Sep 2022
A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Chaoqi Chen
Yushuang Wu
Qiyuan Dai
Hong-Yu Zhou
Mutian Xu
Sibei Yang
Xiaoguang Han
Yizhou Yu
ViT
MedIm
AI4CE
227
112
0
27 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
International Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
220
4
0
15 Sep 2022
Diverse Video Captioning by Adaptive Spatio-temporal Attention
German Conference on Pattern Recognition (GCPR), 2022
Zohreh Ghaderi
Leonard Salewski
Hendrik P. A. Lensch
92
11
0
19 Aug 2022
Sports Video Analysis on Large-Scale Data
European Conference on Computer Vision (ECCV), 2022
Dekun Wu
Henghui Zhao
Xingce Bao
Richard P. Wildes
86
19
0
09 Aug 2022
Rethinking Data Augmentation for Robust Visual Question Answering
European Conference on Computer Vision (ECCV), 2022
Long Chen
Yuhang Zheng
Jun Xiao
OOD
123
51
0
18 Jul 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Neural Information Processing Systems (NeurIPS), 2022
Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Hongsheng Li
Xiaogang Wang
Jifeng Dai
MoMe
MoE
180
80
0
09 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
334
664
0
27 May 2022
A Survey on Long-Tailed Visual Recognition
International Journal of Computer Vision (IJCV), 2022
Lu Yang
He Jiang
Q. Song
Jun Guo
168
153
0
27 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
173
63
0
22 May 2022
Support-set based Multi-modal Representation Enhancement for Video Captioning
IEEE International Conference on Multimedia and Expo (ICME), 2022
Xiaoya Chen
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Hengtao Shen
118
4
0
19 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
101
6
0
12 May 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
114
2
0
28 Apr 2022
Self-Supervised Learning of Object Parts for Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
A. Ziegler
Yuki M. Asano
SSL
OCL
210
121
0
27 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
140
14
0
12 Apr 2022
Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Huizhong Deng
Tong Zhang
Yuchao Dai
Jiawei Shi
Yiran Zhong
Hongdong Li
107
10
0
10 Apr 2022
Learning Audio-Video Modalities from Image Captions
European Conference on Computer Vision (ECCV), 2022
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
130
94
0
01 Apr 2022
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation
Ziqi Zhang
Yuxin Chen
Zongyang Ma
Chen Ma
Chunfeng Yuan
Bing Li
Ying Shan
Weiming Hu
VGen
96
9
0
31 Mar 2022
Visual Abductive Reasoning
Computer Vision and Pattern Recognition (CVPR), 2022
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
120
45
0
26 Mar 2022
ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation
IEEE Access, 2022
Khoa T. Vo
Kashu Yamazaki
Sang Truong
M. Tran
Akihiro Sugimoto
Ngan Le
EgoV
111
11
0
16 Mar 2022
RCL: Recurrent Continuous Localization for Temporal Action Detection
Computer Vision and Pattern Recognition (CVPR), 2022
Qiang Wang
Yanhao Zhang
Yun Zheng
Pan Pan
ObjD
93
43
0
14 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
125
6
0
12 Mar 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2022
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
176
183
0
20 Jan 2022
Cross-modal Contrastive Distillation for Instructional Activity Anticipation
International Conference on Pattern Recognition (ICPR), 2022
Zhengyuan Yang
Jingen Liu
Jing-ling Huang
Xiaodong He
Tao Mei
Chenliang Xu
Jiebo Luo
98
6
0
18 Jan 2022
Boosting Video Representation Learning with Multi-Faceted Integration
Computer Vision and Pattern Recognition (CVPR), 2021
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Xiaoping Zhang
Dong Wu
Tao Mei
138
9
0
11 Jan 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
IEEE International Conference on Image Processing (ICIP), 2021
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
117
2
0
28 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
VLM
104
45
0
14 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
188
147
0
02 Dec 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
96
23
0
30 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2022
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
182
286
0
25 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
174
86
0
24 Nov 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
137
4
0
19 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfeng Yuan
Bing Li
Weiming Hu
Zhengjun Zha
154
40
0
17 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks
Computer Vision and Image Understanding (CVIU), 2021
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
183
7
0
14 Nov 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
165
167
0
13 Oct 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
ACM Multimedia (ACM MM), 2021
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
144
14
0
07 Sep 2021
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
Yucheng Zhao
Guangting Wang
Chong Luo
Wenjun Zeng
Zhengjun Zha
ISeg
SSL
132
53
0
18 Aug 2021
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
Heliang Zheng
Huan Yang
Jianlong Fu
Zhengjun Zha
Jiebo Luo
118
57
0
18 Aug 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
IEEE Transactions on Image Processing (TIP), 2021
Hao Wang
Guosheng Lin
Steven C. H. Hoi
Chunyan Miao
127
8
0
14 Aug 2021
Joint Inductive and Transductive Learning for Video Object Segmentation
IEEE International Conference on Computer Vision (ICCV), 2021
Yunyao Mao
Ning Wang
Wen-gang Zhou
Houqiang Li
VOS
177
103
0
08 Aug 2021
Discriminative Latent Semantic Graph for Video Captioning
ACM Multimedia (ACM MM), 2021
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
Maurice Pagnucco
Yu Guan
153
32
0
08 Aug 2021
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning
Findings of the Association for Computational Linguistics (ACL Findings), 2021
Fenglin Liu
Xuancheng Ren
Xian Wu
Bang-ju Yang
Shen Ge
Yuexian Zou
Xu Sun
123
36
0
05 Aug 2021
Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers
ACM Multimedia (ACM MM), 2021
Wen Wang
Yang Cao
Jing Zhang
Fengxiang He
Zhengjun Zha
Yonggang Wen
Dacheng Tao
ViT
158
108
0
27 Jul 2021
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
117
2
0
25 Jul 2021
Disentangle Your Dense Object Detector
Zehui Chen
Chenhongyi Yang
Qiaofei Li
Feng Zhao
Zhengjun Zha
Feng Wu
3DV
174
182
0
07 Jul 2021
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval
Giorgos Kordopatis-Zilos
Christos Tzelepis
Symeon Papadopoulos
I. Kompatsiaris
Ioannis Patras
138
41
0
24 Jun 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Yuqing Song
Shizhe Chen
Qin Jin
103
40
0
30 May 2021
TransVG: End-to-End Visual Grounding with Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
238
416
0
17 Apr 2021