ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2007.02375
  4. Cited By
Auto-captions on GIF: A Large-scale Video-sentence Dataset for
  Vision-language Pre-training

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

5 July 2020
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
ArXivPDFHTML

Papers citing "Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training"

11 / 11 papers shown
Title
3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with
  2D Diffusion Models
3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models
Haibo Yang
Yang Chen
Yingwei Pan
Ting Yao
Zhineng Chen
Tao Mei
19
19
0
09 Nov 2023
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
Ye Tian
Meng Yang
Lanshan Zhang
Zhizhen Zhang
Yang Liu
Xiao-Zhu Xie
Xirong Que
Wendong Wang
22
7
0
09 Aug 2023
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation
  Learning
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning
Ting Yao
Yingwei Pan
Yehao Li
Chong-Wah Ngo
Tao Mei
ViT
146
136
0
11 Jul 2022
Exploring Structure-aware Transformer over Interaction Proposals for
  Human-Object Interaction Detection
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection
Y. Zhang
Yingwei Pan
Ting Yao
Rui Huang
Tao Mei
C. Chen
ViT
21
68
0
13 Jun 2022
CLIP4Caption: CLIP for Video Caption
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
27
149
0
13 Oct 2021
Trustworthy AI: From Principles to Practices
Trustworthy AI: From Principles to Practices
Bo-wen Li
Peng Qi
Bo Liu
Shuai Di
Jingen Liu
Jiquan Pei
Jinfeng Yi
Bowen Zhou
117
354
0
04 Oct 2021
Survey: Transformer based Video-Language Pre-training
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
61
44
0
21 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
25
47
0
16 Sep 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal
  Analytics
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
11
31
0
18 Aug 2021
Understanding Chinese Video and Language via Contrastive Multimodal
  Pre-Training
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
C. Miao
Houqiang Li
28
41
0
19 Apr 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
926
0
24 Sep 2019
1