ResearchTrend.AI

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

27 January 2021
Yehao Li
Yingwei Pan
Ting Yao
Jingwen Chen
Tao Mei
    VLM

Papers citing "Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network"

10 / 10 papers shown
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
45
0
0
03 Jan 2025
NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System
Weiqiang Wang
Xuefei Zhe
Huan Chen
Di Kang
Tingguang Li
Ruizhi Chen
Linchao Bao
39
5
0
27 Sep 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
17
124
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection
Y. Zhang
Yingwei Pan
Ting Yao
Rui Huang
Tao Mei
C. Chen
ViT
21
68
0
13 Jun 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
16
63
0
15 Apr 2022
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
27
149
0
13 Oct 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
16
31
0
18 Aug 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
C. Miao
Houqiang Li
28
41
0
19 Apr 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
927
0
24 Sep 2019