ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2312.08865
  4. Cited By
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image
  Captioning

Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning

14 December 2023
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning"

9 / 9 papers shown
Title
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
Tianyi Shang
Zhenyu Li
Pengjie Xu
Jinwei Qiao
Gang Chen
Zihan Ruan
Weijun Hu
49
0
0
20 Feb 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
36
0
0
03 Jan 2025
Understanding and Constructing Latent Modality Structures in Multi-modal
  Representation Learning
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Qian Jiang
Changyou Chen
Han Zhao
Liqun Chen
Q. Ping
S. D. Tran
Yi Xu
Belinda Zeng
Trishul M. Chilimbi
33
36
0
10 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only
  Training
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
34
81
0
06 Mar 2023
Text-Only Training for Image Captioning using Noise-Injected CLIP
Text-Only Training for Image Captioning using Noise-Injected CLIP
David Nukrai
Ron Mokady
Amir Globerson
VLM
CLIP
35
69
0
01 Nov 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
J. Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
109
35
0
25 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
From Show to Tell: A Survey on Deep Learning-based Image Captioning
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
51
244
0
14 Jul 2021
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense
  Generation
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
Yiran Xing
Z. Shi
Zhao Meng
Gerhard Lakemeyer
Yunpu Ma
Roger Wattenhofer
VLM
57
40
0
02 Jan 2021
1