Altogether: Image Captioning via Re-aligning Alt-text


31 December 2024
Hu Xu
Po-Yao (Bernie) Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
Christine Jou
Gargi Ghosh
Omer Levy
Luke Zettlemoyer
Wen-tau Yih
Shang-Wen Li
Saining Xie
Christoph Feichtenhofer
    DiffM

Papers citing "Altogether: Image Captioning via Re-aligning Alt-text"

Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Simon Ging
Sebastian Walter
Jelena Bratulić
Johannes Dienert
Hannah Bast
Thomas Brox
CLIP
17
0
0
05 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
T. Jiang
Zhen Zhang
Anton van den Hengel
J. Shi
55
0
0
14 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
56
2
0
01 Apr 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begum Demir
Ioannis Papoutsis
VLM
79
0
0
13 Feb 2025
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
40
1
0
15 Nov 2024