Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.17251
Cited By
Altogether: Image Captioning via Re-aligning Alt-text
31 December 2024
Hu Xu
Po-Yao (Bernie) Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
Christine Jou
Gargi Ghosh
Omer Levy
Luke Zettlemoyer
Wen-tau Yih
Shang-Wen Li
Saining Xie
Christoph Feichtenhofer
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Altogether: Image Captioning via Re-aligning Alt-text"
6 / 6 papers shown
Title
Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Simon Ging
Sebastian Walter
Jelena Bratulić
Johannes Dienert
Hannah Bast
Thomas Brox
CLIP
10
0
0
05 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
100
0
0
17 Apr 2025
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
T. Jiang
Zhen Zhang
Anton van den Hengel
J. Shi
55
0
0
14 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
56
2
0
01 Apr 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begum Demir
Ioannis Papoutsis
VLM
79
0
0
13 Feb 2025
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
37
0
0
15 Nov 2024
1