Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2212.13563
Cited By
v1
v2 (latest)
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2022
27 December 2022
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (46★)
Papers citing
"Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning"
16 / 16 papers shown
Title
Testing chatbots on the creation of encoders for audio conditioned image generation
Jorge E. León
Miguel Carrasco
56
0
0
09 Sep 2025
Effectively obtaining acoustic, visual and textual data from videos
Jorge E. León
Miguel Carrasco
VGen
63
1
0
06 Sep 2025
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena
Tommaso Apicella
Stefano Rosa
Pietro Morerio
Alessio Del Bue
Lorenzo Natale
211
0
0
11 Apr 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
177
1
0
22 Mar 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
329
7
0
10 Feb 2025
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang
Xi Chen
Xiaogang Xu
S. Ji
Yongxu Liu
Yujun Shen
Hengshuang Zhao
DiffM
242
0
0
21 Jan 2025
Development of Image Collection Method Using YOLO and Siamese Network
Chan Young Shin
Ah Hyun Lee
Jun Young Lee
Ji Min Lee
Soo Jin Park
47
0
0
16 Oct 2024
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
Qingqing Cao
Mahyar Najibi
Sachin Mehta
CLIP
DiffM
145
1
0
15 Oct 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
176
1
0
09 Aug 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP
VLM
191
5
0
15 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Elisa Kreiss
245
2
0
10 Jul 2024
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour
Victor Besnier
Vicky Kalogeiton
David Picard
DiffM
285
7
0
30 May 2024
The Solution for the CVPR2024 NICE Image Captioning Challenge
Longfei Huang
Shupeng Zhong
Xiangyu Wu
Ruoxuan Li
115
0
0
19 Apr 2024
Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation
Bowen Huang
Yanwei Zheng
Chuanlin Lan
Xinpeng Zhao
Yifei Zou
Dongxiao Yu
199
0
0
23 Mar 2024
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
Hasan Hammoud
Hani Itani
Fabio Pizzati
Juil Sock
Adel Bibi
Guohao Li
CLIP
VLM
320
51
0
02 Feb 2024
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
174
7
0
05 Sep 2023
1