2302.12066
Teaching CLIP to Count to Ten
23 February 2023
Roni Paiss
Ariel Ephrat
Omer Tov
Shiran Zada
Inbar Mosseri
Michal Irani
Tali Dekel
VLM
CLIP
Papers citing
"Teaching CLIP to Count to Ten"
50 / 71 papers shown
Title
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
21
0
0
13 May 2025
Text-to-Image Alignment in Denoising-Based Models through Step Selection
P. Grimal
Hervé Le Borgne
Olivier Ferret
DiffM
EGVM
48
0
0
24 Apr 2025
ProgRoCC: A Progressive Approach to Rough Crowd Counting
Shengqin Jiang
Linfei Li
Haokui Zhang
Qingshan Liu
Amin Beheshti
Jian Yang
Anton van den Hengel
Quan Z. Sheng
Yuankai Qi
23
0
0
18 Apr 2025
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography
I-Sheng Fang
Jun-Cheng Chen
LRM
VLM
28
0
0
14 Apr 2025
Human-like compositional learning of visually-grounded concepts using synthetic environments
Zijun Lin
M Ganesh Kumar
Cheston Tan
OCL
CoGe
70
0
0
09 Apr 2025
Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models
Xuyang Guo
Zekai Huang
Jiayan Huo
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
Jiahao Zhang
ALM
VGen
59
2
0
05 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
44
0
0
03 Apr 2025
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
82
24
0
25 Mar 2025
On the Limitations of Vision-Language Models in Understanding Image Transforms
Ahmad Mustafa Anis
Hasnain Ali
Saquib Sarfraz
VLM
Presented at ResearchTrend Connect | VLM on 28 Mar 2025
133
0
0
12 Mar 2025
Is CLIP ideal? No. Can we fix it? Yes!
Raphi Kang
Yue Song
Georgia Gkioxari
Pietro Perona
VLM
50
0
0
10 Mar 2025
AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
Wenxin Ma
Xu Zhang
Qingsong Yao
Fenghe Tang
Chenxu Wu
Y. Li
Rui Yan
Zihang Jiang
S. Kevin Zhou
VLM
52
0
0
09 Mar 2025
Object-centric Binding in Contrastive Language-Image Pretraining
Rim Assouel
Pietro Astolfi
Florian Bordes
M. Drozdzal
Adriana Romero Soriano
OCL
VLM
CoGe
102
0
0
19 Feb 2025
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu
Yuheng Ding
Bingxuan Li
Pan Lu
Da Yin
Kai-Wei Chang
Nanyun Peng
LRM
97
3
0
03 Dec 2024
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
Sara Ghaboura
Ahmed Heakl
Omkar Thawakar
Ali Alharthi
Ines Riahi
Abduljalil Saif
Jorma T. Laaksonen
F. Khan
Salman Khan
Rao Muhammad Anwer
34
0
0
24 Oct 2024
TopoDiffusionNet: A Topology-aware Diffusion Model
Saumya Gupta
Dimitris Samaras
C. L. P. Chen
DiffM
18
4
0
22 Oct 2024
Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models
Mazda Moayeri
Vidhisha Balachandran
Varun Chandrasekaran
Safoora Yousefi
Thomas Fel
S. Feizi
Besmira Nushi
Neel Joshi
Vibhav Vineet
13
2
0
17 Oct 2024
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
Tianwei Xiong
Yuqing Wang
Daquan Zhou
Zhijie Lin
Jiashi Feng
Xihui Liu
VGen
15
7
0
14 Oct 2024
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
Ayush Singh
Mansi Gupta
Shivank Garg
Abhinav Kumar
Vansh Agrawal
ReLM
LRM
24
0
0
08 Oct 2024
Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
Pei Liu
Luping Ji
Jiaxiang Gou
Bo Fu
Mao Ye
21
2
0
14 Sep 2024
Iterative Object Count Optimization for Text-to-image Diffusion Models
Oz Zafar
Lior Wolf
Idan Schwartz
VLM
19
3
0
21 Aug 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Haider Al-Tahan
Q. Garrido
Randall Balestriero
Diane Bouchacourt
C. Hazirbas
Mark Ibrahim
VLM
38
10
0
09 Aug 2024
Teach CLIP to Develop a Number Sense for Ordinal Regression
Yao Du
Qiang Zhai
Weihang Dai
X. Li
35
8
0
07 Aug 2024
CountGD: Multi-Modal Open-World Counting
Niki Amini-Naieni
Tengda Han
Andrew Zisserman
ObjD
51
7
0
05 Jul 2024
Evaluating Numerical Reasoning in Text-to-Image Models
Ivana Kajić
Olivia Wiles
Isabela Albuquerque
Matthias Bauer
Su Wang
Jordi Pont-Tuset
Aida Nematzadeh
EGVM
ReLM
71
0
0
20 Jun 2024
Neural Approximate Mirror Maps for Constrained Diffusion Models
Berthy T. Feng
Ricardo Baptista
Katherine L. Bouman
MedIm
DiffM
35
3
0
18 Jun 2024
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Lital Binyamin
Yoad Tewel
Hilit Segev
Eran Hirsch
Royi Rassin
Gal Chechik
24
6
0
14 Jun 2024
Nomic Embed Vision: Expanding the Latent Space
Zach Nussbaum
Brandon Duderstadt
Andriy Mulyar
VLM
30
5
0
06 Jun 2024
CountCLIP -- [Re] Teaching CLIP to Count to Ten
Harshvardhan Mestha
Tejas Agrawal
Karan Bania
Shreyas V
Yash Bhisikar
VLM
22
1
0
05 Jun 2024
T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
Yi Yuan
Zhuo Chen
Xubo Liu
Haohe Liu
Xuenan Xu
Dongya Jia
Yuanzhe Chen
Mark D. Plumbley
Wenwu Wang
CLIP
VLM
35
9
0
27 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
EGVM
113
13
0
25 Apr 2024
Is CLIP the main roadblock for fine-grained open-world perception?
Lorenzo Bianchi
F. Carrara
Nicola Messina
Fabrizio Falchi
VLM
27
4
0
04 Apr 2024
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Omer Dahary
Or Patashnik
Kfir Aberman
Daniel Cohen-Or
DiffM
19
27
0
25 Mar 2024
An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models
Zizhao Hu
Shaochong Jia
Mohammad Rostami
25
1
0
25 Mar 2024
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu
Weicong Liang
Zhanhao Liang
Chong Luo
Ji Li
Gao Huang
Yuhui Yuan
DiffM
64
23
0
14 Mar 2024
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Jialu Li
Jaemin Cho
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
MoMe
DiffM
34
8
0
11 Mar 2024
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li
Yuke Li
Zhekai Du
Fengling Li
Ke Lu
Jingjing Li
VLM
39
4
0
11 Mar 2024
Naming, Describing, and Quantifying Visual Objects in Humans and LLMs
A. Testoni
Juell Sprott
Sandro Pezzelle
28
1
0
11 Mar 2024
AFreeCA: Annotation-Free Counting for All
Adrian d'Alessandro
Ali Mahdavi-Amiri
Ghassan Hamarneh
DiffM
32
0
0
07 Mar 2024
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model
Hao-Ran Cheng
Erjia Xiao
Jindong Gu
Le Yang
Jinhao Duan
Jize Zhang
Jiahang Cao
Kaidi Xu
Renjing Xu
24
6
0
29 Feb 2024
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
Jianrui Zhang
Mu Cai
Tengyang Xie
Yong Jae Lee
LRM
32
18
0
20 Feb 2024
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
Chong Zeng
Yue Dong
Pieter Peers
Youkang Kong
Hongzhi Wu
Xin Tong
22
25
0
19 Feb 2024
Improving fine-grained understanding in image-text pre-training
Ioana Bica
Anastasija Ilić
Matthias Bauer
Goker Erdogan
Matko Bošnjak
...
A. Gritsenko
Matthias Minderer
Charles Blundell
Razvan Pascanu
Jovana Mitrović
VLM
23
21
0
18 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
21
3
0
15 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
21
9
0
03 Jan 2024
SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi
Zhengzhong Tu
Keren Ye
M. Delbracio
P. Milanfar
Qifeng Chen
Hossein Talebi
DiffM
16
11
0
18 Dec 2023
CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral
Enis Simsar
Federico Tombari
Pinar Yanardag
DiffM
VLM
20
26
0
11 Dec 2023
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma
Varun Jampani
Yuanzhen Li
Xuhui Jia
Dmitry Lagun
Frédo Durand
William T. Freeman
Mark J. Matthews
DiffM
31
21
0
05 Dec 2023
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
Jiao Sun
Deqing Fu
Yushi Hu
Su Wang
Royi Rassin
...
Dana Alon
Charles Herrmann
Sjoerd van Steenkiste
Ranjay Krishna
Cyrus Rashtchian
EGVM
20
39
0
29 Nov 2023
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Jaemin Cho
Yushi Hu
Roopal Garg
Peter Anderson
Ranjay Krishna
Jason Baldridge
Mohit Bansal
Jordi Pont-Tuset
Su Wang
EGVM
22
65
0
27 Oct 2023
Semantic Generative Augmentations for Few-Shot Counting
Perla Doubinsky
Nicolas Audebert
M. Crucianu
Hervé Le Borgne
VLM
DiffM
11
4
0
26 Oct 2023