ResearchTrend.AI
Teaching CLIP to Count to Ten

23 February 2023
Roni Paiss, Ariel Ephrat, Omer Tov, Shiran Zada, Inbar Mosseri, Michal Irani, Tali Dekel
    VLM
    CLIP

Papers citing "Teaching CLIP to Count to Ten"

21 / 71 papers shown
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao, Max F. Burg, Samuel Albanie, Matthias Bethge
VLM
24
8
0
12 Oct 2023
SYRAC: Synthesize, Rank, and Count
Adrian d'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh
DiffM
28
1
0
02 Oct 2023
ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Michael A. Hobley, V. Prisacariu
11
3
0
09 Sep 2023
Divide & Bind Your Attention for Improved Generative Semantic Nursing
Yumeng Li, M. Keuper, Dan Zhang, Anna Khoreva
DiffM
23
47
0
20 Jul 2023
Counting Guidance for High Fidelity Text-to-Image Synthesis
Wonjune Kang, Kevin Galim, H. Koo, Nam Ik Cho
DiffM
24
7
0
30 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu, Delong Chen, Zhan-Rong Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, Jun Zhou
VLM
66
188
0
19 Jun 2023
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection
Yunkang Cao, Xiaohao Xu, Chen Sun, Y. Cheng, Liang Gao, Weiming Shen
18
1
0
15 Jun 2023
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang, Rabiul Awal, Aishwarya Agrawal
CoGe
VLM
18
9
0
15 Jun 2023
Grounded Text-to-Image Synthesis with Attention Refocusing
Quynh Phung, Songwei Ge, Jia-Bin Huang
DiffM
18
104
0
08 Jun 2023
Open-world Text-specified Object Counting
Niki Amini-Naieni, Kiana Amini-Naieni, Tengda Han, Andrew Zisserman
VLM
8
16
0
02 Jun 2023
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Reddy Akula, Xuehai He, Sugato Basu, X. Wang, William Yang Wang
MLLM
20
160
0
24 May 2023
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata
24
20
0
22 May 2023
Segment Any Anomaly without Training via Hybrid Prompt Regularization
Yunkang Cao, Xiaohao Xu, Chen Sun, Y. Cheng, Zongwei Du, Liang Gao, Weiming Shen
VLM
26
69
0
18 May 2023
CLIP-Count: Towards Text-Guided Zero-Shot Object Counting
Ruixia Jiang, Lin Liu, Changan Chen
VLM
22
59
0
12 May 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs
Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson
VLM
23
31
0
10 May 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo
22
19
0
05 Apr 2023
Editing Implicit Assumptions in Text-to-Image Diffusion Models
Hadas Orgad, Bahjat Kawar, Yonatan Belinkov
DiffM
19
84
0
14 Mar 2023
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality
Anuj Diwan, Layne Berry, Eunsol Choi, David F. Harwath, Kyle Mahowald
CoGe
101
41
0
01 Nov 2022
DALL-E 2 Fails to Reliably Capture Common Syntactic Processes
Evelina Leivada, Elliot Murphy, G. Marcus
133
37
0
23 Oct 2022
Learning to Prompt for Vision-Language Models
Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021