CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

IEEE International Conference on Computer Vision (ICCV), 2023
18 January 2023
Aviad Aberdam
David Bensaid
Alona Golts
Roy Ganz
Oren Nuriel
Royee Tichauer
Shai Mazor
Ron Litman
VLM, CLIP

Papers citing "CLIPTER: Looking at the Bigger Picture in Scene Text Recognition"

13 / 13 papers shown
Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
Anik De
A. S. Penamakuri
Rajeev Yadav
Aditya Rathore
Harshiv Shah
Devesh Sharma
Sagar Agarwal
Pravin Kumar
Anand Mishra
108
0
0
28 Nov 2025
Towards General Urban Monitoring with Vision-Language Models: A Review, Evaluation, and a Research Agenda
André Torneiro
Diogo Monteiro
Paulo Novais
Pedro Rangel Henriques
Nuno F. Rodrigues
129
1
0
14 Oct 2025
TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition
Xiahan Yang
Hui Zheng
VLM
90
1
0
02 Aug 2025
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling
Liang Yin
Xudong Xie
Zhang Li
Xiang Bai
Yuliang Liu
LRM
282
0
0
12 Jun 2025
BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Zhengyang Ji
Shang Gao
Li Liu
Yifan Jia
Yutao Yue
213
1
0
04 Mar 2025
DocVLM: Make Your VLM an Efficient Reader
Computer Vision and Pattern Recognition (CVPR), 2024
Mor Shpigel Nacson
Aviad Aberdam
Roy Ganz
Elad Ben Avraham
Alona Golts
Yair Kittenplon
Shai Mazor
Ron Litman
VLM
593
0
0
11 Dec 2024
TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models
Jonathan Fhima
Elad Ben Avraham
Oren Nuriel
Yair Kittenplon
Roy Ganz
Aviad Aberdam
Ron Litman
VLM
236
1
0
07 Nov 2024
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz
Yair Kittenplon
Aviad Aberdam
Elad Ben Avraham
Oren Nuriel
Shai Mazor
Ron Litman
276
35
0
08 Feb 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
209
21
0
07 Jan 2024
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
ACM Multimedia (ACM MM), 2023
Zixiao Wang
Hongtao Xie
Yuxin Wang
Jianjun Xu
Boqiang Zhang
Yongdong Zhang
314
26
0
08 Oct 2023
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Alloy Das
Sanket Biswas
Ayan Banerjee
Josep Lladós
Umapada Pal
Saumik Bhattacharya
286
4
0
02 Oct 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
348
51
0
28 May 2023
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
IEEE Transactions on Image Processing (IEEE TIP), 2023
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
CLIP, VLM
347
44
0
23 May 2023