GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

27 November 2023
Wenhao Wu
Huanjin Yao
Mengxi Zhang
Yuxin Song
Wanli Ouyang
Jingdong Wang
    VLM
arXiv: 2311.15732

Papers citing "GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?"

28 papers
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu
Gaopeng Gou
Jiamin Zhuang
Jing Yu
Kun Song
Qihao Wang
Yili Li
Gang Xiong
VLM
75
0
0
13 Mar 2025
MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification
Xiangyan Qu
Jing Yu
Jiamin Zhuang
Gaopeng Gou
Gang Xiong
Qi Wu
VLM
43
0
0
10 Mar 2025
Do large language vision models understand 3D shapes?
Sagi Eppel
3DV
81
1
0
14 Dec 2024
Explainable Search and Discovery of Visual Cultural Heritage Collections with Multimodal Large Language Models
T. Arnold
L. Tilton
32
0
0
07 Nov 2024
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting
Xingyu Zhu
B. Zhu
Yi Tan
Shuo Wang
Y. Hao
Hanwang Zhang
VLM
VPVLM
26
1
0
25 Oct 2024
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon
M. Lee
Sangdon Park
Dongwoo Kim
31
1
0
08 Oct 2024
Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition
Zongyou Yu
Qiang Qu
Xiaoming Chen
Chen Wang
MLLM
29
1
0
15 Sep 2024
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
Dawei Yan
Pengcheng Li
Yang Li
Hao Chen
Qingguo Chen
Weihua Luo
Wei Dong
Qingsen Yan
Haokui Zhang
Chunhua Shen
3DV
VLM
31
4
0
15 Sep 2024
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
Bin Fu
Qiyang Wan
Jialin Li
Ruiping Wang
Xilin Chen
34
0
0
03 Sep 2024
Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Yaoting Wang
Peiwen Sun
Yuanchao Li
Honggang Zhang
Di Hu
32
5
0
15 Jul 2024
MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning
Min Zhang
Jianye Hao
Xian Fu
Peilong Han
Hao Zhang
Lei Shi
Hongyao Tang
Yan Zheng
28
1
0
06 Jul 2024
GPT-4V Explorations: Mining Autonomous Driving
Zixuan Li
29
1
0
24 Jun 2024
Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer
Zengqun Zhao
Yu Cao
Shaogang Gong
Ioannis Patras
37
6
0
29 May 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
Yuhui Zhang
Alyssa Unell
Xiaohan Wang
Dhruba Ghosh
Yuchang Su
Ludwig Schmidt
Serena Yeung-Levy
VLM
32
27
0
28 May 2024
Dense Connector for MLLMs
Huanjin Yao
Wenhao Wu
Taojiannan Yang
Yuxin Song
Mengxi Zhang
Haocheng Feng
Yifan Sun
Zhiheng Li
Wanli Ouyang
Jingdong Wang
MLLM
VLM
32
16
0
22 May 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
29
19
0
13 May 2024
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra
Iqra Ali
Tatsuya Hiraoka
Hidetaka Kamigaito
Tomoya Iwakura
Taro Watanabe
38
1
0
29 Mar 2024
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang
Di Huang
Jiajun Deng
Shixiang Tang
Wanli Ouyang
Tong He
Yanyong Zhang
VGen
30
13
0
18 Mar 2024
Bootstrapping Cognitive Agents with a Large Language Model
Feiyu Zhu
Reid Simmons
LLMAG
21
6
0
25 Feb 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
14
5
0
18 Jan 2024
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
Zheng Lian
Licai Sun
Haiyang Sun
Kang Chen
Zhuofan Wen
Hao Gu
Bin Liu
Jianhua Tao
23
27
0
07 Dec 2023
ChatGPT-Powered Hierarchical Comparisons for Image Classification
Zhiyuan Ren
Yiyang Su
Xiaoming Liu
VLM
40
21
0
01 Nov 2023
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Wenhao Wu
Haipeng Luo
Bo Fang
Jingdong Wang
Wanli Ouyang
88
80
0
31 Dec 2022
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
94
47
0
31 Dec 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
87
93
0
04 Jul 2022
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Bin Cui
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
161
428
0
04 Dec 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
360
0
17 Sep 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
303
771
0
18 Apr 2021