Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.03857
Cited By
Grounded Language-Image Pre-training
7 December 2021
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
Yiwu Zhong
Lijuan Wang
Lu Yuan
Lei Zhang
Jenq-Neng Hwang
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grounded Language-Image Pre-training"
50 / 147 papers shown
Title
PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition
Xiao-Li Li
Yining Liu
Na Dong
Sitian Qin
Xiaolin Hu
34
3
0
15 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
58
4
0
09 Jul 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
70
1
0
21 Jun 2024
Proxy Denoising for Source-Free Domain Adaptation
Song Tang
Wenxin Su
Mao Ye
Jianwei Zhang
Xiatian Zhu
Xiatian Zhu
59
1
0
03 Jun 2024
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Y. Zou
Kai-Wei Chang
Wei Wang
SyDa
VLM
LRM
31
36
0
30 May 2024
Disease-informed Adaptation of Vision-Language Models
Jiajin Zhang
Ge Wang
M. Kalra
P. Yan
VLM
34
2
0
24 May 2024
Wild Berry image dataset collected in Finnish forests and peatlands using drones
Luigi Riz
Sergio Povoli
Andrea Caraffa
Davide Boscaini
M. L. Mekhalfi
...
Elisa Castelli
Giacomo Piccinini
L. Marchesotti
Micael S. Couceiro
Fabio Poiesi
26
1
0
13 May 2024
Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation
Shengyuan Liu
Bo Wang
Ye Ma
Te Yang
Xipeng Cao
Quan Chen
Han Li
Di Dong
Peng Jiang
EGVM
28
2
0
11 May 2024
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Maxime Zanella
Ismail Ben Ayed
VLM
MLLM
35
22
0
03 May 2024
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen
Iro Laina
Andrea Vedaldi
3DGS
40
23
0
29 Apr 2024
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models
Konstantinos Vilouras
Pedro Sanchez
Alison Q. OÑeil
Sotirios A. Tsaftaris
MedIm
37
2
0
19 Apr 2024
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Kim Hoang Tran
Phuc Vuong Do
Ngoc Quoc Ly
Ngan Le
29
4
0
15 Apr 2024
Improving Continuous Sign Language Recognition with Adapted Image Models
Lianyu Hu
Tongkai Shi
Liqing Gao
Zekang Liu
Wei Feng
VLM
18
5
0
12 Apr 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
62
12
0
05 Mar 2024
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
61
1
0
01 Mar 2024
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Ming-hui Li
Shuai Li
Xindong Zhang
Lei Zhang
VOS
33
16
0
28 Feb 2024
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Junting Chen
Yao Mu
Qiaojun Yu
Tianming Wei
Silang Wu
...
Wenqi Shao
Yu Qiao
Huazhe Xu
Mingyu Ding
Ping Luo
LM&Ro
25
11
0
22 Feb 2024
GOOD: Towards Domain Generalized Orientated Object Detection
Qi Bi
Beichen Zhou
Jingjun Yi
Wei Ji
Haolan Zhan
Gui-Song Xia
ObjD
OOD
74
2
0
20 Feb 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
26
12
0
15 Feb 2024
Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance
Raza Imam
Muhammad Huzaifa
Nabil Mansour
Shaher Bano Mirza
Fouad Lamghari
20
0
0
10 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
34
1
0
06 Feb 2024
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Xiangshuo Qiao
Xianxin Li
Xiaozhe Qu
Jie M. Zhang
Yang Liu
Yu Luo
Cihang Jin
Jin Ma
VLM
18
0
0
19 Jan 2024
PartSTAD: 2D-to-3D Part Segmentation Task Adaptation
Hyunjin Kim
Minhyuk Sung
38
8
0
11 Jan 2024
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
Xinyan Chen
Jiaxin Ge
Tianjun Zhang
Jiaming Liu
Shanghang Zhang
VLM
EGVM
27
0
0
23 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
19
15
0
19 Dec 2023
MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning
Yi Xin
Junlong Du
Qiang Wang
Ke Yan
Shouhong Ding
VLM
29
45
0
14 Dec 2023
TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
Kaipeng Zheng
Weiran Huang
Lichao Sun
LM&MA
MedIm
VLM
19
0
0
12 Dec 2023
Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey
Haotian Zhang
S. D. Semujju
Zhicheng Wang
Xianwei Lv
Kang Xu
...
Jing Wu
Zhuo Long
Wensheng Liang
Xiaoguang Ma
Ruiyan Zhuang
UQCV
AI4TS
AI4CE
27
4
0
11 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
25
81
0
06 Dec 2023
DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation
Zhiqin Chen
Qimin Chen
Hang Zhou
Hao Zhang
3DPC
3DV
14
2
0
22 Nov 2023
Advances in Embodied Navigation Using Large Language Models: A Survey
Jinzhou Lin
Han Gao
Xuxiang Feng
Rongtao Xu
Changwei Wang
Man Zhang
Li Guo
Shibiao Xu
LM&Ro
LLMAG
56
9
0
01 Nov 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
17
0
0
20 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
17
32
0
20 Oct 2023
Semantic Scene Difference Detection in Daily Life Patroling by Mobile Robots using Pre-Trained Large-Scale Vision-Language Model
Yoshiki Obinata
Kento Kawaharazuka
Naoaki Kanazawa
N. Yamaguchi
Naoto Tsukamoto
Iori Yanokura
Shingo Kitagawa
Koki Shinjo
K. Okada
Masayuki Inaba
LM&Ro
15
5
0
28 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
61
222
0
26 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
21
2
0
06 Sep 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
39
46
0
01 Sep 2023
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications
Wenyi Wu
Karim Bouyarmane
Ismail B. Tutar
23
2
0
30 Aug 2023
Link-Context Learning for Multimodal LLMs
Yan Tai
Weichen Fan
Zhao Zhang
Feng Zhu
Rui Zhao
Ziwei Liu
ReLM
LRM
21
17
0
15 Aug 2023
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
K. Poudel
Manish Dhakal
Prasiddha Bhandari
Rabin Adhikari
Safal Thapaliya
Bishesh Khanal
VLM
28
17
0
15 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
29
76
0
12 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
116
0
25 Jul 2023
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
Xiaofeng Mao
YueFeng Chen
Yao Zhu
Da Chen
Hang Su
Rong Zhang
H. Xue
ObjD
OOD
21
18
0
24 Jul 2023
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Pujin Cheng
Li Lin
Junyan Lyu
Yijin Huang
Wenhan Luo
Xiaoying Tang
MedIm
8
43
0
24 Jul 2023
Divert More Attention to Vision-Language Object Tracking
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
22
3
0
19 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
80
223
0
07 Jul 2023
Multi-Similarity Contrastive Learning
Emily Mu
John Guttag
Maggie Makar
SSL
23
2
0
06 Jul 2023
Hierarchical Open-vocabulary Universal Image Segmentation
Xudong Wang
Shufang Li
Konstantinos Kallidromitis
Yu Kato
Kazuki Kozuka
Trevor Darrell
VLM
OCL
30
36
0
03 Jul 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
29
6
0
14 Jun 2023
Previous
1
2
3
Next