ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.03857
  4. Cited By
Grounded Language-Image Pre-training

Grounded Language-Image Pre-training

7 December 2021
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
Yiwu Zhong
Lijuan Wang
Lu Yuan
Lei Zhang
Jenq-Neng Hwang
Kai-Wei Chang
Jianfeng Gao
    ObjD
    VLM
ArXivPDFHTML

Papers citing "Grounded Language-Image Pre-training"

50 / 147 papers shown
Title
Visually Interpretable Subtask Reasoning for Visual Question Answering
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng
A. Goel
Hakan Bilen
LRM
24
0
0
12 May 2025
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Agnese Chiatti
Sara Bernardini
Lara Shibelski Godoy Piccolo
Viola Schiaffonati
Matteo Matteucci
58
0
0
08 May 2025
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
43
0
0
06 May 2025
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Hao Xu
Arbind Agrahari Baniya
Sam Well
Mohamed Reda Bouadjenek
Richard Dazeley
S. Aryal
AI4TS
22
0
0
06 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
34
0
0
04 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
45
0
0
03 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Z. Wang
Tao Jin
DiffM
97
2
0
30 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
49
0
0
22 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively
Bingyang Wang
Kaer Huang
Bin Li
Yiqiang Yan
L. Zhang
Huchuan Lu
You He
VLM
26
0
0
07 Apr 2025
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Jinyang Li
En Yu
Sijia Chen
Wenbing Tao
52
1
0
13 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
56
0
0
13 Mar 2025
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Hang Yin
Xiuwei Xu
Lingqing Zhao
Z. Wang
Jie Zhou
Jiwen Lu
64
2
0
13 Mar 2025
Referring to Any Person
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
80
0
0
11 Mar 2025
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang
Yongchao Feng
Shuai Yang
Z. Liu
Qingjie Liu
Y. Wang
ObjD
81
0
0
08 Mar 2025
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang
Peng Yun
Jun Cen
Junhao Cai
DiDi Zhu
...
Qifeng Chen
Jia Pan
Wei K. Zhang
Bo Yang
Hua Chen
59
1
0
05 Mar 2025
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation
WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation
Dujun Nie
Xianda Guo
Yiqun Duan
Ruijun Zhang
Long Chen
LM&Ro
113
2
0
04 Mar 2025
A Zero-Shot Learning Approach for Ephemeral Gully Detection from Remote Sensing using Vision Language Models
Seyed Mohamad Ali Tousi
Ramy M. A. Farag
Jacket Demby's
Gbenga Omotara
John A. Lory
Guilherme N. DeSouza
68
0
0
03 Mar 2025
QueryAdapter: Rapid Adaptation of Vision-Language Models in Response to Natural Language Queries
QueryAdapter: Rapid Adaptation of Vision-Language Models in Response to Natural Language Queries
N. H. Chapman
Feras Dayoub
Will N. Browne
Christopher F. Lehnert
VLM
64
0
0
26 Feb 2025
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
Mingkun Zhang
Keping Bi
Wei Chen
J. Guo
Xueqi Cheng
BDL
VLM
50
1
0
25 Feb 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
Haoyuan Li
Yanpeng Zhou
Tao Tang
Jifei Song
Yihan Zeng
Michael C. Kampffmeyer
Hang Xu
Xiaodan Liang
3DGS
57
1
0
25 Feb 2025
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue
Nenglun Chen
Jun Liu
Wenyun Sun
3DPC
55
7
0
24 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
104
9
0
18 Feb 2025
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Vivek Myers
Bill Chunyuan Zheng
Anca Dragan
Kuan Fang
Sergey Levine
60
0
0
08 Feb 2025
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Xiaowen Qiu
Jincheng Yang
Yian Wang
Zhehuan Chen
Yufei Wang
Tsun-Hsuan Wang
Zhou Xian
Chuang Gan
87
4
0
04 Feb 2025
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
Mai A. Shaaban
Adnan Khan
Mohammad Yaqub
LM&MA
78
2
0
28 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
95
17
0
17 Jan 2025
Enhancing Novel Object Detection via Cooperative Foundational Models
Enhancing Novel Object Detection via Cooperative Foundational Models
Rohit K Bharadwaj
Muzammal Naseer
Salman Khan
F. Khan
ObjD
VLM
100
1
0
17 Jan 2025
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
Divyansh Srivastava
Beatriz Cabrero-Daniel
Christian Berger
VLM
57
8
0
17 Jan 2025
Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation
Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation
Ahmad Süleyman
Göksel Biricik
41
2
0
15 Jan 2025
Instruction-Guided Scene Text Recognition
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
68
3
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
45
0
03 Jan 2025
Can video generation replace cinematographers? Research on the cinematic language of generated video
Can video generation replace cinematographers? Research on the cinematic language of generated video
X. Li
Kai WU
Siyi Yang
YiZhan Qu
Guohua. Zhang
...
Mingliang Xiong
Hao Deng
Qingwen Liu
Gang Li
Bin He
VGen
DiffM
90
1
0
16 Dec 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
105
6
0
27 Nov 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD
VLM
91
1
0
27 Nov 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Hua Zhang
Xiaochun Cao
FAtt
82
4
0
25 Nov 2024
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong
Chen Gao
Zihan Ding
Yue Liao
Si Liu
Shifeng Zhang
Xu Zhou
Si Liu
LRM
85
3
0
25 Nov 2024
Find Any Part in 3D
Find Any Part in 3D
Ziqi Ma
Yisong Yue
Georgia Gkioxari
3DPC
113
3
0
20 Nov 2024
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot
  Nuclei Detection via Visual-Language Pre-trained Models
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
Yongjian Wu
Yang Zhou
Jiya Saiyin
Bingzheng Wei
M. Lai
Jianzhong Shou
Yan Xu
VLM
MedIm
25
0
0
22 Oct 2024
Open World Object Detection: A Survey
Open World Object Detection: A Survey
Yiming Li
Yi Wang
Wenqian Wang
Dan Lin
Bingbing Li
Kim-Hui Yap
ObjD
30
0
0
15 Oct 2024
Deep Correlated Prompting for Visual Recognition with Missing Modalities
Deep Correlated Prompting for Visual Recognition with Missing Modalities
Lianyu Hu
Tongkai Shi
Wei Feng
Fanhua Shang
Liang Wan
VLM
29
0
0
09 Oct 2024
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Avik Pal
Max van Spengler
Guido Maria DÁmely di Melendugno
Alessandro Flaborea
Fabio Galasso
Pascal Mettes
CoGe
40
5
0
09 Oct 2024
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Ayca Takmaz
Alexandros Delitzas
R. Sumner
Francis Engelmann
Johanna Wald
Federico Tombari
52
11
0
27 Sep 2024
Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking
Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking
Xi Wang
Tianxing Chen
Qiaojun Yu
Tianling Xu
Zanxin Chen
Yiting Fu
Cewu Lu
Cewu Lu
Ping Luo
Ping Luo
39
4
0
24 Sep 2024
OW-Rep: Open World Object Detection with Instance Representation Learning
OW-Rep: Open World Object Detection with Instance Representation Learning
Sunoh Lee
Minsik Jeon
Jihong Min
Junwon Seo
ObjD
65
0
0
24 Sep 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
27
0
0
23 Sep 2024
QueryCAD: Grounded Question Answering for CAD Models
QueryCAD: Grounded Question Answering for CAD Models
Claudius Kienle
Benjamin Alt
Darko Katic
Rainer Jäkel
Jan Peters
16
2
0
13 Sep 2024
Revisiting Prompt Pretraining of Vision-Language Models
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
33
1
0
10 Sep 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
58
1
0
23 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
32
5
0
18 Jul 2024
123
Next