Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,335 papers shown
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
Pengxiang Li
Kai Chen
Zhili Liu
Ruiyuan Gao
Lanqing Hong
Guo Zhou
Hua Yao
Dit-Yan Yeung
Huchuan Lu
Xu Jia
VGen
DiffM
20
0
0
01 Dec 2023
Segment Any 3D Gaussians
Jiazhong Cen
Jiemin Fang
Chen Yang
Lingxi Xie
Xiaopeng Zhang
Wei Shen
Qi Tian
3DGS
62
69
0
01 Dec 2023
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah
Sreyan Ghosh
Sonal Kumar
Purva Chiniya
Dinesh Manocha
36
14
0
30 Nov 2023
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng
Sicheng Xie
Zuyao You
Shiyi Lan
Zuxuan Wu
VLM
CoGe
MLLM
21
17
0
30 Nov 2023
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models
Zhonghao Wang
Wei Wei
Yang Zhao
Zhisheng Xiao
M. Hasegawa-Johnson
Humphrey Shi
Tingbo Hou
DiffM
20
11
0
30 Nov 2023
Language-conditioned Detection Transformer
Jang Hyun Cho
Philipp Krahenbuhl
VLM
ObjD
42
1
0
29 Nov 2023
One-Shot Open Affordance Learning with Foundation Models
Gen Li
Deqing Sun
Laura Sevilla-Lara
Varun Jampani
VLM
61
21
0
29 Nov 2023
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
Chi-Pin Huang
Kai-Po Chang
Chung-Ting Tsai
Yung-Hsuan Lai
Fu-En Yang
Yu-Chiang Frank Wang
DiffM
11
46
0
29 Nov 2023
LanGWM: Language Grounded World Model
Rudra P. K. Poudel
Harit Pandya
Chao Zhang
Roberto Cipolla
17
5
0
29 Nov 2023
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Lorenzo Bianchi
F. Carrara
Nicola Messina
Claudio Gennaro
Fabrizio Falchi
ObjD
22
13
0
29 Nov 2023
LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model
Siwei Chen
Anxing Xiao
David Hsu
LM&Ro
16
5
0
29 Nov 2023
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng
Biao Gong
Di Chen
Yujun Shen
Yu Liu
Jingren Zhou
DiffM
21
43
0
28 Nov 2023
CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
Haocheng Yuan
Jing Xu
Hao Pan
Adrien Bousseau
Niloy J. Mitra
Changjian Li
15
8
0
28 Nov 2023
ROSO: Improving Robotic Policy Inference via Synthetic Observations
Yusuke Miyashita
Dimitris Gahtidis
Colin La
Jeremy Rabinowicz
Juxi Leitner
35
1
0
28 Nov 2023
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation
Sitong Su
Litao Guo
Lianli Gao
Hengtao Shen
Jingkuan Song
VGen
26
4
0
28 Nov 2023
GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Jiemin Fang
Junjie Wang
Xiaopeng Zhang
Lingxi Xie
Qi Tian
3DGS
DiffM
20
107
0
27 Nov 2023
Having Second Thoughts? Let's hear it
J. H. Lee
Sujith Vijayan
AAML
6
0
0
26 Nov 2023
Obj-NeRF: Extract Object NeRFs from Multi-view Images
Zhiyi Li
Lihe Ding
Tianfan Xue
19
1
0
26 Nov 2023
Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision
Nicholas Lui
Bryan Chia
William Berrios
Candace Ross
Douwe Kiela
19
2
0
25 Nov 2023
Benchmarking Robustness of Text-Image Composed Retrieval
Shitong Sun
Jindong Gu
Shaogang Gong
CoGe
31
1
0
24 Nov 2023
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng
Shiyi Lan
Hengduo Li
Jose M. Alvarez
Zuxuan Wu
Yu-Gang Jiang
VLM
ISeg
MLLM
28
6
0
24 Nov 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
50
14
0
24 Nov 2023
Visual In-Context Prompting
Feng Li
Qing Jiang
Hao Zhang
Tianhe Ren
Shilong Liu
...
Hongyang Li
Chun-yue Li
Jianwei Yang
Lei Zhang
Jianfeng Gao
VLM
LRM
MLLM
27
30
0
22 Nov 2023
T-Rex: Counting by Visual Prompting
Qing Jiang
Feng Li
Tianhe Ren
Shilong Liu
Zhaoyang Zeng
Kent Yu
Lei Zhang
16
10
0
22 Nov 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
17
34
0
22 Nov 2023
SAM4UDASS: When SAM Meets Unsupervised Domain Adaptive Semantic Segmentation in Intelligent Vehicles
Weihao Yan
Yeqiang Qian
Xingyuan Chen
Hanyang Zhuang
Chunxiang Wang
Ming Yang
20
5
0
22 Nov 2023
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu
Juncheng Li
Longhui Wei
Liang Pang
Wentao Ye
Bosheng Qin
Siliang Tang
Qi Tian
Yueting Zhuang
MLLM
VLM
25
67
0
22 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
MLLM
VLM
18
579
0
21 Nov 2023
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
Meng Chu
Zhedong Zheng
Wei Ji
Tingyu Wang
Tat-Seng Chua
21
9
0
21 Nov 2023
DAS: A Deformable Attention to Capture Salient Information in CNNs
Farzad Salajegheh
Nader Asadi
Soroush Saryazdi
Sudhir Mudur
9
3
0
20 Nov 2023
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li
Weiwei Guo
Xue Yang
Ning Liao
Dunyun He
Jiaqi Zhou
Wenxian Yu
ObjD
VLM
25
7
0
20 Nov 2023
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
Wen Wang
Canyu Zhao
Hao Chen
Zhekai Chen
Kecheng Zheng
Chunhua Shen
DiffM
16
21
0
19 Nov 2023
Behavior Optimized Image Generation
Varun Khurana
Yaman Kumar Singla
J. Subramanian
R. Shah
Changyou Chen
Zhiqiang Xu
Balaji Krishnamurthy
EGVM
8
4
0
18 Nov 2023
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen
Jinlin Wu
Zhen Lei
Zhaoxiang Zhang
Changwen Chen
23
11
0
18 Nov 2023
Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Shelly Sheynin
Adam Polyak
Uriel Singer
Yuval Kirstain
Amit Zohar
Oron Ashual
Devi Parikh
Yaniv Taigman
19
129
0
16 Nov 2023
Incremental Object-Based Novelty Detection with Feedback Loop
Simone Caldarella
Elisa Ricci
Rahaf Aljundi
26
0
0
15 Nov 2023
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
WonJun Moon
Sangeek Hyun
Subeen Lee
Jae-Pil Heo
19
4
0
15 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLM
VLM
33
208
0
13 Nov 2023
Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision
Seongyun Lee
Sue Hyun Park
Yongrae Jo
Minjoon Seo
28
50
0
13 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
18
30
0
11 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
31
142
0
10 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
52
103
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
24
12
0
09 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
40
51
0
08 Nov 2023
Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining
U. Sahin
Hang Li
Qadeer Ahmad Khan
Daniel Cremers
Volker Tresp
VLM
CoGe
23
12
0
07 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Eric Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM
VLM
36
199
0
06 Nov 2023
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
MLLM
13
14
0
06 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
17
445
0
06 Nov 2023
Masking Hyperspectral Imaging Data with Pretrained Models
Elias Arbash
Andréa de Lima Ribeiro
Sam Thiele
Nina Gnann
Behnood Rasti
Margret Fuchs
Pedram Ghamisi
R. Gloaguen
14
4
0
06 Nov 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
16
54
0
06 Nov 2023