Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

16 May 2024

Yihao Chen

Lei Zhang

Papers citing "Grounding DINO 1.5: Advance the 'Edge' of Open-Set Object Detection"

43 / 43 papers shown

SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding

Keita Otani

Tatsuya Harada

30 Nov 2025

Stable Offline Hand-Eye Calibration for any Robot with Just One Mark

180

21 Nov 2025

Multi-Step Reasoning for Embodied Question Answering via Tool Augmentation

193

23 Oct 2025

Chimera: Compositional Image Generation using Part-based Concepting

296

20 Oct 2025

Improved High-probability Convergence Guarantees of Decentralized SGD

Aleksandar Armacki

Ali H. Sayed

07 Oct 2025

On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond

192

07 Oct 2025

Inferring Dynamic Physical Properties from Video Foundation Models

159

02 Oct 2025

Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding

...

26 Sep 2025

See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model

126

19 Sep 2025

ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models

212

19 Sep 2025

Model-Agnostic Open-Set Air-to-Air Visual Object Detection for Reliable UAV Perception

Spyridon Loukovitis

Anastasios Arsenos

Vasileios Karampinis

Athanasios Voulodimos

11 Sep 2025

Object Detection with Multimodal Large Vision-Language Models: An In-depth Review

Information Fusion (Inf. Fusion), 2025

Ranjan Sapkota

Manoj Karkee

ObjD VLM

291

25 Aug 2025

RynnEC: Bringing MLLMs into Embodied World

208

19 Aug 2025

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

...

133

14 Aug 2025

Designing Object Detection Models for TinyML: Foundations, Comparative Analysis, Challenges, and Emerging Solutions

ACM Computing Surveys (ACM Comput. Surv.), 2025

Christophe El Zeinaty

134

11 Aug 2025

ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow

156

05 Aug 2025

InspectVLM: Unified in Theory, Unreliable in Practice

112

03 Aug 2025

Omni-Scan: Creating Visually-Accurate Digital Twin Object Models Using a Bimanual Robot with Handover and Gaussian Splat Merging

172

01 Aug 2025

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

168

01 Aug 2025

Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction

145

30 Jul 2025

SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity

Jordan L. Boyd-Graber

KELM

243

30 Jul 2025

Spatio-Temporal LLM: Reasoning about Environments and Actions

204

07 Jul 2025

NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation

Max Gandyra

Alessandro Santonicola

Michael Beetz

261

02 Jul 2025

3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model

389

06 Jun 2025

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models

501

29 May 2025

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

515

07 May 2025

OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection

338

06 May 2025

Aligning Anime Video Generation with Human Feedback

387

14 Apr 2025

Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models

Sangwon Beak

Hyeonwoo Kim

Hanbyul Joo

305

25 Mar 2025

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

420

18 Mar 2025

AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations

318

17 Mar 2025

Embodied Crowd Counting

335

11 Mar 2025

Referring to Any Person

932

11 Mar 2025

FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction

318

10 Mar 2025

Consistent Image Layout Editing with Diffusion Models

291

09 Mar 2025

ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation

International Conference on 3D Vision (3DV), 2023

579

24 Feb 2025

DynamicEarth: How Far are We from Open-Vocabulary Change Detection?

322

22 Jan 2025

Instruction-Guided Scene Text Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

500

03 Jan 2025

HandOS: 3D Hand Reconstruction in One Stage

Computer Vision and Pattern Recognition (CVPR), 2024

500

02 Dec 2024

RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World

...

582

29 Nov 2024

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

555

27 Nov 2024

RT-GuIDE: Real-Time Gaussian Splatting for Information-Driven Exploration

IEEE Robotics and Automation Letters (RA-L), 2024

439

26 Sep 2024

OW-Rep: Open World Object Detection with Instance Representation Learning

1.2K

24 Sep 2024