ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

arXiv: 2104.12763 (Cited By)
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
    ObjD, VLM
Links: arXiv (abs), PDF, HTML, GitHub (1008★)

Papers citing "MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"

50 / 678 papers shown
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
Lingdong Kong
Dongyue Lu
Ao Liang
Rong Li
Yuhao Dong
Tianshuai Hu
Lai Xing Ng
Wei Tsang Ooi
Benoit R. Cottereau
VGen
316
4
0
23 Jul 2025
ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension
Yizhi Hu
Zezhao Tian
Xingqun Qi
Chen Su
Bingkun Yang
Junhui Yin
Muyi Sun
Man Zhang
Zhenan Sun
ObjD
146
0
0
22 Jul 2025
Advancing Visual Large Language Model for Multi-granular Versatile Perception
Wentao Xiang
Haoxian Tan
Cong Wei
Yujie Zhong
Dengjie Li
Yujiu Yang
VLM
223
2
0
22 Jul 2025
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding
Duc Cao-Dinh
Khai Le-Duc
Anh Dao
Bach Phan Tat
Chris Ngo
Duy M. H. Nguyen
Nguyen X. Khanh
Thanh Nguyen-Tang
238
0
0
01 Jul 2025
MDC-R: The Minecraft Dialogue Corpus with Reference
Chris Madge
Maris Camilleri
Paloma Carretero García
Vanja Karan
Juexi Shao
Prashant Jayannavar
Julian Hough
Benjamin Roth
Massimo Poesio
129
2
0
27 Jun 2025
Referring Expression Instance Retrieval and A Strong End-to-End Baseline
Xiangzhao Hao
Kuan Zhu
Hongyu Guo
Haiyun Guo
Ning Jiang
Quan Lu
Ming Tang
Jinqiao Wang
303
1
0
23 Jun 2025
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models
Trishna Chakraborty
Udita Ghosh
Xiaopan Zhang
Fahim Faisal Niloy
Yue Dong
Jiachen Li
Amit K. Roy-Chowdhury
Chengyu Song
LLMAG, HILM, LRM
251
3
0
18 Jun 2025
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE, VLM
308
0
0
13 Jun 2025
Auto-Labeling Data for Object Detection
Brent A. Griffin
Manushree Gangwar
Jacob Sela
Jason J. Corso
ObjD, VLM
260
0
0
03 Jun 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Computer Vision and Pattern Recognition (CVPR), 2025
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM, LRM
335
19
0
29 May 2025
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Shurong Zheng
Fan Yang
Ming Tang
Jinqiao Wang
VLM, LRM
271
1
0
27 May 2025
Open-Det: An Efficient Learning Framework for Open-Ended Detection
Guiping Cao
Tao Wang
Wenjian Huang
X. Lan
Jianguo Zhang
Shihong Deng
ObjD, VLM
202
1
0
27 May 2025
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model
Alaa Dalaq
Muzammil Behzad
VLM
412
0
0
25 May 2025
VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection
Aditya Taparia
Noel Ngu
Mario Leiva
Joshua Shay Kricheli
John Corcoran
Nathaniel D. Bastian
Gerardo Simari
Paulo Shakarian
Ransalu Senanayake
ObjD
266
0
0
19 May 2025
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM, LRM
476
5
0
17 May 2025
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shun Inadumi
Nobuhiro Ueda
Koichiro Yoshino
ObjD
349
0
0
16 May 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
500
3
0
22 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
280
15
0
19 Apr 2025
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun
Junbin Xiao
Tze Ho Elden Tse
Yicong Li
Arjun Akula
Angela Yao
EgoV
279
1
0
18 Apr 2025
Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions
Yifei Dong
Fengyi Wu
Sanjian Zhang
Guangyu Chen
Yuzhi Hu
...
Yuxuan Zhou
Siyu Huang
Feng Liu
Jingdong Sun
Zhi-Qi Cheng
456
8
0
16 Apr 2025
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Hyunwoo Oh
Yezi Liu
Tamoghno Das
Mohsen Imani
VLM, LRM
263
0
0
15 Apr 2025
NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results
Yuqian Fu
Xingyu Qiu
Bin Ren
Yanwei Fu
Radu Timofte
...
Dianmo Sheng
Xuanpu Zhao
Zhiyu Li
X. Ding
Wenqian Li
253
31
0
14 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM, OffRL, LRM
318
58
0
10 Apr 2025
Few-Shot Adaptation of Grounding DINO for Agricultural Domain
Rajhans Singh
Rafael Bidese Puhl
Kshitiz Dhakal
Sudhir Sornapudi
309
3
0
09 Apr 2025
Towards Visual Text Grounding of Multimodal Large Language Model
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Franck Dernoncourt
Wanrong Zhu
Tianyi Zhou
Tong Sun
435
12
0
07 Apr 2025
Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding
Zahir Alsulaimawi
140
1
0
07 Apr 2025
Multimodal Reference Visual Grounding
Yangxiao Lu
Ruosen Li
Liqiang Jing
Jikai Wang
Xinya Du
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
ObjD
329
1
0
02 Apr 2025
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Hang Zhou
Wei Ji
Rui Ma
Li Cheng
ViT
278
0
0
27 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Aniket Didolkar
Antonios Tragoudaras
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL, VLM
427
6
0
27 Mar 2025
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
Hao Guo
Jianfei Zhu
Wei Fan
Chunzhi Yi
Feng Jiang
ObjD
227
0
0
25 Mar 2025
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang
Yanpeng Sun
Qinying Gu
Zechao Li
VLM
534
7
0
19 Mar 2025
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Jiali Yao
Xinran Deng
Xin Gu
Mengrui Dai
Bing Fan
Zhipeng Zhang
Yan Huang
Heng Fan
L. Zhang
420
4
0
13 Mar 2025
DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
Chiara Cappellino
Gianluca Mancusi
Matteo Mosconi
Angelo Porrello
Simone Calderara
Rita Cucchiara
ObjD, VLM
561
1
0
12 Mar 2025
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou
Gim Hee Lee
255
2
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM, ObjD
212
0
0
10 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
Jiawei Han
Guiguang Ding
VLM, ObjD
544
34
0
10 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
300
31
0
08 Mar 2025
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang
Peng Yun
Jun Cen
Junhao Cai
DiDi Zhu
...
Qifeng Chen
Jia Pan
Wei Zhang
Bo Yang
Hua Chen
665
14
0
05 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Nianzu Yang
Yun Zheng
Liwei Wang
ObjD, VLM
454
13
0
03 Mar 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
X. J. Yang
Jing Liu
Peng Wang
Guoqing Wang
Yue Yang
Mengqi Li
ObjD
491
5
0
27 Feb 2025
From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation from 2D VLMs
Ang Cao
Sergio Arnaud
Oleksandr Maksymets
Jianing Yang
Ayush Jain
...
Aravind Rajeswaran
Franziska Meier
Justin Johnson
Jeong Joon Park
Alexander Sax
343
0
0
27 Feb 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
395
4
0
24 Feb 2025
Anatomical grounding pre-training for medical phrase grounding
IEEE International Symposium on Biomedical Imaging (ISBI), 2025
Wenjun Zhang
Shakes Chandra
Aaron Nicolson
MedIm
191
3
0
23 Feb 2025
Predicate Hierarchies Improve Few-Shot State Classification
International Conference on Learning Representations (ICLR), 2025
Emily Jin
Joy Hsu
Jiajun Wu
OffRL
437
1
0
18 Feb 2025
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
International Conference on Learning Representations (ICLR), 2025
Xin Gu
Yaojie Shen
Chenxi Luo
Tiejian Luo
Yan Huang
Lu Ma
Heng Fan
L. Zhang
283
7
0
16 Feb 2025
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang
Yang Zhou
Yan Peng
LRM, ReLM
416
1
0
02 Feb 2025
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Shenghao Fu
Q. Yang
Qijie Mo
Junkai Yan
Xihan Wei
Jingke Meng
Xiaohua Xie
Wei-Shi Zheng
MLLM, ObjD, VLM
453
33
0
31 Jan 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
AAAI Conference on Artificial Intelligence (AAAI), 2025
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
370
13
0
12 Jan 2025
BTGenBot: Behavior Tree Generation for Robotic Tasks with Lightweight LLMs
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Riccardo Andrea Izzo
Gianluca Bardaro
Matteo Matteucci
LM&Ro
291
18
0
08 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Neural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM, VLM, LRM
844
119
0
03 Jan 2025