ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.12763
  4. Cited By
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
    ObjD
    VLM
ArXivPDFHTML

Papers citing "MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"

50 / 607 papers shown
Title
Zero-shot Object Navigation with Vision-Language Models Reasoning
Zero-shot Object Navigation with Vision-Language Models Reasoning
Congcong Wen
Yisiyuan Huang
Hao Huang
Yanjia Huang
Shuaihang Yuan
Yu Hao
Hui Lin
Yu-Shen Liu
Yi Fang
LM&Ro
40
7
0
24 Oct 2024
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large
  Multimodal Models
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Fan Yang
Ming Tang
Jinqiao Wang
MLLM
43
1
0
21 Oct 2024
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object
  Detection Considering Text Describability
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability
Yusuke Hosoya
Masanori Suganuma
Takayuki Okatani
ObjD
16
0
0
20 Oct 2024
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object
  Tracking and Segmentation
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao
Qiong Cao
Yujie Zhong
Xiang Zhang
Tao Wang
Canqun Yang
L. Lan
23
0
0
17 Oct 2024
Context-Infused Visual Grounding for Art
Context-Infused Visual Grounding for Art
Selina Khan
Nanne van Noord
ObjD
27
1
0
16 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic
  Modeling
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
28
6
0
14 Oct 2024
DINTR: Tracking via Diffusion-based Interpolation
DINTR: Tracking via Diffusion-based Interpolation
Pha Nguyen
Ngan Le
J. Cothren
Alper Yilmaz
Khoa Luu
DiffM
38
0
0
14 Oct 2024
DFIMat: Decoupled Flexible Interactive Matting in Multi-Person Scenarios
DFIMat: Decoupled Flexible Interactive Matting in Multi-Person Scenarios
Siyi Jiao
Wenzheng Zeng
Changxin Gao
Nong Sang
28
1
0
13 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with
  Mask Referring Modeling
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
24
5
0
10 Oct 2024
G$^{2}$TR: Generalized Grounded Temporal Reasoning for Robot Instruction
  Following by Combining Large Pre-trained Models
G2^{2}2TR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models
Riya Arora
N. N.
Aman Tambi
Sandeep S. Zachariah
Souvik Chakraborty
Rohan Paul
LM&Ro
28
0
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
30
2
0
09 Oct 2024
Grounding Partially-Defined Events in Multimodal Data
Grounding Partially-Defined Events in Multimodal Data
Kate Sanders
Reno Kriz
David Etter
Hannah Recknor
Alexander Martin
Cameron Carpenter
Jingyang Lin
Benjamin Van Durme
22
2
0
07 Oct 2024
ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large
  Language Models
ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models
Mengxue Qu
Xiaodong Chen
Wu Liu
Alicia Li
Yao Zhao
37
13
0
01 Oct 2024
You Only Speak Once to See
You Only Speak Once to See
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
23
1
0
27 Sep 2024
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning
  for Multimodal Classification
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Raja Kumar
Raghav Singhal
Pranamya Kulkarni
Deval Mehta
Kshitij Jadhav
13
0
0
26 Sep 2024
SimVG: A Simple Framework for Visual Grounding with Decoupled
  Multi-modal Fusion
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai
Lingfeng Yang
Yihao Xu
Zhenhua Feng
Wankou Yang
ObjD
27
9
0
26 Sep 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
44
3
0
23 Sep 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
27
0
0
23 Sep 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
23
0
0
18 Sep 2024
Robot Manipulation in Salient Vision through Referring Image
  Segmentation and Geometric Constraints
Robot Manipulation in Salient Vision through Referring Image Segmentation and Geometric Constraints
Chen Jiang
Allie Luo
Martin Jägersand
15
0
0
17 Sep 2024
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary
  Detection
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
Haoxuan Wang
Q. He
Jinlong Peng
Hao Yang
Mingmin Chi
Yabiao Wang
Mamba
34
1
0
13 Sep 2024
VLTP: Vision-Language Guided Token Pruning for Task-Oriented
  Segmentation
VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Yezi Liu
SungHeon Jeong
Fei Wen
Nathaniel Bastian
Hugo Latapie
Mohsen Imani
VLM
32
4
0
13 Sep 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open
  Detection
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection
Pengfei Qi
Yifei Zhang
Wenqiang Li
Youwen Hu
Kunlong Bai
ObjD
20
0
0
10 Sep 2024
Context is the Key: Backdoor Attacks for In-Context Learning with Vision
  Transformers
Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers
Gorka Abad
S. Picek
Lorenzo Cavallaro
A. Urbieta
SILM
37
0
0
06 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through
  Expression-guided Dynamic Gating and Regression
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
31
0
0
05 Sep 2024
More Pictures Say More: Visual Intersection Network for Open Set Object
  Detection
More Pictures Say More: Visual Intersection Network for Open Set Object Detection
Bingcheng Dong
Yuning Ding
Jinrong Zhang
Sifan Zhang
Shenglan Liu
ObjD
33
0
0
26 Aug 2024
LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages
  in Multimodal Image Retrieval Task
LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task
Ali Asgarov
Samir Rustamov
VLM
14
1
0
25 Aug 2024
R2G: Reasoning to Ground in 3D Scenes
R2G: Reasoning to Ground in 3D Scenes
Yixuan Li
Zan Wang
Wei Liang
41
2
0
24 Aug 2024
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal
  models
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models
Matteo Forlini
Mihail Babcinschi
Giacomo Palmieri
Pedro Neto
29
1
0
21 Aug 2024
On the Potential of Open-Vocabulary Models for Object Detection in
  Unusual Street Scenes
On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes
Sadia Ilyas
Ido Freeman
Matthias Rottmann
ObjD
43
3
0
20 Aug 2024
Towards Flexible Visual Relationship Segmentation
Towards Flexible Visual Relationship Segmentation
Fangrui Zhu
Jianwei Yang
Huaizu Jiang
VOS
29
1
0
15 Aug 2024
An Efficient and Effective Transformer Decoder-Based Framework for
  Multi-Task Visual Grounding
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
Wei Chen
Mahdieh Hatamian
Yu Wu
37
3
0
02 Aug 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
34
1
0
28 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
31
1
0
23 Jul 2024
HAPFI: History-Aware Planning based on Fused Information
HAPFI: History-Aware Planning based on Fused Information
Sujin Jeon
Suyeon Shin
Byoung-Tak Zhang
25
0
0
23 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
34
1
0
22 Jul 2024
Weak-to-Strong Compositional Learning from Generative Models for
  Language-based Object Detection
Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
Kwanyong Park
Kuniaki Saito
Donghyun Kim
VLM
CoGe
37
0
0
21 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
32
5
0
18 Jul 2024
SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language
  Pre-trained Models
SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
Yang Zhou
Yongjian Wu
Jiya Saiyin
Bingzheng Wei
Maode Lai
Eric Chang
Yan Xu
VLM
30
0
0
16 Jul 2024
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer
Yu Wang
Xiangbo Su
Qiang Chen
Xinyu Zhang
Teng Xi
Kun Yao
Errui Ding
Gang Zhang
Jingdong Wang
ObjD
VLM
36
0
0
15 Jul 2024
Pathformer3D: A 3D Scanpath Transformer for 360° Images
Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan
Yantao Lai
Mengyu Qiu
Dong Liang
ViT
22
0
0
15 Jul 2024
Plain-Det: A Plain Multi-Dataset Object Detector
Plain-Det: A Plain Multi-Dataset Object Detector
Cheng Shi
Yuchen Zhu
Sibei Yang
ObjD
VLM
24
0
0
14 Jul 2024
Layer-Wise Relevance Propagation with Conservation Property for ResNet
Layer-Wise Relevance Propagation with Conservation Property for ResNet
Seitaro Otsuki
T. Iida
Félix Doublet
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
FAtt
38
4
0
12 Jul 2024
Textual Query-Driven Mask Transformer for Domain Generalized
  Segmentation
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak
Byeongju Woo
Sunghwan Kim
Dae-Hwan Kim
Hoseong Kim
37
3
0
12 Jul 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for
  Resource-Limited Transfer Learning
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao
Bo Wan
Xu Jia
Yunzhi Zhuge
Ying Zhang
Huchuan Lu
Long Chen
VLM
37
4
0
10 Jul 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
31
2
0
10 Jul 2024
Aligning Cyber Space with Physical World: A Comprehensive Survey on
  Embodied AI
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Y. Liu
Weixing Chen
Yongjie Bai
Xiaodan Liang
Guanbin Li
Wen Gao
Liang Lin
LM&Ro
SyDa
AI4CE
48
47
0
09 Jul 2024
Multi-Object Hallucination in Vision-Language Models
Multi-Object Hallucination in Vision-Language Models
Xuweiyi Chen
Ziqiao Ma
Xuejun Zhang
Sihan Xu
Shengyi Qian
Jianing Yang
David Fouhey
Joyce Chai
47
15
0
08 Jul 2024
Described Spatial-Temporal Video Detection
Described Spatial-Temporal Video Detection
Wei Ji
Xiangyan Liu
Yingfei Sun
Jiajun Deng
You Qin
Ammar Nuwanna
Mengyao Qiu
Lina Wei
Roger Zimmermann
24
2
0
08 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot
  Performance
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
29
2
0
08 Jul 2024
Previous
12345...111213
Next