ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.06674
  4. Cited By
Universal Instance Perception as Object Discovery and Retrieval

Universal Instance Perception as Object Discovery and Retrieval

12 March 2023
B. Yan
Yi-Xin Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
    VOS
    VLM
    LRM
ArXivPDFHTML

Papers citing "Universal Instance Perception as Object Discovery and Retrieval"

34 / 34 papers shown
Title
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
60
0
0
15 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
66
6
1
14 Apr 2025
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation
Junjie Jiang
Zelin Wang
Manqi Zhao
Yin Li
Dongsheng Jiang
34
0
0
06 Apr 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
62
3
0
10 Mar 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
X. Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
89
11
0
07 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
45
0
03 Jan 2025
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
39
3
0
23 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
P. Krishnamurthy
Ramesh Karri
Farshad Khorrami
42
3
0
16 Sep 2024
Object Segmentation from Open-Vocabulary Manipulation Instructions Based
  on Optimal Transport Polygon Matching with Multimodal Foundation Models
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
Takayuki Nishimura
Katsuyuki Kuyo
Motonari Kambara
Komei Sugiura
DiffM
22
0
0
01 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
51
23
0
28 Jun 2024
1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion
  Expression guided Video Segmentation
1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
Mingqi Gao
Jingnan Luo
Jinyu Yang
Jungong Han
Feng Zheng
24
2
0
11 Jun 2024
Matching Anything by Segmenting Anything
Matching Anything by Segmenting Anything
Siyuan Li
Lei Ke
Martin Danelljan
Luigi Piccinelli
Mattia Segu
Luc Van Gool
Fisher Yu
VOS
29
22
0
06 Jun 2024
PerSense: Personalized Instance Segmentation in Dense Images
PerSense: Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui
Muhammad Umer Sheikh
Hassan Abid
Muhammad Haris Khan
VLM
45
0
0
22 May 2024
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous
  Driving
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving
Jiacheng Lin
Jiajun Chen
Kunyu Peng
Xuan He
Zhiyong Li
Rainer Stiefelhagen
Kailun Yang
48
6
0
28 Feb 2024
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Ming-hui Li
Shuai Li
Xindong Zhang
Lei Zhang
VOS
33
16
0
28 Feb 2024
1st Place Solution for 5th LSVOS Challenge: Referring Video Object
  Segmentation
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation
Zhuoyan Luo
Yicheng Xiao
Yong Liu
Yitong Wang
Yansong Tang
Xiu Li
Yujiu Yang
VOS
19
2
0
01 Jan 2024
Florence-2: Advancing a Unified Representation for a Variety of Vision
  Tasks
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
31
140
0
10 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
24
12
0
09 Nov 2023
Tracking Anything with Decoupled Video Segmentation
Tracking Anything with Decoupled Video Segmentation
Ho Kei Cheng
Seoung Wug Oh
Brian L. Price
Alexander Schwing
Joon-Young Lee
VOS
VLM
25
121
0
07 Sep 2023
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Minheng Ni
Yabo Zhang
Kailai Feng
Xiaoming Li
Yiwen Guo
W. Zuo
DiffM
13
23
0
31 Aug 2023
Hierarchical Open-vocabulary Universal Image Segmentation
Hierarchical Open-vocabulary Universal Image Segmentation
Xudong Wang
Shufang Li
Konstantinos Kallidromitis
Yu Kato
Kazuki Kozuka
Trevor Darrell
VLM
OCL
30
36
0
03 Jul 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video
  Object Segmentation
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS
11
29
0
25 May 2023
Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual
  Object Tracking
Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
Xin Chen
Houwen Peng
Jiawen Zhu
Dong Wang
Han Hu
Huchuan Lu
61
22
0
27 Apr 2023
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
133
308
0
04 Dec 2021
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang
Pei Sun
Yi-Xin Jiang
Dongdong Yu
Fucheng Weng
Zehuan Yuan
Ping Luo
Wenyu Liu
Xinggang Wang
VOT
96
1,289
0
13 Oct 2021
TrackFormer: Multi-Object Tracking with Transformers
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt
A. Kirillov
Laura Leal-Taixe
Christoph Feichtenhofer
VOT
208
732
0
07 Jan 2021
Multi-task Collaborative Network for Joint Referring Expression
  Comprehension and Segmentation
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
159
282
0
19 Mar 2020
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos
A. Athar
Sabarinath Mahadevan
Aljosa Osep
Laura Leal-Taixé
Bastian Leibe
VOS
70
169
0
18 Mar 2020
Conditional Convolutions for Instance Segmentation
Conditional Convolutions for Instance Segmentation
Zhi Tian
Chunhua Shen
Hao Chen
ISeg
167
596
0
12 Mar 2020
Learning Fast and Robust Target Models for Video Object Segmentation
Learning Fast and Robust Target Models for Video Object Segmentation
Andreas Robinson
Felix Järemo Lawin
Martin Danelljan
F. Khan
M. Felsberg
VOS
47
137
0
27 Feb 2020
Towards Real-Time Multi-Object Tracking
Towards Real-Time Multi-Object Tracking
Zhongdao Wang
Liang Zheng
Yixuan Liu
Yali Li
Shengjin Wang
VOT
235
844
0
27 Sep 2019
A Real-Time Cross-modality Correlation Filtering Method for Referring
  Expression Comprehension
A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
Yue Liao
Si Liu
Guanbin Li
Fei-Yue Wang
Yanjie Chen
Chao Qian
Bo-wen Li
ObjD
62
174
0
16 Sep 2019
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in
  the Wild
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild
Matthias Muller
Adel Bibi
Silvio Giancola
Salman Al-Subaihi
Bernard Ghanem
203
785
0
28 Mar 2018
Simple Online and Realtime Tracking with a Deep Association Metric
Simple Online and Realtime Tracking with a Deep Association Metric
N. Wojke
Alex Bewley
Dietrich Paulus
VOT
217
3,407
0
21 Mar 2017
1