ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2312.09158
  4. Cited By
General Object Foundation Model for Images and Videos at Scale

General Object Foundation Model for Images and Videos at Scale

14 December 2023
Junfeng Wu
Yi-Xin Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
    VOS
    VLM
ArXivPDFHTML

Papers citing "General Object Foundation Model for Images and Videos at Scale"

20 / 20 papers shown
Title
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
Qihao Liu
Ju He
Qihang Yu
Liang-Chieh Chen
Alan Yuille
DiffM
VGen
75
0
0
30 Apr 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
88
45
0
03 Jan 2025
Olympus: A Universal Task Router for Computer Vision Tasks
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip H. S. Torr
VLM
ObjD
107
0
0
12 Dec 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Hua Zhang
Xiaochun Cao
FAtt
79
3
0
25 Nov 2024
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction
J. Vice
Naveed Akhtar
Richard I. Hartley
Ajmal Saeed Mian
Ajmal Mian
DiffM
82
0
0
21 Nov 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator
  through Scene Imagination
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
32
2
0
13 Oct 2024
CloudTrack: Scalable UAV Tracking with Cloud Semantics
CloudTrack: Scalable UAV Tracking with Cloud Semantics
Yannik Blei
Michael Krawez
Nisarga Nilavadi
Tanja Katharina Kaiser
Wolfram Burgard
37
1
0
24 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
29
0
0
02 Sep 2024
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion
  Models
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
198
318
0
08 Mar 2023
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for
  Open-world Detection
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
115
98
0
20 Sep 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang
Pei Sun
Yi-Xin Jiang
Dongdong Yu
Fucheng Weng
Zehuan Yuan
Ping Luo
Wenyu Liu
Xinggang Wang
VOT
96
1,289
0
13 Oct 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
Open-vocabulary Object Detection via Vision and Language Knowledge
  Distillation
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
220
698
0
28 Apr 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
TrackFormer: Multi-Object Tracking with Transformers
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt
A. Kirillov
Laura Leal-Taixe
Christoph Feichtenhofer
VOT
208
732
0
07 Jan 2021
Simple Copy-Paste is a Strong Data Augmentation Method for Instance
  Segmentation
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
Yin Cui
A. Srinivas
Rui Qian
Tsung-Yi Lin
E. D. Cubuk
Quoc V. Le
Barret Zoph
ISeg
223
835
0
13 Dec 2020
Learning Fast and Robust Target Models for Video Object Segmentation
Learning Fast and Robust Target Models for Video Object Segmentation
Andreas Robinson
Felix Järemo Lawin
Martin Danelljan
F. Khan
M. Felsberg
VOS
47
137
0
27 Feb 2020
Simple Online and Realtime Tracking with a Deep Association Metric
Simple Online and Realtime Tracking with a Deep Association Metric
N. Wojke
Alex Bewley
Dietrich Paulus
VOT
214
3,407
0
21 Mar 2017
1