ResearchTrend.AI
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks


25 January 2024
Tianhe Ren
Shilong Liu
Ailing Zeng
Jing Lin
Kunchang Li
He Cao
Jiayu Chen
Xinyu Huang
Yukang Chen
Feng Yan
Zhaoyang Zeng
Hao Zhang
Feng Li
Jie-jin Yang
Hongyang Li
Qing Jiang
Lei Zhang
    VLM

Papers citing "Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks"

50 / 83 papers shown
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
39
0
0
13 May 2025
FoodTrack: Estimating Handheld Food Portions with Egocentric Video
Ervin Wang
Yuhao Chen
EgoV
46
0
0
07 May 2025
CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion
Y. Li
Pencheng Wan
Liang Han
Yaowei Wang
Liqiang Nie
Min Zhang
36
0
0
07 May 2025
Estimating the Diameter at Breast Height of Trees in a Forest With a Single 360 Camera
Siming He
Zachary Osman
Fernando Cladera
Dexter Ong
Nitant Rai
Patrick Corey Green
Vijay R. Kumar
Pratik Chaudhari
30
0
0
06 May 2025
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Lu Ling
C. Lin
Tsung-Yi Lin
Yifan Ding
Y. Zeng
Yichen Sheng
Yunhao Ge
Ming-Yu Liu
Aniket Bera
Zhaoshuo Li
VGen
3DV
45
0
0
05 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi
R. Kaur
Adam D. Cobb
Manoj Acharya
Anirban Roy
Colin Samplawski
Brian Matejek
Alexander M. Berenbeim
Nathaniel D. Bastian
Susmit Jha
20
0
0
30 Apr 2025
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics
Marc Glocker
Peter Honig
Matthias Hirschmanner
Markus Vincze
LM&Ro
75
0
0
30 Apr 2025
MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection
Q. Yang
Yuan Yao
Miaomiao Cui
Liefeng Bo
VLM
54
0
0
30 Apr 2025
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
Qihao Liu
Ju He
Qihang Yu
Liang-Chieh Chen
Alan Yuille
DiffM
VGen
75
0
0
30 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou
Z. Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Mohit Bansal
Huaxiu Yao
54
0
0
27 Apr 2025
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
Zebin Yao
Lei Ren
Huixing Jiang
Chen Wei
Xiaojie Wang
Ruifan Li
Fangxiang Feng
DiffM
69
0
0
22 Apr 2025
ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos
Zetong Zhang
Manuel Kaufmann
Lixin Xue
Jie Song
Martin R. Oswald
3DH
62
0
0
17 Apr 2025
Post-Hurricane Debris Segmentation Using Fine-Tuned Foundational Vision Models
Kooshan Amini
Yuhao Liu
Jamie Ellen Padgett
Guha Balakrishnan
Ashok Veeraraghavan
26
0
0
17 Apr 2025
Continuous Locomotive Crowd Behavior Generation
Inhwan Bae
Junoh Lee
Hae-Gon Jeon
31
0
0
07 Apr 2025
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Wenyuan Zhang
Yixiao Yang
Han Huang
Liang Han
Kanle Shi
Yu-Shen Liu
Zhizhong Han
MDE
53
3
0
24 Mar 2025
How to Train Your Dragon: Automatic Diffusion-Based Rigging for Characters with Diverse Topologies
Zeqi Gu
Difan Liu
Timothy Langlois
Matthew Fisher
Abe Davis
DiffM
3DH
60
0
0
19 Mar 2025
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
L. Yang
Kaixin Zhu
Juanxi Tian
Bohan Zeng
M. Lin
Hongjuan Pei
Wentao Zhang
Shuicheng Yan
VGen
73
0
0
17 Mar 2025
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Xinyu Zhang
Haonan Chang
Yuhan Liu
Abdeslam Boularias
3DGS
39
0
0
12 Mar 2025
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation
Yufei Wang
Ziyu Wang
Mino Nakura
Pratik Bhowal
Chia-Liang Kuo
Yi-Ting Chen
Zackory M. Erickson
David Held
56
0
0
04 Mar 2025
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
Wenqi Guo
Yiyang Du
Shan Du
67
1
0
04 Mar 2025
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu
Haoyu Liu
Hongming Fu
Yichuan Peng
Jinyuan Liu
Xin-Yue Fan
Risheng Liu
63
0
0
03 Mar 2025
Solving Instance Detection from an Open-World Perspective
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
32
0
0
01 Mar 2025
FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation
Chao Tang
Anxing Xiao
Yuhong Deng
Tianrun Hu
Wenlong Dong
Hanbo Zhang
David Hsu
Hong Zhang
71
2
0
24 Feb 2025
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
Luca Barsellotti
Roberto Bigazzi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
83
1
0
20 Feb 2025
SMITE: Segment Me In TimE
Amirhossein Alimohammadi
Sauradip Nag
Saeid Asgari Taghanaki
Andrea Tagliasacchi
Ghassan Hamarneh
Ali Mahdavi-Amiri
VLM
VOS
77
2
0
20 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
22
2
0
18 Feb 2025
SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
Yun Peng
Xiao Lin
Nachuan Ma
Jiayuan Du
Chuangwei Liu
Chengju Liu
Qi Chen
37
3
0
17 Feb 2025
Deciphering Functions of Neurons in Vision-Language Models
Jiaqi Xu
Cuiling Lan
Xuejin Chen
Yan Lu
VLM
72
0
0
10 Feb 2025
Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
A. H. Tan
Angus Fung
Haitong Wang
G. Nejat
76
1
0
31 Jan 2025
MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies
Long Yang
Lianqing Zheng
W. Ai
Minghao Liu
Sen Li
Qunshu Lin
Shengyu Yan
Jie Bai
Zhixiong Ma
Xichan Zhu
46
0
0
28 Jan 2025
Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
Abdalwhab Abdalwhab
A. Imran
Sina Heydarian
I. Iordanova
David St-Onge
41
0
0
16 Jan 2025
Guided SAM: Label-Efficient Part Segmentation
S.B. van Rooij
G.J. Burghouts
VLM
33
0
0
13 Jan 2025
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
Yifan Zhang
Junhui Hou
55
1
0
03 Jan 2025
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
W. Liu
X. Wang
3DGS
ViT
99
5
0
17 Dec 2024
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
Yi Feng
Yu Han
Xijing Zhang
Tanghui Li
Yanting Zhang
Rui Fan
95
3
0
15 Dec 2024
PaintScene4D: Consistent 4D Scene Generation from Text Prompts
Vinayak Gupta
Yunze Man
Yu-Xiong Wang
VGen
81
0
0
05 Dec 2024
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
Tianxing Chen
Yao Mu
Zhixuan Liang
Z. Chen
Shijia Peng
...
Mingkun Xu
R. Hu
H. Zhang
Xuelong Li
Ping Luo
AI4CE
95
8
0
27 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
99
1
0
26 Nov 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
Ruichuan An
Sihan Yang
Ming Lu
Kai Zeng
Yulin Luo
...
Hao Liang
Qi She
Shanghang Zhang
W. Zhang
Wentao Zhang
76
5
0
18 Nov 2024
SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation
Xun Tu
Karthik Desingh
66
2
0
07 Nov 2024
Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning
George Jiayuan Gao
Tianyu Li
Nadia Figueroa
36
0
0
05 Nov 2024
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Yunzhi Zhang
Zizhang Li
M. Zhou
Shangzhe Wu
Jiajun Wu
33
4
0
22 Oct 2024
DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
Wendi Chen
Han Xue
Fangyuan Zhou
Yuan Fang
Cewu Lu
36
0
0
15 Oct 2024
FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction
Irving Fang
Kairui Shi
X. He
Siqi Tan
Yifan Wang
Hanwen Zhao
Hung-Jui Huang
Wenzhen Yuan
Chen Feng
Jing Zhang
3DGS
49
1
0
10 Oct 2024
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Christopher Agia
Rohan Sinha
Jingyun Yang
Zi-ang Cao
Rika Antonova
Marco Pavone
Jeannette Bohg
26
6
0
06 Oct 2024
Replace Anyone in Videos
Xiang Wang
Shiwei Zhang
Haonan Qiu
Ruihang Chu
Zekun Li
Y. Zhang
Changxin Gao
Yuehuan Wang
Chunhua Shen
Nong Sang
VGen
DiffM
58
1
0
30 Sep 2024
Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition
Minseo Kwon
Yaesol Kim
Young J. Kim
16
3
0
28 Sep 2024
SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image
Dimitrije Antić
Sai Kumar Dwivedi
Shashank Tripathi
Theo Gevers
Dimitrios Tzionas
42
2
0
24 Sep 2024
PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images
Nanqing Liu
Xun Xu
Yongyi Su
Haojie Zhang
Heng-Chao Li
VLM
27
14
0
20 Sep 2024