ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.04001
  4. Cited By
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

7 January 2025
Haobo Yuan
X. Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
    VLM
ArXivPDFHTML

Papers citing "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"

10 / 10 papers shown
Title
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
Linshan Wu
Yuxiang Nie
Sunan He
Jiaxin Zhuang
Hao Chen
LM&MA
MedIm
66
0
0
30 Apr 2025
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
Mengshi Qi
Pengfei Zhu
X. Li
Xiaoyang Bi
Lu Qi
Huadong Ma
Ming Yang
VOS
VLM
37
0
0
16 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
X. Li
Tao Zhang
Lu Qi
Ming Yang
21
0
0
15 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
X. Li
Zilong Huang
Y. Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
43
1
0
14 Apr 2025
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
Kaiyu Li
Zepeng Xin
Li Pang
Chao Pang
Yupeng Deng
Jing Yao
Guisong Xia
Deyu Meng
Zhi Wang
Xiangyong Cao
VLM
LRM
37
0
0
13 Apr 2025
URECA: Unique Region Caption Anything
URECA: Unique Region Caption Anything
Sangbeom Lim
J. Kim
Heeji Yoon
Jaewoo Jung
Seungryong Kim
26
0
0
07 Apr 2025
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation
Hao Fang
Runmin Cong
Xiankai Lu
Z. Chen
Wei Zhang
29
0
0
07 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
26
0
0
07 Apr 2025
4th PVUW MeViS 3rd Place Report: Sa2VA
4th PVUW MeViS 3rd Place Report: Sa2VA
Haobo Yuan
Tao Zhang
X. Li
Lu Qi
Zilong Huang
Shilin Xu
Jiashi Feng
Ming Yang
33
1
0
01 Apr 2025
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Zixu Cheng
Jian Hu
Ziquan Liu
Chenyang Si
Wei Li
Shaogang Gong
LRM
64
2
0
14 Mar 2025
1