2312.17240
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
28 December 2023
Senqiao Yang
Tianyuan Qu
Xin Lai
Zhuotao Tian
Bohao Peng
Shu-Lin Liu
Jiaya Jia
VLM
Papers citing "LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model"
28 / 28 papers shown
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li
Jinglin Xu
Yunzhen Zhao
Yuxin Peng
ObjD
22
0
0
21 Apr 2025
Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins
Yiqing Shen
Chenjia Li
Bohan Liu
Cheng-Yi Li
Tito Porras
Mathias Unberath
46
2
0
26 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
64
0
0
17 Mar 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Y. Liu
M. Yang
Chunhua Shen
MLLM
VLM
69
0
0
11 Mar 2025
Large Language Model for Lossless Image Compression with Visual Prompts
Junhao Du
Chuqin Zhou
Ning Cao
Gang Chen
Yunuo Chen
Zhengxue Cheng
Li-Na Song
Guo Lu
Wenjun Zhang
VLM
39
1
0
22 Feb 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Z. Yang
Pingping Zhang
Huchuan Lu
31
0
0
15 Jan 2025
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
Chen Bao
Jiarui Xu
Xiaolong Wang
Abhinav Gupta
Homanga Bharadhwaj
68
2
0
17 Dec 2024
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu
Jie Kong
Jiazhi Wang
Xiao Pan
Bo Lin
Qiang Liu
DiffM
72
1
0
26 Nov 2024
SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey
Kien X. Nguyen
Fengchun Qiao
Arthur Trembanis
Xi Peng
21
0
0
31 Oct 2024
LocateBench: Evaluating the Locating Ability of Vision Language Models
Ting-Rui Chiang
Joshua Robinson
Xinyan Velocity Yu
Dani Yogatama
VLM
ELM
34
0
0
17 Oct 2024
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
Lingxiao Luo
Bingda Tang
Xuanzhong Chen
Rong Han
Ting Chen
VLM
16
1
0
16 Oct 2024
Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension
Zaiquan Yang
Yuhao Liu
Jiaying Lin
Gerhard Hancke
Rynson W. H. Lau
21
1
0
02 Oct 2024
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
36
4
0
23 Aug 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
40
2
0
18 Jul 2024
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao
Zhuotao Tian
Hang Zhao
Jingyong Su
VLM
21
14
0
11 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
44
23
0
28 Jun 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Tao Zhang
Xiangtai Li
Hao Fei
Haobo Yuan
Shengqiong Wu
Shunping Ji
Chen Change Loy
Shuicheng Yan
LRM
MLLM
VLM
47
44
0
27 Jun 2024
RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model
Hantao Zhou
Tianying Ji
Lukas Sommerhalder
Michael Goerner
Norman Hendrich
Jianwei Zhang
Fuchun Sun
Huazhe Xu
41
0
0
14 Jun 2024
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing
Xinyu Zhang
Mengxue Kang
Fei Wei
Shuang Xu
Yuhe Liu
Lin Ma
MLLM
DiffM
18
2
0
27 May 2024
Unified Language-driven Zero-shot Domain Adaptation
Senqiao Yang
Zhuotao Tian
Li Jiang
Jiaya Jia
21
7
0
10 Apr 2024
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
Zheng-Wei Zhang
Yeyao Ma
Enming Zhang
Xiang Bai
VLM
MLLM
14
29
0
21 Mar 2024
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng
Xiaoyang Wu
Li Jiang
Yukang Chen
Hengshuang Zhao
Zhuotao Tian
Jiaya Jia
45
16
0
21 Mar 2024
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu
Ran Xu
Senqiao Yang
Renrui Zhang
Qizhe Zhang
Zehui Chen
Yandong Guo
Shanghang Zhang
TTA
22
10
0
19 Dec 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
41
13
0
24 Nov 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
PFENet++: Boosting Few-shot Semantic Segmentation with the Noise-filtered Context-aware Prior Mask
Xiaoliu Luo
Zhuotao Tian
Taiping Zhang
Bei Yu
Yuan Yan Tang
Jiaya Jia
39
37
0
28 Sep 2021