ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.00692
  4. Cited By
LISA: Reasoning Segmentation via Large Language Model

LISA: Reasoning Segmentation via Large Language Model

1 August 2023
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
    LM&Ro
    VLM
    MLLM
    LRM
ArXivPDFHTML

Papers citing "LISA: Reasoning Segmentation via Large Language Model"

50 / 72 papers shown
Title
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
30
0
0
13 May 2025
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem
Filippo Aleotti
Jamie Watson
Z. Qureshi
Abdelrahman Eldesokey
Peter Wonka
Gabriel J. Brostow
Sara Vicente
Guillermo Garcia-Hernando
DiffM
47
0
0
08 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
44
0
0
03 May 2025
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
Linshan Wu
Yuxiang Nie
Sunan He
Jiaxin Zhuang
Hao Chen
LM&MA
MedIm
68
0
0
30 Apr 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang
Xinyi Chen
Y. Chen
H. Li
Xiaoshen Han
Z. Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
75
0
0
30 Apr 2025
DreamO: A Unified Framework for Image Customization
DreamO: A Unified Framework for Image Customization
Chong Mou
Yanze Wu
Wenxu Wu
Zinan Guo
Pengze Zhang
...
Shaojin Wu
S. Zhao
Jian Andrew Zhang
Qian He
Xinglong Wu
44
0
0
23 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
47
0
0
22 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
X. Li
Tao Zhang
Lu Qi
Ming Yang
25
0
0
15 Apr 2025
BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts
BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts
Suzhe Xu
Jialin Peng
Chengyuan Zhang
VLM
46
0
0
25 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
43
0
0
12 Mar 2025
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger
Snehal Jauhri
V. Prasad
Georgia Chalvatzaki
58
0
0
12 Mar 2025
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou
Manli Tao
Chaoyang Zhao
Haiyun Guo
Honghui Dong
Ming Tang
J. T. Wang
46
0
0
11 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
58
3
0
10 Mar 2025
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
Wenqi Guo
Yiyang Du
Shan Du
64
1
0
04 Mar 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
64
8
0
21 Feb 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
64
19
0
21 Jan 2025
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
Yuzhou Huang
Ziyang Yuan
Quande Liu
Qiulin Wang
Xintao Wang
Ruimao Zhang
Pengfei Wan
Di Zhang
Kun Gai
VGen
DiffM
35
10
0
08 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
X. Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
79
11
0
07 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
88
45
0
03 Jan 2025
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Clinical Pathology Analysis
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Clinical Pathology Analysis
Shengxuming Zhang
Weihan Li
Tianhong Gao
Jiacong Hu
Haoming Luo
Mingli Song
Xiuming Zhang
Mingli Song
Zunlei Feng
LM&MA
101
0
0
12 Dec 2024
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu
Hanqing Wang
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
LRM
LM&Ro
79
1
0
02 Dec 2024
ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model
Kunyang Han
Yibo Hu
Mengxue Qu
Hailin Shi
Yao Zhao
Y. X. Wei
MLLM
VLM
3DV
77
1
0
29 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
92
6
0
27 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
96
1
0
26 Nov 2024
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
Y. Zhou
Mengcheng Lan
Xiang Li
Yiping Ke
Xue Jiang
Litong Feng
Qingyun Li
Xue Yang
Wayne Zhang
ObjD
VLM
107
4
0
16 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Saeed Mian
Mohit Bansal
Chen Chen
LRM
46
1
0
15 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
F. Khan
Salman Khan
MLLM
VGen
VLM
42
6
0
07 Nov 2024
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Zheyuan Zhang
Fengyuan Hu
Jayjun Lee
Freda Shi
Parisa Kordjamshidi
Joyce Chai
Ziqiao Ma
37
11
0
22 Oct 2024
UnSeg: One Universal Unlearnable Example Generator is Enough against All
  Image Segmentation
UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation
Ye Sun
Hao Zhang
Tiehua Zhang
Xingjun Ma
Yu-Gang Jiang
VLM
29
3
0
13 Oct 2024
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
Qingwen Bu
Hongyang Li
Li Chen
Jisong Cai
Jia Zeng
Heming Cui
Maoqing Yao
Yu Qiao
34
2
0
10 Oct 2024
From Generalist to Specialist: Adapting Vision Language Models via
  Task-Specific Visual Instruction Tuning
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Yang Bai
Yang Zhou
Jun Zhou
Rick Siow Mong Goh
Daniel Ting
Yong Liu
VLM
41
0
0
09 Oct 2024
Visual-O1: Understanding Ambiguous Instructions via Multi-modal
  Multi-turn Chain-of-thoughts Reasoning
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni
Yutao Fan
Lei Zhang
Wangmeng Zuo
LRM
AI4CE
15
6
0
04 Oct 2024
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu
Xuanyu Zhang
Runyi Li
Zecheng Tang
Qing Huang
Jian Andrew Zhang
AAML
37
14
0
03 Oct 2024
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Qiaojun Yu
Siyuan Huang
Xibin Yuan
Zhengkai Jiang
Ce Hao
...
Junbo Wang
Liu Liu
Hongsheng Li
Peng Gao
Cewu Lu
57
3
0
30 Sep 2024
Embodiment-Agnostic Action Planning via Object-Part Scene Flow
Embodiment-Agnostic Action Planning via Object-Part Scene Flow
Weiliang Tang
Jia-Hui Pan
Wei Zhan
Jianshu Zhou
Huaxiu Yao
Yun-Hui Liu
M. Tomizuka
Mingyu Ding
Chi-Wing Fu
41
0
0
16 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
67
15
0
05 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring
  Expression Segmentation
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
30
10
0
01 Sep 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
M. Zhang
DiffM
45
1
0
16 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Haozhao Wang
Zhicheng Chen
Peilin Zhao
VLM
MLLM
55
18
0
04 Aug 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
45
2
0
18 Jul 2024
FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries
FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries
Yuqi Jiang
Xudong Lu
Qian Jin
Qi Sun
Hanming Wu
Cheng Zhuo
28
3
0
15 Jul 2024
PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse
  Potential Outcomes Using Large Vision and Language Models
PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models
Can Cui
Ruining Deng
Junlin Guo
Quan Liu
Tianyuan Yao
Haichun Yang
Yuankai Huo
VLM
MedIm
29
1
0
13 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
106
13
0
01 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
51
23
0
28 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
64
12
0
09 Jun 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
61
11
0
07 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
55
25
0
07 Jun 2024
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Kuan-Chih Huang
Xiangtai Li
Lu Qi
Shuicheng Yan
Ming-Hsuan Yang
LRM
53
9
0
27 May 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Ajmal Saeed Mian
24
2
0
21 May 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You
Haotian Zhang
E. Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
MLLM
34
82
0
08 Apr 2024
12
Next