ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.06199
  4. Cited By
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small
  Language Models

Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

10 March 2024
Minjie Zhu
Yichen Zhu
Xin Liu
Ning Liu
Zhiyuan Xu
Chaomin Shen
Yaxin Peng
Zhicai Ou
Feifei Feng
Jian Tang
    VLM
ArXivPDFHTML

Papers citing "Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models"

14 / 14 papers shown
Title
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
Zirui Song
Jingpu Yang
Yuan Huang
Jonathan Tonglet
Zeyu Zhang
Tao Cheng
Meng Fang
Iryna Gurevych
X. Chen
LRM
50
1
0
19 Feb 2025
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
Muhammad Awais
Ali Husain Salem Abdulla Alharthi
Amandeep Kumar
Hisham Cholakkal
Rao Muhammad Anwer
VLM
52
3
0
10 Jan 2025
Olympus: A Universal Task Router for Computer Vision Tasks
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip H. S. Torr
VLM
ObjD
102
0
0
12 Dec 2024
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned
  Language Models
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti
Suraj Nair
Ashwin Balakrishna
Percy Liang
Thomas Kollar
Dorsa Sadigh
MLLM
VLM
54
95
0
12 Feb 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
57
14
0
23 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
63
89
0
04 Jan 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with
  Modality Collaboration
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
114
367
0
07 Nov 2023
MiniGPT-v2: large language model as a unified interface for
  vision-language multi-task learning
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu
Chunyuan Li
Haotian Liu
Jianwei Yang
Jianfeng Gao
Yelong Shen
MLLM
94
31
0
18 Sep 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
ImageNet-21K Pretraining for the Masses
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
154
676
0
22 Apr 2021
1