arXiv:2406.00977
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model
3 June 2024
Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, S. Song, James Y. Zou
VLM
Papers citing "Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model"
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, J. Bagga, ..., Carlo Bifulco, M. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon
LM&MA
MedIm
130
139
0
10 Jan 2025
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, ..., Tsu-jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang
ObjD
MLLM
65
12
0
11 Apr 2024
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training
Zheng Yuan, Qiao Jin, Chuanqi Tan, Zhengyun Zhao, Hongyi Yuan, Fei Huang, Songfang Huang
25
10
0
01 Mar 2023
RepsNet: Combining Vision with Language for Automated Medical Reports
A. Tanwani, Joelle Barral, Daniel Freedman
MedIm
20
12
0
27 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, A. Kalyan
ELM
ReLM
LRM
171
608
0
20 Sep 2022
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation
Yasuhide Miura, Yuhao Zhang, Emily Bao Tsai, C. Langlotz, Dan Jurafsky
MedIm
125
126
0
20 Oct 2020