An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

Current deep learning models are mostly task-specific and lack a user-friendly interface for operation. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low-Rank Adaptation (LoRA), we fine-tuned our VFMs to detect ocular and systemic diseases, differentiate ocular disease severity, and identify common ocular signs. The system achieved 100% accuracy in routing fundus images to the appropriate VFMs, which in turn attained 82.2% accuracy in disease detection, 89% in severity differentiation, and 76% in sign identification. Meta-EyeFM was 11% to 43% more accurate than the Gemini-1.5-flash and ChatGPT-4o large multimodal models in detecting various eye diseases, and its performance was comparable to that of an ophthalmologist. This system offers enhanced usability and diagnostic performance, making it a valuable decision support tool for primary eye care or an online platform for fundus evaluation.
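The abstract does not detail the fine-tuning setup, but the LoRA-based adaptation it describes can be illustrated with a minimal sketch using Hugging Face Transformers and PEFT. The backbone name, target modules, and label set below are assumptions for illustration only, not Meta-EyeFM's actual configuration.

```python
# Minimal sketch (not the authors' code): attaching LoRA adapters to a ViT-style
# vision model for a fundus-image classification task.
import torch
from transformers import ViTForImageClassification
from peft import LoraConfig, get_peft_model

backbone = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",  # placeholder backbone, not Meta-EyeFM's VFM
    num_labels=4,                         # illustrative label count (e.g., disease classes)
)

lora_cfg = LoraConfig(
    r=16,                                 # low-rank dimension
    lora_alpha=32,
    target_modules=["query", "value"],    # attention projections commonly adapted with LoRA
    lora_dropout=0.1,
    modules_to_save=["classifier"],       # train the task head fully
)

model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()        # only a small fraction of weights are trainable

# Forward pass on a dummy fundus-image batch (3 x 224 x 224)
dummy = torch.randn(2, 3, 224, 224)
logits = model(pixel_values=dummy).logits
print(logits.shape)                       # -> torch.Size([2, 4])
```

In such a setup, the LLM-based router would dispatch an incoming text query and fundus image to one of several task-specific adapted models like the one above; the routing logic itself is not specified in the abstract.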
@article{soh2025_2505.08414,
  title={An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care},
  author={Zhi Da Soh and Yang Bai and Kai Yu and Yang Zhou and Xiaofeng Lei and Sahil Thakur and Zann Lee and Lee Ching Linette Phang and Qingsheng Peng and Can Can Xue and Rachel Shujuan Chong and Quan V. Hoang and Lavanya Raghavan and Yih Chung Tham and Charumathi Sabanayagam and Wei-Chi Wu and Ming-Chih Ho and Jiangnan He and Preeti Gupta and Ecosse Lamoureux and Seang Mei Saw and Vinay Nangia and Songhomitra Panda-Jonas and Jie Xu and Ya Xing Wang and Xinxing Xu and Jost B. Jonas and Tien Yin Wong and Rick Siow Mong Goh and Yong Liu and Ching-Yu Cheng},
  journal={arXiv preprint arXiv:2505.08414},
  year={2025}
}