Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals' rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs). Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions). We used these questions to train CNS-Obsidian, a 34B-parameter VLM. In a blinded, randomized controlled trial, our model demonstrated non-inferiority to then state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p=0.1150; accuracy, 59.38% vs. 65.79%, p=0.3797). Our pilot study demonstrates how training generative AI models on specialty-specific journal content - without large-scale internet data - results in high-performance academic and clinical tools, enabling domain-tailored AI across diverse fields.
View on arXiv@article{alyakin2025_2502.19546, title={ Repurposing the scientific literature with vision-language models }, author={ Anton Alyakin and Jaden Stryker and Daniel Alexander Alber and Karl L. Sangwon and Jin Vivian Lee and Brandon Duderstadt and Akshay Save and David Kurland and Spencer Frome and Shrutika Singh and Jeff Zhang and Eunice Yang and Ki Yun Park and Cordelia Orillac and Aly A. Valliani and Sean Neifert and Albert Liu and Aneek Patel and Christopher Livia and Darryl Lau and Ilya Laufer and Peter A. Rozman and Eveline Teresa Hidalgo and Howard Riina and Rui Feng and Todd Hollon and Yindalon Aphinyanaphongs and John G. Golfinos and Laura Snyder and Eric Leuthardt and Douglas Kondziolka and Eric Karl Oermann }, journal={arXiv preprint arXiv:2502.19546}, year={ 2025 } }