We introduce Granite Vision, a lightweight large language model with vision capabilities, specifically designed to excel in enterprise use cases, particularly in visual document understanding. Our model is trained on a comprehensive instruction-following dataset, including document-related tasks, such as content extraction from tables, charts, diagrams, sketches, and infographics, as well as general image tasks. The architecture of Granite Vision is centered around visual modality alignment with a decoder-only, 2 billion parameter Granite large language model. Additionally, we introduce a dedicated safety classification approach at test time that leverages a sparse set of attention vectors to identify potentially harmful inputs. Despite its lightweight architecture, Granite Vision achieves strong results on standard benchmarks related to visual document understanding, as well as on the LiveXiv benchmark, which is designed to avoid test set contamination by using a constantly updated corpus of recently published arXiv papers. We are releasing the model under the Apache-2.0 license, allowing for both research and commercial use, while offering complete visibility into the training data and other relevant details. See this https URL for model weights.
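The abstract mentions a test-time safety classifier built on a sparse set of attention vectors, but does not spell out the method here. The toy sketch below illustrates the general idea under stated assumptions: given per-attention-head summary features for an input (synthetic here), select a small subset of heads whose activations separate harmful from benign inputs, then classify new inputs using only that sparse subset. The feature construction, head-selection rule, and nearest-centroid decision are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy sketch (NOT the paper's method): a sparse-attention-vector safety
# classifier. We assume each input yields one scalar feature per attention
# head; here these features are synthetic.

rng = np.random.default_rng(0)
n_heads = 64            # assumed number of attention-head features
n_per_class = 100

# Synthetic "attention vector" summaries: only a few heads carry signal.
signal_heads = np.array([3, 17, 42])
benign = rng.normal(0.0, 1.0, size=(n_per_class, n_heads))
harmful = rng.normal(0.0, 1.0, size=(n_per_class, n_heads))
harmful[:, signal_heads] += 2.5   # harmful inputs shift these heads

def select_sparse_heads(pos, neg, k=3):
    """Pick the k heads whose class means differ the most."""
    gap = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    return np.argsort(gap)[-k:]

heads = select_sparse_heads(harmful, benign, k=3)

# Nearest-centroid classifier restricted to the sparse head subset.
c_harm = harmful[:, heads].mean(axis=0)
c_safe = benign[:, heads].mean(axis=0)

def is_harmful(x):
    v = x[heads]
    return np.linalg.norm(v - c_harm) < np.linalg.norm(v - c_safe)

probe = rng.normal(0.0, 1.0, n_heads)
probe[signal_heads] += 2.5        # simulate a harmful input
print("selected heads:", sorted(heads.tolist()), "harmful?", is_harmful(probe))
```

Restricting the decision to a handful of heads keeps the classifier cheap enough to run at inference time alongside the main model, which is the practical appeal of a sparse approach.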
@article{team2025_2502.09927,
  title={Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence},
  author={Granite Vision Team and Leonid Karlinsky and Assaf Arbelle and Abraham Daniels and Ahmed Nassar and Amit Alfassi and Bo Wu and Eli Schwartz and Dhiraj Joshi and Jovana Kondic and Nimrod Shabtay and Pengyuan Li and Roei Herzig and Shafiq Abedin and Shaked Perek and Sivan Harary and Udi Barzelay and Adi Raz Goldfarb and Aude Oliva and Ben Wieles and Bishwaranjan Bhattacharjee and Brandon Huang and Christoph Auer and Dan Gutfreund and David Beymer and David Wood and Hilde Kuehne and Jacob Hansen and Joseph Shtok and Ken Wong and Luis Angel Bathen and Mayank Mishra and Maksym Lysak and Michele Dolfi and Mikhail Yurochkin and Nikolaos Livathinos and Nimrod Harel and Ophir Azulai and Oshri Naparstek and Rafael Teixeira de Lima and Rameswar Panda and Sivan Doveh and Shubham Gupta and Subhro Das and Syed Zawad and Yusik Kim and Zexue He and Alexander Brooks and Gabe Goodhart and Anita Govindjee and Derek Leist and Ibrahim Ibrahim and Aya Soffer and David Cox and Kate Soule and Luis Lastras and Nirmit Desai and Shila Ofek-koifman and Sriram Raghavan and Tanveer Syeda-Mahmood and Peter Staar and Tal Drory and Rogerio Feris},
  journal={arXiv preprint arXiv:2502.09927},
  year={2025}
}