AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

27 September 2023 · arXiv:2309.16058
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
Shashank Jain
Chun-Fu Yeh
Prakash Murugesan
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
Abstract

We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, and IMU motion sensor data) and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of state-of-the-art LLMs, including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QA. We conduct a comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
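
The abstract describes converting each modality's signal into the LLM's joint textual embedding space through a pre-trained aligner module. As a rough illustration of that general idea (not the paper's implementation), the Python sketch below shows one common way such an aligner can be realized: a learned projection turns a frozen encoder's pooled features into a few soft tokens that are prepended to the text embeddings before decoding. The class name, projection design, and all sizes here are assumptions made up for this example.

```python
# Minimal sketch (not the paper's code) of a projection-style aligner:
# a frozen modality encoder's pooled features are mapped to a few "soft
# tokens" in the LLM's token-embedding space, then prepended to the
# embedded text prompt. All names and sizes are assumed for illustration.
import torch
import torch.nn as nn


class ModalityAligner(nn.Module):
    """Maps pooled modality features to `num_tokens` vectors in the LLM embedding space."""

    def __init__(self, feat_dim: int, llm_dim: int, num_tokens: int = 8):
        super().__init__()
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim
        # Expand the pooled feature into num_tokens vectors, then refine each token.
        self.expand = nn.Linear(feat_dim, num_tokens * llm_dim)
        self.refine = nn.Sequential(nn.GELU(), nn.Linear(llm_dim, llm_dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim) from a frozen image/audio/IMU encoder
        tokens = self.expand(feats).view(-1, self.num_tokens, self.llm_dim)
        return self.refine(tokens)  # (batch, num_tokens, llm_dim)


# Toy usage with made-up sizes: prepend the aligned tokens to the text
# embeddings before running the (frozen) LLM decoder over the sequence.
feats = torch.randn(2, 768)              # stand-in for frozen encoder output
text_embeds = torch.randn(2, 16, 1024)   # stand-in for embedded text prompt
aligner = ModalityAligner(feat_dim=768, llm_dim=1024)
llm_inputs = torch.cat([aligner(feats), text_embeds], dim=1)  # (2, 8 + 16, 1024)
```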
