ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.16058
  4. Cited By
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

27 September 2023
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
Shashank Jain
Chun-Fu Yeh
Prakash Murugesan
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
    MLLM
ArXivPDFHTML

Papers citing "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model"

26 / 76 papers shown
Title
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident
  Analysis
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis
Kebin Wu
Wenbin Li
Xiaofei Xiao
6
2
0
05 Jan 2024
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Hong-ping Hao
Long Zhou
Shujie Liu
Jinyu Li
Shujie Hu
Rui Wang
Furu Wei
21
18
0
30 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A
  Survey
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
25
22
0
27 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
27
1
0
15 Dec 2023
Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs
  for Embodied AI
Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI
Kai Huang
Boyuan Yang
Wei Gao
10
1
0
13 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data
  Generator
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
33
7
0
11 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware
  representations to LLMs and Emergent Cross-modal Reasoning
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq R. Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
22
45
0
30 Nov 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
22
1
0
30 Nov 2023
Vamos: Versatile Action Models for Video Understanding
Vamos: Versatile Action Models for Video Understanding
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
19
19
0
22 Nov 2023
Applications of Large Scale Foundation Models for Autonomous Driving
Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang
Yue Chen
Zhu Li
ELM
AI4CE
LRM
ALM
LM&Ro
46
14
0
20 Nov 2023
Igniting Language Intelligence: The Hitchhiker's Guide From
  Chain-of-Thought Reasoning to Language Agents
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
14
51
0
20 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
22
12
0
14 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM
DiffM
21
10
0
01 Nov 2023
Entity Embeddings : Perspectives Towards an Omni-Modality Era for Large
  Language Models
Entity Embeddings : Perspectives Towards an Omni-Modality Era for Large Language Models
Eren Unlu
Unver Ciftci
20
0
0
27 Oct 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
123
20
0
26 Oct 2023
LLark: A Multimodal Instruction-Following Language Model for Music
LLark: A Multimodal Instruction-Following Language Model for Music
Josh Gardner
Simon Durand
Daniel Stoller
Rachel M. Bittner
AuLLM
15
14
0
11 Oct 2023
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
R. S. Srinivasa
Jaejin Cho
Chouchang Yang
Yashas Malur Saidutta
Ching Hua Lee
Yilin Shen
Hongxia Jin
VLM
13
8
0
26 Sep 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the
  Wild
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
39
17
0
14 Sep 2023
The first step is the hardest: Pitfalls of Representing and Tokenizing
  Temporal Data for Large Language Models
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models
Dimitris Spathis
F. Kawsar
AI4TS
13
17
0
12 Sep 2023
Emu: Generative Pretraining in Multimodality
Emu: Generative Pretraining in Multimodality
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
19
126
0
11 Jul 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
16
515
0
23 Jun 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
Computational modeling of semantic change
Computational modeling of semantic change
Nina Tahmasebi
Haim Dubossarsky
26
6
0
13 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
212
682
0
13 Oct 2021
Previous
12