ResearchTrend.AI
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

27 September 2023
Seungwhan Moon
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
Shashank Jain
Chun-Fu Yeh
Prakash Murugesan
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
    MLLM
arXiv:2309.16058

Papers citing "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model"

50 / 76 papers shown
EgoCHARM: Resource-Efficient Hierarchical Activity Recognition using an Egocentric IMU Sensor
Akhil Padmanabha
Saravanan Govindarajan
Hwanmun Kim
Sergio Ortiz
Rahul Rajan
Doruk Senkal
Sneha Kadetotad
25
0
0
24 Apr 2025
A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models
Gaurav Verma
Jiawei Zhou
Mohit Chandra
Srijan Kumar
M. D. Choudhury
43
0
0
03 Apr 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
29
0
0
29 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Y. Wang
39
0
0
28 Mar 2025
HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
Mingzhen Huang
Fu-Jen Chu
Bugra Tekin
Kevin J Liang
Haoyu Ma
...
Hongfei Xue
Siwei Lyu
Kris M. Kitani
Matt Feiszli
Hao Tang
VLM
56
0
0
24 Mar 2025
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
Ege Ozsoy
Chantal Pellegrini
Tobias Czempiel
Felix Tristram
Kun Yuan
D. Bani-Harouni
U. Eck
Benjamin Busam
Matthias Keicher
Nassir Navab
71
1
0
04 Mar 2025
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Xinru Wang
Mengjie Yu
Hannah Nguyen
Michael Iuzzolino
Tianyi Wang
...
Ting Zhang
Naveen Sendhilnathan
Hrvoje Benko
Haijun Xia
Tanya R. Jonker
43
0
0
26 Feb 2025
Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens
Ziwei Shan
Yaoyu He
Chengfeng Zhao
Jiashen Du
Jingyan Zhang
Qixuan Zhang
Jingyi Yu
Lan Xu
46
1
0
22 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
56
8
0
21 Feb 2025
PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging
Gang Liu
Jinlong He
Pengfei Li
Genrong He
Zixu Zhao
Shenjun Zhong
LM&MA
61
2
0
17 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
D. Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
96
102
0
10 Jan 2025
Altogether: Image Captioning via Re-aligning Alt-text
Hu Xu
Po-Yao (Bernie) Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
...
Luke Zettlemoyer
Wen-tau Yih
Shang-Wen Li
Saining Xie
Christoph Feichtenhofer
DiffM
26
6
0
31 Dec 2024
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
Yeyuan Wang
D. Gao
Bin Li
Rujiao Long
Lei Yi
Xiaoyan Cai
Libin Yang
Jinxia Zhang
Shanqing Yu
Qi Xuan
66
0
0
22 Dec 2024
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
J. Liu
N. Shah
Ping Chen
68
2
0
18 Dec 2024
MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension
Zeyu Ling
Bo Han
Shiyang Li
H. Shen
Jikang Cheng
Changqing Zou
79
1
0
26 Nov 2024
PaPaGei: Open Foundation Models for Optical Physiological Signals
Arvind Pillai
Dimitris Spathis
F. Kawsar
Mohammad Malekzadeh
VLM
29
7
0
27 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
16
2
0
12 Oct 2024
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models
Yiming Chen
Xianghu Yue
Xiaoxue Gao
Chen Zhang
L. F. D’Haro
R. Tan
Haizhou Li
AuLLM
20
0
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
35
11
0
26 Sep 2024
Affective Computing Has Changed: The Foundation Model Disruption
Björn Schuller
Adria Mallol-Ragolta
Alejandro Pena Almansa
Iosif Tsangko
Mostafa M. Amin
A. Semertzidou
Lukas Christ
Shahin Amiriparian
24
0
0
13 Sep 2024
Doppelgänger's Watch: A Split Objective Approach to Large Language Models
S. Ghasemlou
Ashish Katiyar
Aparajita Saraf
Seungwhan Moon
Mangesh Pujari
Pinar E. Donmez
Babak Damavandi
Anuj Kumar
23
0
0
09 Sep 2024
Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them
H. Haresamudram
Apoorva Beedu
Mashfiqui Rabbi
Sankalita Saha
Irfan Essa
Thomas Ploetz
21
1
0
21 Aug 2024
Towards Holistic Disease Risk Prediction using Small Language Models
Liv Bjorkdahl
Oskar Pauli
Johan Ostman
Chiara Ceccobello
Sara Lundell
Magnus Kjellberg
LM&MA
21
0
0
13 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
32
1
0
04 Aug 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma
Zhibin Wang
Xiaoshuai Sun
Weihuang Lin
Qiang-feng Zhou
Jiayi Ji
Rongrong Ji
MLLM
VLM
39
1
0
23 Jul 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
32
47
0
17 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
30
1
0
13 Jun 2024
Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research
H. Haresamudram
Hrudhai Rajasekhar
Nikhil Murlidhar Shanbhogue
Thomas Ploetz
16
1
0
09 Jun 2024
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
Dongkai Wang
Shiyu Xuan
Shiliang Zhang
LRM
24
3
0
07 Jun 2024
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Zian Su
Xiangzhe Xu
Ziyang Huang
Kaiyuan Zhang
Xiangyu Zhang
22
2
0
30 May 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
41
13
0
28 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
29
7
0
23 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
14
27
0
18 May 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
19
5
0
24 Apr 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
47
12
0
20 Mar 2024
MEDBind: Unifying Language and Multimodal Medical Data Embeddings
Yuan Gao
Sangwook Kim
David E Austin
Chris McIntosh
21
2
0
19 Mar 2024
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Zhongwei Wan
Che Liu
Xin Wang
Chaofan Tao
Hui Shen
Zhenwu Peng
Jie Fu
Rossella Arcucci
Huaxiu Yao
Mi Zhang
34
1
0
07 Mar 2024
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
Hossein Aboutalebi
Hwanjun Song
Yusheng Xie
Arshit Gupta
Justin Sun
Hang Su
Igor Shalyminov
Nikolaos Pappas
Siffi Singh
Saab Mansour
DiffM
EGVM
28
4
0
05 Mar 2024
Mysterious Projections: Multimodal LLMs Gain Domain-Specific Visual Capabilities Without Richer Cross-Modal Projections
Gaurav Verma
Minje Choi
Kartik Sharma
J. Watson-Daniels
Sejoon Oh
Srijan Kumar
MLLM
VLM
19
3
0
26 Feb 2024
User-LLM: Efficient LLM Contextualization with User Embeddings
Lin Ning
Luyang Liu
Jiaxing Wu
Neo Wu
D. Berlowitz
Sushant Prakash
Bradley Green
S. O’Banion
Jun Xie
16
32
0
21 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
29
41
0
19 Feb 2024
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Junfei Xiao
Zheng Xu
Alan L. Yuille
Shen Yan
Boyu Wang
14
2
0
16 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
45
4
0
08 Feb 2024
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Nate Gruver
Anuroop Sriram
Andrea Madotto
A. Wilson
C. L. Zitnick
Zachary W. Ulissi
7
53
0
06 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Wei Ping
Rafael Valle
Bryan Catanzaro
AuLLM
LM&MA
MLLM
59
73
0
02 Feb 2024
Large Language Models for Time Series: A Survey
Xiyuan Zhang
Ranak Roy Chowdhury
Rajesh K. Gupta
Jingbo Shang
AI4TS
66
53
0
02 Feb 2024
GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear
Robert Konrad
Nitish Padmanaban
J. G. Buckmaster
Kevin C. Boyle
Gordon Wetzstein
12
11
0
30 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
34
173
0
24 Jan 2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong
Zishuo Zheng
Peihao Chen
Yian Wang
Junyan Li
Chuang Gan
8
17
0
16 Jan 2024
When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment
Minrui Xu
Dusit Niyato
Jiawen Kang
Zehui Xiong
Shiwen Mao
Zhu Han
Dong In Kim
K. B. Letaief
LLMAG
23
30
0
15 Jan 2024