ResearchTrend.AI
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

7 September 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
arXiv: 2209.03430

Papers citing "Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions"

24 / 24 papers shown
Engineering Artificial Intelligence: Framework, Challenges, and Future Direction
Jay Lee
Hanqi Su
Dai-Yan Ji
Takanobu Minami
AI4CE
46
0
0
03 Apr 2025
CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
Wei Dai
Peilin Chen
Malinda Lu
Daniel Li
Haowen Wei
Hejie Cui
Paul Pu Liang
LM&MA
51
1
0
09 Mar 2025
TabulaTime: A Novel Multimodal Deep Learning Framework for Advancing Acute Coronary Syndrome Prediction through Environmental and Clinical Data Integration
Xin Zhang
Liangxiu Han
Stephen White
Saad Hassan
Philip A Kalra
James Ritchie
Carl Diver
Jennie Shorley
73
1
0
24 Feb 2025
Modality Interactive Mixture-of-Experts for Fake News Detection
Yifan Liu
Y. Liu
Z. Li
Ruichen Yao
Yang Zhang
Dong Wang
MoE
31
0
0
21 Jan 2025
MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version
Ronghui Xu
Hanyin Cheng
Chenjuan Guo
Hongfan Gao
J. Hu
Sean Bin Yang
Bin Yang
69
4
0
03 Jan 2025
Progressive Compositionality in Text-to-Image Generative Models
Xu Han
Linghao Jin
Xiaofeng Liu
Paul Pu Liang
CoGe
93
2
0
22 Oct 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
AI4CE
77
14
0
30 Jan 2024
MultiIoT: Benchmarking Machine Learning for the Internet of Things
Shentong Mo
Louis-Philippe Morency
Russ Salakhutdinov
Paul Pu Liang
17
1
0
10 Nov 2023
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu
Chunyuan Li
Haotian Liu
Jianwei Yang
Jianfeng Gao
Yelong Shen
MLLM
97
31
0
18 Sep 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language
Bing-Huang Su
Dazhao Du
Zhao-Qing Yang
Yujie Zhou
Jiangmeng Li
Anyi Rao
Haoran Sun
Zhiwu Lu
Ji-Rong Wen
40
107
0
12 Sep 2022
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
159
152
0
07 Aug 2021
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
M. Bronstein
Joan Bruna
Taco S. Cohen
Petar Veličković
GNN
163
1,095
0
27 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
278
3,784
0
18 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records
Zhen Xu
David R. So
Andrew M. Dai
Mamba
48
51
0
03 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
75
110
0
31 Jan 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,798
0
14 Dec 2020
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
204
607
0
03 Sep 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
232
319
0
21 Aug 2019
Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features
S. Palazzo
C. Spampinato
I. Kavasidis
D. Giordano
Joseph Schmidt
M. Shah
97
109
0
25 Oct 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
141
1,458
0
06 Jun 2016