ResearchTrend.AI
Auto-Encoding Morph-Tokens for Multimodal LLM

3 May 2024
Kaihang Pan
Siliang Tang
Juncheng Li
Zhaoyu Fan
Wei Chow
Shuicheng Yan
Tat-Seng Chua
Yueting Zhuang
Hanwang Zhang
    MLLM

Papers citing "Auto-Encoding Morph-Tokens for Multimodal LLM"

18 / 18 papers shown
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
42
0
0
22 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
32
1
0
20 Apr 2025
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
Jinbin Bai
Wei Chow
L. Yang
Xiangtai Li
Juncheng Billy Li
H. Zhang
Shuicheng Yan
101
3
0
05 Dec 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
H. Zhang
Yueting Zhuang
DiffM
95
15
0
24 Nov 2024
GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs
Yun Zhu
Haizhou Shi
Xiaotang Wang
Yongchao Liu
Yaoke Wang
Boci Peng
Chuntao Hong
Siliang Tang
VLM
41
6
0
14 Oct 2024
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Jinbin Bai
Tian-Chun Ye
Wei Chow
Enxin Song
Qing-Guo Chen
Xiangtai Li
Zhen Dong
Lei Zhu
46
13
0
10 Oct 2024
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
Kaihang Pan
Zhaoyu Fan
Juncheng Li
Qifan Yu
Hao Fei
Siliang Tang
Richang Hong
Hanwang Zhang
Qianru Sun
KELM
26
6
0
30 Sep 2024
Visual Prompting in Multimodal Large Language Models: A Survey
Junda Wu
Zhehao Zhang
Yu Xia
Xintong Li
Zhaoyang Xia
...
Subrata Mitra
Dimitris N. Metaxas
Lina Yao
Jingbo Shang
Julian McAuley
VLM
LRM
38
12
0
05 Sep 2024
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding
Yiqi Wu
Xiaodan Hu
Ziming Fu
Siling Zhou
Jiangong Li
MLLM
16
9
0
14 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
55
25
0
07 Jun 2024
WorldGPT: Empowering LLM as Multimodal World Model
Zhiqi Ge
Hongzhe Huang
Mingze Zhou
Juncheng Li
Guoming Wang
Siliang Tang
Yueting Zhuang
27
26
0
28 Apr 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
57
9
0
18 Feb 2024
I3: Intent-Introspective Retrieval Conditioned on Instructions
Kaihang Pan
Juncheng Li
Wenjie Wang
Hao Fei
Hongye Song
Wei Ji
Jun Lin
Xiaozhong Liu
Tat-Seng Chua
Siliang Tang
28
5
0
19 Aug 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
99
235
0
16 Jun 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
27
44
0
25 Mar 2023
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance
Wei Li
Xue Xu
Xinyan Xiao
Jiacheng Liu
Hu Yang
...
Zhanpeng Wang
Zhifan Feng
Qiaoqiao She
Yajuan Lyu
Hua-Hong Wu
110
29
0
28 Oct 2022
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021