SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

21 August 2024
Yuanyang Yin
Yaqi Zhao
Yajie Zhang
Yuanxing Zhang
Ke Lin
Jiahao Wang
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
    LRM

Papers citing "SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs"

9 / 9 papers shown
AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
Junyang Zhang
Tianyi Zhu
Thierry Tambe
48
0
0
27 Sep 2025
Towards an Explainable Comparison and Alignment of Feature Embeddings
Mohammad Jalali
Bahar Dibaei Nia
Farzan Farnia
232
3
0
06 Jun 2025
FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL
Kaihang Pan
Wendong Bu
Y. Wu
Yang Wu
Kai Shen
Yunfei Li
Hang Zhao
Juncheng Billy Li
Siliang Tang
Yueting Zhuang
194
8
0
05 Jun 2025
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
Jiachen Jiang
Jinxin Zhou
Bo Peng
Xia Ning
Zhihui Zhu
248
1
0
22 May 2025
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Junyi Yuan
Jian Zhang
Fangyu Wu
Dongming Lu
Huanda Lu
Qiufeng Wang
200
3
0
16 May 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
Jianfei Chen
Jingwei Xu
Tengjiao Wang
Bin Wang
Wentao Zhang
MLLM
413
1
0
14 Apr 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao
Yuwei Niu
Fanqing Meng
Hao Li
Changyao Tian
...
Dianqi Li
X. Zhu
Li Yuan
Jifeng Dai
Yu Cheng
MLLM
308
6
0
25 Mar 2025
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Yaqi Zhao
Yuanyang Yin
Lin Li
Mingan Lin
Victor Shea-Jay Huang
Siwei Chen
Xin Wu
Baoqun Yin
Guosheng Dong
Wentao Zhang
220
3
0
25 Nov 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
Kevin Qinghong Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
236
24
0
25 Apr 2024