ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.00849
  4. Cited By
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved
  Pre-Training

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

1 January 2024
Alex Jinpeng Wang
Linjie Li
Kevin Qinghong Lin
Jianfeng Wang
Kevin Lin
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
    VLMVGen
ArXiv (abs)PDFHTMLHuggingFace (17 upvotes)Github

Papers citing "COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training"

12 / 12 papers shown
Cambrian-S: Towards Spatial Supersensing in Video
Cambrian-S: Towards Spatial Supersensing in Video
Shusheng Yang
J. Yang
Pinzhi Huang
Ellis L Brown
Zihao Yang
...
Daohan Lu
Rob Fergus
Yann LeCun
Li Fei-Fei
Saining Xie
231
51
0
06 Nov 2025
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Baoqi Pei
Yifei Huang
Jilan Xu
Yuping He
Guo Chen
Fei Wu
Yu Qiao
Jiangmiao Pang
EgoVLRM
288
11
0
27 Oct 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation LearningInternational Conference on Learning Representations (ICLR), 2025
Baoqi Pei
Yuanmin Huang
Jilan Xu
Guo Chen
Yuping He
...
Yali Wang
Weidi Xie
Yu Qiao
Leilei Gan
Limin Wang
321
15
0
02 Mar 2025
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs
Gengyuan Zhang
Mingcong Ding
Tong Liu
Yao Zhang
Volker Tresp
499
2
0
24 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
326
28
0
17 Feb 2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
1.0K
152
0
31 Dec 2024
Do Language Models Understand Time?
Do Language Models Understand Time?The Web Conference (WWW), 2024
Xi Ding
Lei Wang
1.0K
14
0
18 Dec 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
292
15
0
31 Jul 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal
  Learning
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang
Linjie Li
Yiqi Lin
Min Li
Lijuan Wang
Mike Zheng Shou
VLM
336
14
0
04 Jun 2024
Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales
Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales
Minghe Gao
Shuang Chen
Liang Pang
Xingtai Lv
Jisheng Dang
Wenqiao Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
Tat-Seng Chua
LRM
175
11
0
17 Apr 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Contextual AD Narration with Interleaved Multimodal SequenceComputer Vision and Pattern Recognition (CVPR), 2024
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
566
11
0
19 Mar 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
923
225
0
29 Dec 2023
1
Page 1 of 1