OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
arXiv 2407.04923 · 6 July 2024
Authors: Tiancheng Zhao, Qianqian Zhang, Kyusong Lee, Peng Liu, Lu Zhang, Chunxin Fang, Jiajia Liao, Kelei Jiang, Yibo Ma, Ruochen Xu
Tags: MLLM, VLM
Papers citing "OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding" (6 of 6 papers shown)
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Authors: Enxin Song, Wenhao Chai, Weili Xu, Jianwen Xie, Yuxuan Liu, Gaoang Wang
20 Apr 2025

MileBench: Benchmarking MLLMs in Long Context
Authors: Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang
Tags: VLM
29 Apr 2024

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Authors: Hugo Laurençon, Léo Tronchon, Victor Sanh
Tags: VLM
14 Mar 2024

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Authors: Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, A. Kalyan
Tags: ELM, ReLM, LRM
20 Sep 2022

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Authors: Bryan Wang, Gang Li, Xin Zhou, Zhourong Chen, Tovi Grossman, Yang Li
07 Aug 2021

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Authors: Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
Tags: VLM
17 Feb 2021