2408.08093
Cited By
v1
v2 (latest)
When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding
17 February 2025
Pingping Zhang
Jinlong Li
Kecheng Chen
Meng Wang
Long Xu
Haoliang Li
Nicu Sebe
Sam Kwong
Shiqi Wang
VGen
Papers citing "When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding"
31 / 31 papers shown
Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
Lingdong Wang
Guan-Ming Su
D. Kothandaraman
Tsung-Wei Huang
Mohammad Hajiesmaili
R. Sitaraman
DiffM
VGen
239
0
0
29 Nov 2025
VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation
Ayaan Nooruddin Siddiqui
Mahnoor Zaidi
Ayesha Nazneen Shahbaz
Priyadarshini Chatterjee
Krishnan Menon Iyer
306
0
0
09 Aug 2025
DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation
Vikram Singh
Kabir Malhotra
Rohan Desai
Ananya Shankaracharya
Priyadarshini Chatterjee
Krishnan Menon Iyer
MedIm
389
0
0
09 Aug 2025
Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling
Aarav Mehta
Priya Deshmukh
Vikram Singh
Siddharth Malhotra
Krishnan Menon Iyer
Tanvi Iyer
MedIm
341
0
0
09 Aug 2025
Deeply Dual Supervised learning for melanoma recognition
Rujosh Polma
Krishnan Menon Iyer
275
0
0
04 Aug 2025
Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning
Haoming Ye
Yunxiao Xiao
Cewu Lu
Panpan Cai
LM&Ro
233
0
0
29 Jul 2025
Conditional Video Generation for High-Efficiency Video Compression
Fangqiu Yi
Jingyu Xu
Jiawei Shao
Chi Zhang
Xuelong Li
DiffM
VGen
394
3
0
21 Jul 2025
T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates
Zhitao Wang
Hengyu Man
Wenrui Li
Xingtao Wang
Xiaopeng Fan
Debin Zhao
DiffM
VGen
447
3
0
10 Jul 2025
GIViC: Generative Implicit Video Compression
Ge Gao
Siyue Teng
Tianhao Peng
Fan Zhang
David Bull
DiffM
VGen
434
9
0
25 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Jinlong Li
Cristiano Saltori
Fabio Poiesi
Nicu Sebe
1.2K
11
0
20 Mar 2025
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
International Conference on Machine Learning (ICML), 2024
Hao Fei
Shengqiong Wu
Wei Ji
Hao Zhang
Hao Fei
Yang Deng
Wynne Hsu
LRM
VGen
581
168
0
08 Jan 2025
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
European Conference on Computer Vision (ECCV), 2024
Vikram S. Voleti
Chun-Han Yao
Mark Boss
Adam Letts
David Pankratz
Dmitry Tochilkin
Christian Laforte
Robin Rombach
Varun Jampani
DiffM
VGen
348
349
0
18 Mar 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen
DiffM
533
570
0
17 Jan 2024
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Kaiwen Zhang
Yifan Zhou
Xudong Xu
Xingang Pan
Bo Dai
DiffM
275
74
0
12 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
1.8K
1,378
0
16 Nov 2023
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
International Conference on Learning Representations (ICLR), 2023
Xinyuan Chen
Yaohui Wang
Lingjun Zhang
Shaobin Zhuang
Xin Ma
Jiashuo Yu
Yali Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
DiffM
428
217
0
31 Oct 2023
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
International Journal of Computer Vision (IJCV), 2023
Yaohui Wang
Xinyuan Chen
Xin Ma
Shangchen Zhou
Ziqi Huang
...
Chen Change Loy
Bo Dai
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
DiffM
334
355
0
26 Sep 2023
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
IEEE International Conference on Computer Vision (ICCV), 2023
Wenhao Chai
Xun Guo
Gaoang Wang
Yang Lu
VGen
DiffM
313
215
0
18 Aug 2023
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Computer Vision and Pattern Recognition (CVPR), 2023
Ouyang Hao
Qiuyu Wang
Yuxi Xiao
Qingyan Bai
Juntao Zhang
Kecheng Zheng
Xiaowei Zhou
Qifeng Chen
Yujun Shen
DiffM
VGen
251
124
0
15 Aug 2023
ModelScope Text-to-Video Technical Report
Jiuniu Wang
Hangjie Yuan
Dayou Chen
Yingya Zhang
Xiang Wang
Shiwei Zhang
VGen
DiffM
447
656
0
12 Aug 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hang Zhang
Xin Li
Lidong Bing
MLLM
759
1,636
0
05 Jun 2023
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Songwei Ge
Seungjun Nah
Guilin Liu
Tyler Poon
Andrew Tao
Bryan Catanzaro
David Jacobs
Jia-Bin Huang
Ming-Yuan Liu
Yogesh Balaji
DiffM
VGen
584
313
0
17 May 2023
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
Computer Vision and Pattern Recognition (CVPR), 2023
Zerui Li
Zuo-Liang Zhu
Linghao Han
Qibin Hou
Chunle Guo
Ming-Ming Cheng
250
171
0
19 Apr 2023
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2023
A. Blattmann
Robin Rombach
Huan Ling
Tim Dockhorn
Seung Wook Kim
Sanja Fidler
Karsten Kreis
3DGS
VGen
717
1,575
0
18 Apr 2023
Neural Video Compression with Diverse Contexts
Computer Vision and Pattern Recognition (CVPR), 2023
Jiahao Li
Bin Li
Yan Lu
556
271
0
28 Feb 2023
Structure and Content-Guided Video Synthesis with Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Patrick Esser
Johnathan Chiu
Parmida Atighehchian
Jonathan Granskog
Anastasis Germanidis
DiffM
VGen
475
717
0
06 Feb 2023
Make-A-Video: Text-to-Video Generation without Text-Video Data
International Conference on Learning Representations (ICLR), 2022
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM
VGen
395
1,931
0
29 Sep 2022
Cross Modal Compression: Towards Human-comprehensible Semantic Compression
ACM Multimedia (MM), 2021
Jiguo Li
Chuanmin Jia
Xinfeng Zhang
Siwei Ma
Wen Gao
180
29
0
06 Sep 2022
Deep Contextual Video Compression
Jiahao Li
Bin Li
Yan Lu
463
416
0
30 Sep 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
2.2K
45,649
0
26 Feb 2021
Non-local Attention Optimized Deep Image Compression
Haojie Liu
Tong Chen
Peiyao Guo
Qiu Shen
Xun Cao
Yao Wang
Zhan Ma
258
310
0
22 Apr 2019