When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding
arXiv:2408.08093 (v2, latest)

17 February 2025
Pingping Zhang
Jinlong Li
Kecheng Chen
Meng Wang
Long Xu
Haoliang Li
Andrii Zadaianchuk
Sam Kwong
Shiqi Wang
    VGen

Papers citing "When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding"

31 / 31 papers shown
Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
Lingdong Wang
Guan-Ming Su
D. Kothandaraman
Tsung-Wei Huang
Mohammad Hajiesmaili
R. Sitaraman
DiffM, VGen
239
0
0
29 Nov 2025
VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation
Ayaan Nooruddin Siddiqui
Mahnoor Zaidi
Ayesha Nazneen Shahbaz
Priyadarshini Chatterjee
Krishnan Menon Iyer
306
0
0
09 Aug 2025
DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation
Vikram Singh
Kabir Malhotra
Rohan Desai
Ananya Shankaracharya
Priyadarshini Chatterjee
Krishnan Menon Iyer
MedIm
389
0
0
09 Aug 2025
Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling
Aarav Mehta
Priya Deshmukh
Vikram Singh
Siddharth Malhotra
Krishnan Menon Iyer
Tanvi Iyer
MedIm
341
0
0
09 Aug 2025
Deeply Dual Supervised learning for melanoma recognition
Rujosh Polma
Krishnan Menon Iyer
275
0
0
04 Aug 2025
Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning
Haoming Ye
Yunxiao Xiao
Cewu Lu
Panpan Cai
LM&Ro
233
0
0
29 Jul 2025
Conditional Video Generation for High-Efficiency Video Compression
Fangqiu Yi
Jingyu Xu
Jiawei Shao
Chi Zhang
Xuelong Li
DiffM, VGen
394
3
0
21 Jul 2025
T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates
Zhitao Wang
Hengyu Man
Wenrui Li
Xingtao Wang
Xiaopeng Fan
Debin Zhao
DiffM, VGen
447
3
0
10 Jul 2025
GIViC: Generative Implicit Video Compression
Ge Gao
Siyue Teng
Tianhao Peng
Fan Zhang
David Bull
DiffM, VGen
434
9
0
25 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Jinlong Li
Cristiano Saltori
Fabio Poiesi
Andrii Zadaianchuk
1.2K
11
0
20 Mar 2025
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
International Conference on Machine Learning (ICML), 2024
Hao Fei
Shengqiong Wu
Wei Ji
Hao Zhang
Hao Fei
Yang Deng
Wynne Hsu
LRM, VGen
581
168
0
08 Jan 2025
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
European Conference on Computer Vision (ECCV), 2024
Vikram S. Voleti
Chun-Han Yao
Mark Boss
Adam Letts
David Pankratz
Dmitry Tochilkin
Christian Laforte
Robin Rombach
Varun Jampani
DiffM, VGen
348
349
0
18 Mar 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen, DiffM
533
570
0
17 Jan 2024
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Kaiwen Zhang
Yifan Zhou
Xudong Xu
Xingang Pan
Bo Dai
DiffM
275
74
0
12 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM, MLLM
1.8K
1,378
0
16 Nov 2023
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
International Conference on Learning Representations (ICLR), 2023
Xinyuan Chen
Yaohui Wang
Lingjun Zhang
Shaobin Zhuang
Xin Ma
Jiashuo Yu
Yali Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen, DiffM
428
217
0
31 Oct 2023
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
International Journal of Computer Vision (IJCV), 2023
Yaohui Wang
Xinyuan Chen
Xin Ma
Shangchen Zhou
Ziqi Huang
...
Chen Change Loy
Bo Dai
Dahua Lin
Yu Qiao
Ziwei Liu
VGen, DiffM
334
355
0
26 Sep 2023
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
IEEE International Conference on Computer Vision (ICCV), 2023
Wenhao Chai
Xun Guo
Gaoang Wang
Yang Lu
VGen, DiffM
313
215
0
18 Aug 2023
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Computer Vision and Pattern Recognition (CVPR), 2023
Ouyang Hao
Qiuyu Wang
Yuxi Xiao
Qingyan Bai
Juntao Zhang
Kecheng Zheng
Xiaowei Zhou
Qifeng Chen
Yujun Shen
DiffM, VGen
251
124
0
15 Aug 2023
ModelScope Text-to-Video Technical Report
Jiuniu Wang
Hangjie Yuan
Dayou Chen
Yingya Zhang
Xiang Wang
Shiwei Zhang
VGen, DiffM
447
656
0
12 Aug 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hang Zhang
Xin Li
Lidong Bing
MLLM
759
1,636
0
05 Jun 2023
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Songwei Ge
Seungjun Nah
Guilin Liu
Tyler Poon
Andrew Tao
Bryan Catanzaro
David Jacobs
Jia-Bin Huang
Ming-Yuan Liu
Yogesh Balaji
DiffM, VGen
584
313
0
17 May 2023
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
Computer Vision and Pattern Recognition (CVPR), 2023
Zerui Li
Zuo-Liang Zhu
Linghao Han
Qibin Hou
Chunle Guo
Ming-Ming Cheng
250
171
0
19 Apr 2023
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2023
A. Blattmann
Robin Rombach
Huan Ling
Tim Dockhorn
Seung Wook Kim
Sanja Fidler
Karsten Kreis
3DGS, VGen
717
1,575
0
18 Apr 2023
Neural Video Compression with Diverse Contexts
Computer Vision and Pattern Recognition (CVPR), 2023
Jiahao Li
Bin Li
Yan Lu
556
271
0
28 Feb 2023
Structure and Content-Guided Video Synthesis with Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
Patrick Esser
Johnathan Chiu
Parmida Atighehchian
Jonathan Granskog
Anastasis Germanidis
DiffM, VGen
475
717
0
06 Feb 2023
Make-A-Video: Text-to-Video Generation without Text-Video Data
International Conference on Learning Representations (ICLR), 2022
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM, VGen
395
1,931
0
29 Sep 2022
Cross Modal Compression: Towards Human-comprehensible Semantic Compression
ACM Multimedia (MM), 2021
Jiguo Li
Chuanmin Jia
Xinfeng Zhang
Siwei Ma
Wen Gao
180
29
0
06 Sep 2022
Deep Contextual Video Compression
Jiahao Li
Bin Li
Yan Lu
463
416
0
30 Sep 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
2.2K
45,649
0
26 Feb 2021
Non-local Attention Optimized Deep Image Compression
Haojie Liu
Tong Chen
Peiyao Guo
Qiu Shen
Xun Cao
Yao Wang
Zhan Ma
258
310
0
22 Apr 2019