All-in-One Transferring Image Compression from Human Perception to Multi-Machine Perception
- VLM
Efficiently transferring a Learned Image Compression (LIC) model from human perception to machine perception is an emerging challenge in vision-centric representation learning. Existing approaches typically adapt LIC to downstream tasks in a single-task manner, which is inefficient, lacks task interaction, and results in multiple task-specific bitstreams. In this paper, we propose a multi-task adaptation framework that transfers a pre-trained base codec to multiple machine vision tasks through a unified model and a single training process. To achieve this, we design an asymmetric adaptation architecture consisting of task-agnostic encoder adaptation and task-specific decoder adaptation. Furthermore, we introduce two feature propagation modules to facilitate inter-task and inter-scale feature representation learning. Experiments on the PASCAL-Context and NYUD-V2 datasets demonstrate that our method outperforms both Fully Fine-Tuned and other Parameter-Efficient Fine-Tuned (PEFT) baselines. Code will be released.