Scalable and Order-robust Continual Learning with Hierarchically Decomposed Networks
While recent continual learning methods largely alleviate the catastrophic forgetting problem on toy-sized datasets, several issues remain to be tackled before they can be applied to real-world problem domains. First, a continual learning model should handle catastrophic forgetting effectively and remain efficient to train even with a large number of tasks. Second, it needs to tackle the problem of order-sensitivity, where the performance on each task varies widely depending on the order in which tasks arrive; this may cause serious problems in domains where fairness plays a critical role (e.g., medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is both scalable and order-robust. Instead of learning a completely shared set of weights, it represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. With our hierarchically decomposed networks (HDN), the task-adaptive parameters for earlier tasks remain mostly unaffected: we update them only to reflect the changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, with hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters, HDN becomes highly scalable. We validate HDN on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, efficiency, scalability, and order-robustness.
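The additive decomposition described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `compose` and `update_shared`, the flat-list parameter representation, and the simple compensation rule (subtracting the shared-parameter change from each earlier task's adaptive parameters) are all hypothetical. The abstract states only that earlier task-adaptive parameters are updated to reflect changes to the shared parameters; the exact update rule and how sparsity is maintained are not specified there.

```python
# Illustrative sketch only: names and the compensation rule are assumptions.

def compose(shared, adaptive):
    """Effective parameters for one task: shared + task-adaptive (element-wise)."""
    return [s + a for s, a in zip(shared, adaptive)]

def update_shared(shared, adaptives, delta):
    """Apply a change `delta` to the shared parameters and adjust each
    earlier task's adaptive parameters so that shared + adaptive is unchanged."""
    new_shared = [s + d for s, d in zip(shared, delta)]
    new_adaptives = [[a - d for a, d in zip(tau, delta)] for tau in adaptives]
    return new_shared, new_adaptives

# Task-shared parameters and a sparse task-adaptive vector for task 1.
shared = [0.2, -0.4, 0.7, 0.1]
tau_1 = [0.0, 0.5, 0.0, 0.0]
theta_1_before = compose(shared, tau_1)

# Hypothetical change to the shared parameters while learning a later task.
delta = [0.05, -0.02, 0.0, 0.03]
shared, (tau_1,) = update_shared(shared, [tau_1], delta)
theta_1_after = compose(shared, tau_1)

# Task 1's effective parameters should not drift (up to float rounding).
drift = max(abs(a - b) for a, b in zip(theta_1_after, theta_1_before))
```

In this toy version the compensated adaptive vector is no longer as sparse as before, which is one reason a real method would need an additional mechanism (for example, a sparsity-inducing penalty) that this sketch does not model.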