


ORXE: Orchestrating Experts for Dynamically Configurable Efficiency

7 May 2025
Qingyuan Wang
Guoxin Wang
Barry Cardiff
Deepu John
Abstract

This paper presents ORXE, a modular and adaptable framework for achieving real-time configurable efficiency in AI models. By leveraging a collection of pre-trained experts with diverse computational costs and performance levels, ORXE dynamically adjusts the inference pathway based on the complexity of each input sample. Unlike conventional approaches that require complex metamodel training, ORXE achieves high efficiency and flexibility without complicating the development process. The system uses a confidence-based gating mechanism to allocate an appropriate amount of computation to each input, and it supports runtime adjustment of the trade-off between inference cost and prediction performance across a wide range. We implemented a training-free ORXE system for image classification and evaluated its efficiency and accuracy across various devices. The results show that ORXE outperforms individual experts and other dynamic models in most cases. The approach can be extended to other applications, providing a scalable solution for diverse real-world deployment scenarios.
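The abstract describes a cascade of pre-trained experts ordered by cost, where a confidence-based gate decides whether to accept the current prediction or escalate to a more expensive expert, and where a single runtime threshold shifts the cost/accuracy trade-off. The sketch below illustrates that idea only; the `Expert` class, the `orxe_gate` function, and the toy predictors are hypothetical names invented here, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Expert:
    """A pre-trained expert: a predictor plus its relative inference cost.

    `predict` takes an input and returns (label, confidence), e.g. the
    argmax class and the maximum softmax probability of a classifier.
    """
    name: str
    cost: float  # relative compute cost (e.g., GFLOPs)
    predict: Callable[[object], Tuple[int, float]]


def orxe_gate(experts: List[Expert], x, threshold: float) -> Tuple[int, float, float]:
    """Run experts from cheapest to most expensive; stop as soon as the
    prediction confidence reaches `threshold`.

    Raising the threshold spends more compute for higher accuracy;
    lowering it saves compute -- this is the runtime knob the paper's
    abstract describes. Returns (label, confidence, total cost spent).
    """
    total_cost = 0.0
    label, conf = -1, 0.0
    for expert in sorted(experts, key=lambda e: e.cost):
        label, conf = expert.predict(x)
        total_cost += expert.cost
        if conf >= threshold:
            break  # confident enough: skip the remaining, costlier experts
    return label, conf, total_cost
```

For example, with a cheap expert that reports confidence 0.6 and an expensive one that reports 0.95, a threshold of 0.5 stops after the cheap expert, while a threshold of 0.9 escalates to the expensive one, spending the combined cost.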

@article{wang2025_2505.04850,
  title={ORXE: Orchestrating Experts for Dynamically Configurable Efficiency},
  author={Qingyuan Wang and Guoxin Wang and Barry Cardiff and Deepu John},
  journal={arXiv preprint arXiv:2505.04850},
  year={2025}
}