
OpComm: A Reinforcement Learning Framework for Adaptive Buffer Control in Warehouse Volume Forecasting

Wilson Fung, Lu Guo, Drake Hilliard, Alessandro Casadei, Raj Ratan, Sreyoshi Bhaduri, Adi Surve, Nikhil Agarwal, Rohit Malshe, Pavan Mullapudi, Hungjen Wang, Saurabh Doodhwala, Ankush Pole, Arkajit Rakshit

Abstract

Accurate forecasting of package volumes at delivery stations is critical for last-mile logistics, where errors lead to inefficient resource allocation, higher costs, and delivery delays. We propose OpComm, a forecasting and decision-support framework that combines supervised learning with reinforcement learning-based buffer control and a generative AI-driven communication module. A LightGBM regression model generates station-level demand forecasts, which serve as context for a Proximal Policy Optimization (PPO) agent that selects buffer levels from a discrete action set. The reward function penalizes under-buffering more heavily than over-buffering, reflecting the real-world trade-off between the risk of unmet demand and resource inefficiency. Station outcomes are fed back through a Monte Carlo update mechanism, enabling continual policy adaptation. To enhance interpretability, a generative AI layer produces executive-level summaries and scenario analyses grounded in SHAP-based feature attributions. Across 400+ stations, OpComm reduced Weighted Absolute Percentage Error (WAPE) by 21.65% compared to manual forecasts, while lowering under-buffering incidents and improving transparency for decision-makers. This work shows how contextual reinforcement learning, coupled with predictive modeling, can address operational forecasting challenges and bridge statistical rigor and practical decision-making in high-stakes logistics environments.
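
As a rough illustration of the asymmetric reward described above, the following Python sketch shows how a penalty on unmet demand can be weighted more heavily than a penalty on idle buffer when scoring a discrete buffer choice on top of a station forecast. The penalty weights, the buffer grid, and the function name buffer_reward are illustrative assumptions, not values or code from the paper.

    # Illustrative sketch only: asymmetric reward for buffer selection,
    # penalizing under-buffering more heavily than over-buffering.
    # UNDER_PENALTY, OVER_PENALTY, and BUFFER_ACTIONS are assumed values.

    UNDER_PENALTY = 4.0   # assumed cost per unit of unmet demand
    OVER_PENALTY = 1.0    # assumed cost per unit of idle buffer
    BUFFER_ACTIONS = [0.00, 0.05, 0.10, 0.15, 0.20]  # assumed discrete buffer levels

    def buffer_reward(forecast: float, buffer_frac: float, actual: float) -> float:
        """Score one buffer choice applied to a station-level forecast."""
        planned_capacity = forecast * (1.0 + buffer_frac)
        shortfall = max(actual - planned_capacity, 0.0)   # under-buffering
        surplus = max(planned_capacity - actual, 0.0)     # over-buffering
        return -(UNDER_PENALTY * shortfall + OVER_PENALTY * surplus)

    # Example: forecast of 10,000 packages, 5% buffer chosen, 10,800 arrive;
    # the 300-unit shortfall dominates the penalty.
    print(buffer_reward(10_000, 0.05, 10_800))  # -1200.0

In a full PPO setup, a reward of this shape would be computed per station-day from realized volumes and fed back to the policy; the sketch only makes the asymmetry concrete.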
