Efficient Massive-Device Orchestration Through Reinforcement Learning With Boosted Deep Deterministic Policy Gradient
Abstract
Big data innovations are driven by massive numbers of devices that capture large volumes of dynamics from humans or the environment, which are then mined for hidden insights. The challenge lies in massive-device orchestration: configuring and managing both the devices and the gateway or server. On the wearable or Internet of Things device side, the complexity stems from diverse energy budgets, computing efficiency, and communication channel conditions. On the phone or server side, it lies in analyzing this global diversity and optimizing the system configuration. To address this obstacle, we propose a new reinforcement learning architecture, called boosted deep deterministic policy gradient, with enhanced actor–critic co-learning and multiview state transformation. Specifically, the proposed actor–critic co-learning enables enhanced dynamics abstraction through a shared neural network component. In addition, the state transformation, with multiple parallel learning agents, improves both action quality and the learning process. Evaluated on complex massive-device orchestration tasks, the proposed deep reinforcement learning framework achieves more efficient system configurations, with enhanced computing capabilities and energy efficiency. This study advances massive-device system configuration through deep learning and reinforcement reward mechanisms, toward efficient big data practices.
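The two ideas named in the abstract, a network component shared by actor and critic and several parallel agents that each see a transformed view of the state, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify layer sizes, how the views are constructed, or how the agents' actions are combined. The dimensions, the three view functions, the one-actor-head-per-view layout, and the rule of keeping the candidate action with the highest critic score are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not give any dimensions.
STATE_DIM, ACTION_DIM, HIDDEN = 6, 2, 16

def init(shape):
    """Small random weights (untrained, for illustration only)."""
    return rng.normal(0.0, 0.1, size=shape)

# Hypothetical "views": fixed transformations of the same raw state.
VIEWS = [
    lambda s: s,                                   # raw state
    lambda s: s[::-1].copy(),                      # reordered features
    lambda s: (s - s.mean()) / (s.std() + 1e-8),   # standardized features
]

# Shared trunk used by both the actors and the critic
# (an assumed reading of "actor-critic co-learning").
W_shared = init((STATE_DIM, HIDDEN))
# One actor head per view (assumed form of "parallel learning agents").
W_actors = [init((HIDDEN, ACTION_DIM)) for _ in VIEWS]
W_critic = init((HIDDEN + ACTION_DIM, 1))

def shared_features(state):
    """Feature abstraction shared by actor and critic."""
    return np.tanh(state @ W_shared)

def actor(state, head):
    """Deterministic policy: maps a state to an action in [-1, 1]."""
    return np.tanh(shared_features(state) @ W_actors[head])

def critic(state, action):
    """Scalar value estimate Q(state, action) built on the shared features."""
    z = np.concatenate([shared_features(state), action])
    return float((z @ W_critic)[0])

def select_action(state):
    """Each agent proposes an action from its own view of the state;
    the critic scores every proposal on the raw state and the best is kept."""
    candidates = [actor(view(state), i) for i, view in enumerate(VIEWS)]
    scores = [critic(state, a) for a in candidates]
    best = int(np.argmax(scores))
    return candidates[best], best, scores
```

Calling `select_action(np.linspace(-1.0, 1.0, STATE_DIM))` returns the chosen action, the index of the agent that proposed it, and the critic scores of all proposals. Because the trunk `W_shared` feeds both the actor heads and the critic, a gradient update from either side would change the features seen by the other, which is one plausible interpretation of the co-learning described above.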