Two-stage reinforcement-learning-based cognitive radio with exploration control
This study presents a novel two-stage reinforcement-learning-based algorithm for distributed cognitive radio (CR) spectrum sharing. The traditional reinforcement-learning model is modified so that it can be applied in a fully distributed CR scenario. By utilising learning, CRs discover the best available resources autonomously, which significantly improves performance while reducing the need for spectrum sensing. Instead of sensing all available spectrum arbitrarily, the scheme shares the spectrum according to an optimal spectrum sharing strategy, which the CR agents discover through trial-and-error interactions with the wireless communication environment. The exploration-exploitation trade-off inherent in reinforcement learning is also examined in the context of CR. A ‘warm-up’ stage is proposed to control the exploration phase of the learning process effectively. Better system performance can be expected when the trade-off between exploration and exploitation is carefully balanced. The benefit of applying a warm-up stage is demonstrated, and system performance under different warm-up strategies is compared to illustrate their impact on the spectrum sharing process.
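The two-stage idea can be illustrated with a minimal single-agent sketch. It is not the algorithm from this study: the abstract does not specify the update rule, the warm-up length, or the channel model, so everything below is an assumption made for illustration. In particular, the function name `run_two_stage`, the reward of +1 or -1 per transmission, the fixed-length warm-up with uniform random channel choice, the greedy choice afterwards, and the Bernoulli success probabilities standing in for the wireless environment are all hypothetical.

```python
import random


def run_two_stage(num_channels, success_prob, warmup_trials, total_trials, seed=0):
    """Toy two-stage learner (illustrative assumptions only).

    Stage 1 (warm-up): pick channels uniformly at random to explore.
    Stage 2 (exploitation): pick the channel with the highest learned weight.
    """
    rng = random.Random(seed)
    weights = [0.0] * num_channels  # one learned weight per channel
    choices = []

    for t in range(total_trials):
        if t < warmup_trials:
            # Warm-up stage: forced exploration, ignoring the weights.
            channel = rng.randrange(num_channels)
        else:
            # Exploitation stage: greedy choice (ties go to the lowest index).
            channel = max(range(num_channels), key=lambda c: weights[c])

        # Assumed environment: each transmission succeeds with a fixed probability.
        success = rng.random() < success_prob[channel]

        # Assumed update rule: reward +1 on success, -1 on failure.
        weights[channel] += 1.0 if success else -1.0
        choices.append(channel)

    return weights, choices


if __name__ == "__main__":
    # Hypothetical channel qualities; channel 1 is the best one.
    final_weights, picked = run_two_stage(
        num_channels=3,
        success_prob=[0.2, 0.9, 0.5],
        warmup_trials=30,
        total_trials=100,
    )
    print("final weights:", final_weights)
    print("channels used after warm-up:", sorted(set(picked[30:])))
```

Varying `warmup_trials` in this sketch shows the trade-off the abstract refers to: a warm-up that is too short can lock the agent onto a channel judged from very few trials, while one that is too long spends transmissions on channels already known to be poor.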