Adaptive embedded control of cyber-physical systems using reinforcement learning (open access)

Embedded control parameters of cyber-physical systems (CPS), such as the sampling rate, are typically invariant and designed with a worst-case scenario in mind. In an over-engineered system, control parameters are assigned values that satisfy system-wide performance requirements at the expense of excessive energy and resource overheads. Dynamic and adaptive control parameters can reduce this overhead, but they are complex to design and require in-depth knowledge of the CPS and its operating environment, which is typically unavailable at design time. The authors investigate the application of reinforcement learning (RL) to dynamically adapt high-level system parameters at run time as a function of the system state. RL is an alternative to classical control theory for CPSs that can learn and adapt control properties without requiring an in-depth controller model. Specifically, the authors show that RL can modulate sampling times to save processing power without compromising control quality. They apply a novel statistical cloud-based evaluation framework to study the validity of the approach on the cart-pole balancing control problem as well as the well-known mountain car problem. The results show an improvement in real-world power efficiency of up to 20% compared with an optimal system with fixed controller settings.
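
The abstract does not reproduce the method, but the core idea, an RL agent that selects the controller's sampling period from the current plant state, can be sketched compactly. Below is a minimal, hypothetical illustration using tabular Q-learning, where the actions are candidate sampling periods and the reward trades control error against an energy cost. The ToyPlant, the state discretisation, and all numeric weights are placeholder assumptions for illustration, not the authors' cart-pole model or evaluation framework.

    import random
    from collections import defaultdict

    # Hypothetical sketch: a tabular Q-learning agent selects the sampling
    # period of a control loop. The plant, discretisation, and weights are
    # illustrative placeholders only.

    PERIODS = [0.01, 0.02, 0.05, 0.1]       # candidate sampling periods (s)
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount, exploration
    ENERGY_WEIGHT = 1e-3                    # assumed cost per unit sampling rate

    Q = defaultdict(lambda: [0.0] * len(PERIODS))  # state -> action values

    class ToyPlant:
        """Unstable scalar plant dx/dt = x + u with a zero-order-hold P controller."""
        def __init__(self):
            self.x = 1.0

        def reset(self):
            self.x = random.uniform(-1.0, 1.0)
            return self.discretise()

        def discretise(self):
            # Coarse state buckets so tabular Q-learning applies.
            return max(-9, min(9, int(self.x * 10)))

        def step(self, period):
            u = -2.0 * self.x                      # control held for the whole period
            for _ in range(10):                    # Euler integration over the period
                self.x += (self.x + u) * (period / 10)
            return self.discretise(), abs(self.x)  # next state, control error

    def choose_action(state):
        # Epsilon-greedy selection over sampling-period indices.
        if random.random() < EPSILON:
            return random.randrange(len(PERIODS))
        qs = Q[state]
        return qs.index(max(qs))

    def train(episodes=200, steps=100):
        plant = ToyPlant()
        for _ in range(episodes):
            s = plant.reset()
            for _ in range(steps):
                a = choose_action(s)
                s2, err = plant.step(PERIODS[a])
                # Reward trades control error against energy: shorter periods
                # run the controller more often, so they cost more.
                r = -(err + ENERGY_WEIGHT / PERIODS[a])
                # One-step Q-learning update.
                Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
                s = s2

    if __name__ == "__main__":
        train()
        # Print the greedy sampling period learned for each state bucket.
        print({s: PERIODS[qs.index(max(qs))] for s, qs in sorted(Q.items())})

Because the energy term scales inversely with the period, the learned policy can select long sampling intervals when the plant is near equilibrium and fall back to short intervals when the control error grows, which is the mechanism behind the reported power savings.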
