
Online model-free reinforcement learning for the automatic control of a flexible wing aircraft

The control problem of the flexible wing aircraft is challenging because of the highly non-linear deformations that prevail in the flexible wing system. This calls for new control mechanisms that are robust to real-time variations in the wing's aerodynamics. An online control mechanism based on a value iteration reinforcement learning process is developed for flexible wing aerial structures. It employs a model-free control policy framework and an adaptive learning architecture with guaranteed convergence to solve the system's Bellman optimality equation. A Riccati equation is derived, and solving it is shown to be equivalent to solving the underlying Bellman equation. The online reinforcement learning solution is implemented by means of an adaptive-critic mechanism. The controller is proven to be asymptotically stable in the Lyapunov sense. It is assessed through computer simulations, and its superior performance is demonstrated in two scenarios under different operating conditions.
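
To make the link between value iteration and the Riccati equation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a scalar, discrete-time linear system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2, and it uses the model (a, b) directly, whereas the method summarized above is model-free. The function name and all numerical values are hypothetical and chosen only for illustration.

```python
def riccati_value_iteration(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Value iteration on the scalar discrete-time Riccati recursion.

    Assumed (hypothetical) setup: x[k+1] = a*x[k] + b*u[k], with stage cost
    q*x^2 + r*u^2 and a quadratic value function V(x) = p*x^2.
    Returns the converged kernel p and the feedback gain k, with u = -k*x.
    """
    p = 0.0  # start from the zero value function
    for _ in range(max_iter):
        # Bellman backup for a quadratic value function, which reduces
        # to the Riccati difference equation in p.
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    # Greedy (minimizing) policy with respect to the converged value function.
    k = a * b * p / (r + b * b * p)
    return p, k


if __name__ == "__main__":
    # Illustrative numbers only: an open-loop unstable plant (|a| > 1).
    p, k = riccati_value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
    print("value kernel p:", p)
    print("feedback gain k:", k)
    print("closed-loop pole a - b*k:", 1.2 - 1.0 * k)
```

In this sketch each backup of the quadratic value function is exactly one step of the Riccati recursion, which is the sense in which solving the Riccati equation and solving the Bellman equation coincide for the linear-quadratic case. A model-free variant would estimate the same quantities from measured state and input data instead of using a and b.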

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cta.2018.6163