Fuzzy Q-learning in continuous state and action space

doi:10.1016/S1005-8885(09)60495-7

The Journal of China Universities of Posts and Telecommunications, 2010, Vol. 17, Issue (4): 100-109.

Wireless

XU Ming-liang, XU Wen-bo

Department of Electronic Information Engineering, Wuxi City College of Vocational Technology, Wuxi 214063, China

Received: 2009-11-27; Revised: 2010-06-23; Online: 2010-08-30; Published: 2010-08-31
Contact: XU Ming-liang, E-mail: xml1973@126.com
This work was supported by the National Natural Science Foundation of China (60703106).

Abstract

An adaptive fuzzy Q-learning (AFQL) method based on fuzzy inference systems (FIS) is proposed. The FIS, realized by a normalized radial basis function (NRBF) neural network, is used to approximate the Q-value function; its input consists of both the state and the action. Rules of the FIS are created incrementally according to the novelty of each element of the state-action pair. Moreover, both the premise part and the consequent part of the FIS are updated using an extended Kalman filter (EKF). The action applied to the environment is the one that maximizes the FIS output in the current state, and it is found through an optimization method. Simulation results on the wall-following task of mobile robots and the inverted pendulum balancing problem demonstrate the superiority and applicability of the proposed AFQL method.
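The abstract describes AFQL only at a high level. The following Python sketch illustrates how its pieces could fit together; it is not the authors' implementation, and every name, threshold, and simplification in it is an assumption made for illustration. In particular: each rule is a zero-order fuzzy rule whose Gaussian premise covers the concatenated (state, action) vector with one shared width; a new rule is added when some element of the input lies farther than a threshold from the nearest rule center and the TD error is large (a novelty test modeled on resource-allocating networks); only the consequents are adapted, which makes the output linear in the adapted parameters, so the EKF step reduces to an ordinary Kalman update (the paper also tunes the premise part, which is omitted here); and a grid search over a scalar action stands in for the unspecified optimization method. The class `NRBFQ` and the function `q_learning_step` are hypothetical.

```python
import math


class NRBFQ:
    """Hypothetical sketch: Q(s, a) as a normalized RBF network (zero-order fuzzy rules).

    Each hidden unit is one rule. Its Gaussian over the joint input x = (s, a) equals the
    product of per-dimension Gaussian membership functions (the premise), and its scalar
    weight is the rule consequent.
    """

    def __init__(self, dist_threshold=0.5, err_threshold=0.1, width=0.5,
                 p0=10.0, r_noise=0.1):
        self.centers, self.widths, self.weights = [], [], []
        self.P = []                           # covariance of the consequent weights
        self.dist_threshold = dist_threshold  # assumed per-element novelty distance
        self.err_threshold = err_threshold    # assumed minimum TD error to grow a rule
        self.width, self.p0, self.r = width, p0, r_noise

    def _activations(self, x):
        """Normalized firing strengths of all rules for the joint input x."""
        phi = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * w * w))
               for c, w in zip(self.centers, self.widths)]
        total = sum(phi)
        return [p / total for p in phi] if total > 0 else [0.0] * len(phi)

    def q(self, s, a):
        x = list(s) + [a]
        return sum(g * w for g, w in zip(self._activations(x), self.weights))

    def _add_rule(self, x, target):
        n = len(self.centers)
        self.centers.append(list(x))
        self.widths.append(self.width)
        self.weights.append(target)    # new rule starts at the target value
        for row in self.P:             # grow P with an uncorrelated new entry
            row.append(0.0)
        self.P.append([0.0] * n + [self.p0])

    def _kalman_update(self, g, err):
        """Kalman step on the consequents; premises stay fixed in this sketch."""
        n = len(g)
        Pg = [sum(self.P[i][j] * g[j] for j in range(n)) for i in range(n)]
        denom = self.r + sum(g[i] * Pg[i] for i in range(n))
        K = [v / denom for v in Pg]
        for i in range(n):
            self.weights[i] += K[i] * err
        for i in range(n):             # P <- P - K g^T P, using that P is symmetric
            for j in range(n):
                self.P[i][j] -= K[i] * Pg[j]

    def update(self, s, a, target):
        x = list(s) + [a]
        err = target - self.q(s, a)
        if not self.centers:
            self._add_rule(x, target)
            return
        nearest = min(self.centers,
                      key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        # Novel if any single element of (s, a) is far from the nearest rule center.
        novel = max(abs(xi - ci) for xi, ci in zip(x, nearest)) > self.dist_threshold
        if novel and abs(err) > self.err_threshold:
            self._add_rule(x, target)
        else:
            self._kalman_update(self._activations(x), err)

    def greedy_action(self, s, a_min, a_max, n_grid=41):
        """Grid search over a scalar action (a stand-in for the paper's optimizer)."""
        grid = [a_min + (a_max - a_min) * k / (n_grid - 1) for k in range(n_grid)]
        return max(grid, key=lambda a: self.q(s, a))


def q_learning_step(qf, s, a, r, s_next, a_min, a_max, gamma=0.9, done=False):
    """One Q-learning backup toward r + gamma * max_a' Q(s', a')."""
    if done:
        target = r
    else:
        target = r + gamma * qf.q(s_next, qf.greedy_action(s_next, a_min, a_max))
    qf.update(s, a, target)
```

As a small check of the interpolation: with one rule at state 0, action -1 (target 0) and another at state 0, action +1 (target 1), the default width gives Q(0, a) = 1 / (1 + e^(-8a)), which increases with a, so `greedy_action((0.0,), -1.0, 1.0)` returns 1.0. Adapting the centers and widths as well would require the Jacobian of the normalized output with respect to them, which is where a genuinely extended filter becomes necessary.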

Key words:

Q-learning, FIS, continuous, adaptation

XU Ming-liang, XU Wen-bo. Fuzzy Q-learning in continuous state and action space[J]. The Journal of China Universities of Posts and Telecommunications, 2010, 17(4): 100-109.

Copyright © 2020 The Journal of China Universities of Posts and Telecommunications