Autonomic discovery of subgoals in hierarchical reinforcement learning

doi:10.1016/S1005-8885(14)60337-X

The Journal of China Universities of Posts and Telecommunications, 2014, Vol. 21, Issue 5: 94-104. doi: 10.1016/S1005-8885(14)60337-X

Received: 2013-11-11 Revised: 2014-06-23 Online: 2014-10-31 Published: 2014-10-30
Contact: XIAO Ding E-mail: dxiao@bupt.edu.cn

Abstract

Options are a promising way to discover hierarchical structure in reinforcement learning (RL) and thereby accelerate learning. The key to option discovery is how an agent can autonomously find useful subgoals among the trails it has passed. Analyzing the agent’s actions along these trails yields useful heuristics: the agent not only passes through subgoals more frequently, but its effective actions at subgoals are also restricted. Consequently, subgoals can be regarded as the states on the paths that best match this action-restricted property. For the grid-world environment, the unique-direction value (UDV), which reflects the action-restricted property, is introduced to find these states. The UDV approach is then used to form options autonomously, both offline and online. Experiments show that the approach finds subgoals correctly, and that Q-learning with options found in either the offline or the online process accelerates learning significantly.

Key words: hierarchical reinforcement learning, option, Q-learning, subgoal, UDV

CLC Number: TP393
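
The abstract names two heuristics for a subgoal: the agent passes through it often, and its effective actions there are restricted. The abstract does not define the unique-direction value itself, so the following is only a minimal sketch under stated assumptions. The score used here (fraction of trajectories passing through a state, multiplied by the share of its most common action) is a hypothetical stand-in and not the paper's UDV formula. The function name `subgoal_scores` and the toy two-room trajectories are likewise illustrative.

```python
from collections import Counter, defaultdict


def subgoal_scores(trajectories):
    """Score states by pass frequency times action concentration.

    Hypothetical stand-in for the unique-direction value: the abstract
    gives only the two heuristics, not the actual formula.
    Each trajectory is a list of (state, action) pairs.
    """
    passes = Counter()              # number of trajectories visiting each state
    actions = defaultdict(Counter)  # action counts observed in each state
    for path in trajectories:
        for state in {s for s, _ in path}:
            passes[state] += 1
        for state, action in path:
            actions[state][action] += 1

    total = len(trajectories)
    scores = {}
    for state, counts in actions.items():
        pass_fraction = passes[state] / total
        concentration = max(counts.values()) / sum(counts.values())
        scores[state] = pass_fraction * concentration
    return scores


# Toy data (assumed): two rooms joined by a doorway at (2, 1).
# States are (x, y); 'N' increases y, 'E' increases x.
TRAJECTORIES = [
    [((0, 0), "N"), ((0, 1), "E"), ((1, 1), "E"), ((2, 1), "E"), ((3, 1), "E")],
    [((1, 2), "S"), ((1, 1), "E"), ((2, 1), "E"), ((3, 1), "N")],
    [((1, 1), "N"), ((1, 2), "S"), ((1, 1), "E"), ((2, 1), "E"), ((3, 1), "S")],
]

scores = subgoal_scores(TRAJECTORIES)
best_state = max(scores, key=scores.get)
print(best_state, scores[best_state])
```

In this toy data every trajectory crosses the doorway and always leaves it in the same direction, so it receives the highest score, while the cell just inside the right-hand room is visited equally often but is left in three different directions and scores low.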

Copyright © 2020 The Journal of China Universities of Posts and Telecommunications