Policy Gradient Methods
Forthcoming March 2025.
References
-
Agarwal, A., Kakade, S.M., Lee, J.D. and Mahajan, G., 2021. On the theory of policy gradient methods: Optimality, approximation, and distribution shift. Journal of Machine Learning Research, 22(98), pp.1-76.
MR4279749
-
Amari, S.I., 1998. Natural gradient works efficiently in learning. Neural Computation, 10(2), pp.251-276.
-
Amari, S.I., 1998. The natural gradient learning algorithm for neural networks. Theoretical Aspects of Neural Computation (Hong Kong, 1997), pp.1-15. Springer-Verlag Singapore.
MR1655598
-
Antos, A., Szepesvári, C. and Munos, R., 2008. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71, pp.89-129.
-
Azar, M.G., Gómez, V. and Kappen, H.J., 2012. Dynamic policy programming. Journal of Machine Learning Research, 13(1), pp.3207-3245.
MR3005885
-
Bhandari, J. and Russo, D., 2024. Global optimality guarantees for policy gradient methods. Operations Research, 72(5), pp.1906-1927.
MR4818024
-
Cen, S., Cheng, C., Chen, Y., Wei, Y. and Chi, Y., 2022. Fast global convergence of natural policy gradient methods with entropy regularization. Operations Research, 70(4), pp.2563-2578.
MR4484422
-
Ding, Y., Zhang, J., Lee, H. and Lavaei, J., 2025. Beyond exact gradients: Convergence of stochastic soft-max policy gradient methods with entropy regularization. IEEE Transactions on Automatic Control, forthcoming.
-
Even-Dar, E., Kakade, S.M. and Mansour, Y., 2009. Online Markov decision processes. Mathematics of Operations Research, 34(3), pp.726-736.
MR2555346
-
Ged, F.G. and Veiga, M.H., 2024. Matryoshka policy gradient for entropy-regularized RL: Convergence and global optimality. Journal of Machine Learning Research, 25(308), pp.1-52.
MR4829152
-
Lan, G., 2023. Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes. Mathematical Programming, 198(1), pp.1059-1106.
MR4550970
-
Li, G., Wei, Y., Chi, Y., Gu, Y. and Chen, Y., 2023. Softmax policy gradient methods can take exponential time to converge. Mathematical Programming, 201(1-2), pp.707-802.
MR4620239
-
Li, Z., Liu, B., Yang, Z., Wang, Z. and Wang, M., 2023. Double duality: variational primal-dual policy optimization for constrained reinforcement learning. Journal of Machine Learning Research, 24(385), pp.1-43.
MR4720841
-
Schulman, J., Levine, S., Abbeel, P., Jordan, M. and Moritz, P., 2015. Trust region policy optimization. Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, pp.1889-1897.
-
Shalev-Shwartz, S. and Ben-David, S., 2014. Understanding machine learning: From theory to algorithms. Cambridge University Press.
-
Shalev-Shwartz, S., 2012. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2), pp.107-194.
-
Sutton, R.S. and Barto, A.G., 2018. Reinforcement learning: An introduction. Second edition. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
MR3889951
-
Sutton, R.S., McAllester, D., Singh, S. and Mansour, Y., 1999. Policy gradient methods for reinforcement learning with function approximation. Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'99), pp.1057-1063. MIT Press, Cambridge, MA.
-
Szepesvári, C., 2022. Algorithms for reinforcement learning. Reprint of the 2010 original. Synthesis Lectures on Artificial Intelligence and Machine Learning, 9. Springer, Cham.
MR4647095
-
Teyssière, G., 2023. Review of "Lan, G., 2023. Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes". Mathematical Reviews, American Mathematical Society.
-
Teyssière, G., 2024. Review of "Li, G., Wei, Y., Chi, Y., Gu, Y. and Chen, Y., 2023. Softmax policy gradient methods can take exponential time to converge." Mathematical Reviews, American Mathematical Society.
-
Williams, R.J., 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, pp.229-256.
-
Zhang, K., Koppel, A., Zhu, H. and Basar, T., 2020. Global convergence of policy gradient methods to (almost) locally optimal policies. SIAM Journal on Control and Optimization, 58(6), pp.3586-3612.
MR4182900
Updated March 2025.