For reference, we have compiled a list of important publications on the topic of structure and priors in reinforcement learning (RL). If there is relevant work that should be added to the list, please open a pull request against the spirl-readings repository or email us at firstname.lastname@example.org!
Thanks to Michael Janner for contributing!
Behavior
- Julia Trommershäuser, Laurence T Maloney, and Michael S Landy, “Decision Making, Movement Planning and Statistical Decision Theory,” Trends in Cognitive Sciences 12, no. 8 (2008): 291–297, https://www.sciencedirect.com/science/article/pii/S1364661308001538.
- Carlos Diuk et al., “Divide and Conquer: Hierarchical Reinforcement Learning and Task Decomposition in Humans,” in Computational and Robotic Models of the Hierarchical Organization of Behavior (Springer, 2013), 271–291, https://link.springer.com/chapter/10.1007%2F978-3-642-39875-9_12.
- Alec Solway et al., “Optimal Behavioral Hierarchy,” PLoS Computational Biology 10, no. 8 (2014): e1003779, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003779.
- Y-Lan Boureau, Peter Sokol-Hessner, and Nathaniel D Daw, “Deciding How to Decide: Self-Control and Meta-Decision Making,” Trends in Cognitive Sciences 19, no. 11 (2015): 700–710, https://www.sciencedirect.com/science/article/pii/S1364661315002041.
- Samuel J. Gershman and Yael Niv, “Novelty and Inductive Generalization in Human Reinforcement Learning,” Topics in Cognitive Science 7, no. 3 (2015): 391–415, https://onlinelibrary.wiley.com/doi/full/10.1111/tops.12138.
- Falk Lieder and Thomas L Griffiths, “Strategy Selection as Rational Metareasoning,” Psychological Review 124, no. 6 (2017): 762, http://cocosci.princeton.edu/falk/Strategy%20selection%20as%20rational%20metareasoning.pdf.
- Ida Momennejad et al., “The Successor Representation in Human Reinforcement Learning,” Nature Human Behaviour 1, no. 9 (2017): 680, https://www.nature.com/articles/s41562-017-0180-8.
- Rachit Dubey et al., “Investigating Human Priors for Playing Video Games,” in ICML, 2018, https://arxiv.org/abs/1802.10217.
- George Konidaris, “On the Necessity of Abstraction,” Current Opinion in Behavioral Sciences 29 (2019): 1–7, https://www.sciencedirect.com/science/article/pii/S2352154618302080.
Neuroscience
- Wolfram Schultz, Peter Dayan, and P Read Montague, “A Neural Substrate of Prediction and Reward,” Science 275, no. 5306 (1997): 1593–1599, http://science.sciencemag.org/content/275/5306/1593.
- Matthew M Botvinick, Yael Niv, and Andrew C Barto, “Hierarchically Organized Behavior and Its Neural Foundations: A Reinforcement Learning Perspective,” Cognition 113, no. 3 (2009): 262–280, https://www.ncbi.nlm.nih.gov/pubmed/18926527.
- Jose JF Ribas-Fernandes et al., “A Neural Signature of Hierarchical Reinforcement Learning,” Neuron 71, no. 2 (2011): 370–379, https://www.ncbi.nlm.nih.gov/pubmed/21791294.
- Samuel J Gershman, “The Successor Representation: Its Computational Logic and Neural Substrates,” Journal of Neuroscience 38, no. 33 (2018): 7193–7200, http://www.jneurosci.org/content/38/33/7193.
Hierarchical RL
- Peter Dayan and Geoffrey E. Hinton, “Feudal Reinforcement Learning,” in NeurIPS, 1992, http://papers.nips.cc/paper/714-feudal-reinforcement-learning.
- Richard S Sutton, Doina Precup, and Satinder Singh, “Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning,” Artificial Intelligence 112, nos. 1-2 (1999): 181–211, https://www.sciencedirect.com/science/article/pii/S0004370299000521.
- Ronald Parr and Stuart J Russell, “Reinforcement Learning with Hierarchies of Machines,” in NeurIPS, 1997, https://papers.nips.cc/paper/1384-reinforcement-learning-with-hierarchies-of-machines.
- Thomas G Dietterich, “Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition,” Journal of Artificial Intelligence Research 13 (2000): 227–303, https://arxiv.org/abs/cs/9905014.
- Kfir Y Levy and Nahum Shimkin, “Unified Inter and Intra Options Learning Using Policy Gradient Methods,” in European Workshop on Reinforcement Learning (Springer, 2011), https://ewrl.files.wordpress.com/2011/08/ewrl2011_submission_21.pdf.
- Pierre-Luc Bacon, Jean Harb, and Doina Precup, “The Option-Critic Architecture,” in AAAI, 2017, http://arxiv.org/abs/1609.05140.
- Alexander Sasha Vezhnevets et al., “FeUdal Networks for Hierarchical Reinforcement Learning,” in ICML, 2017, https://arxiv.org/abs/1703.01161.
Meta-RL
- Yan Duan et al., “RL²: Fast Reinforcement Learning via Slow Reinforcement Learning,” arXiv Preprint arXiv:1611.02779 (2016), http://arxiv.org/abs/1611.02779.
- Jane X. Wang et al., “Learning to Reinforcement Learn,” arXiv Preprint arXiv:1611.05763 (2016), http://arxiv.org/abs/1611.05763.
- Rocky Duan, “Meta-Learning for Control” (PhD thesis, University of California, Berkeley, 2017), https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-233.pdf.
- Chelsea Finn, Pieter Abbeel, and Sergey Levine, “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,” in ICML, 2017, https://arxiv.org/abs/1703.03400.
- Kevin Frans et al., “Meta-Learning Shared Hierarchies,” in ICLR, 2018, https://arxiv.org/abs/1710.09767.
- Abhishek Gupta et al., “Meta-Reinforcement Learning of Structured Exploration Strategies,” in NeurIPS, 2018, https://arxiv.org/abs/1802.07245.
- Steindór Sæmundsson, Katja Hofmann, and Marc Peter Deisenroth, “Meta Reinforcement Learning with Latent Variable Gaussian Processes,” in UAI, 2018, https://arxiv.org/abs/1803.07551.
Modularity in RL
- Satinder P. Singh, “Transfer of Learning by Composing Solutions of Elemental Sequential Tasks,” Machine Learning 8 (1992): 323–339, https://link.springer.com/article/10.1007/BF00992700.
- Nicolas Heess et al., “Learning and Transfer of Modulated Locomotor Controllers,” arXiv Preprint arXiv:1610.05182 (2016), https://arxiv.org/abs/1610.05182.
- Jacob Andreas, Dan Klein, and Sergey Levine, “Modular Multitask Reinforcement Learning with Policy Sketches,” in ICML, 2017, https://arxiv.org/abs/1611.01796.
- Coline Devin et al., “Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer,” in ICRA, 2017, https://arxiv.org/abs/1609.07088.
- Karol Hausman et al., “Multi-Modal Imitation Learning from Unstructured Demonstrations Using Generative Adversarial Nets,” in NeurIPS, 2017, https://arxiv.org/abs/1705.10479.
- Dibya Ghosh et al., “Divide-and-Conquer Reinforcement Learning,” in ICLR, 2018, https://arxiv.org/abs/1711.09874.
- Karol Hausman et al., “Learning an Embedding Space for Transferable Robot Skills,” in ICLR, 2018, https://openreview.net/forum?id=rk07ZXZRb.
- Tobias Johannink et al., “Residual Reinforcement Learning for Robot Control,” arXiv Preprint arXiv:1812.03201 (2018), https://arxiv.org/abs/1812.03201.
- Michael B. Chang et al., “Automatically Composing Representation Transformations as a Means for Generalization,” in ICLR, 2019, https://arxiv.org/abs/1807.04640.
Priors and Bayesian RL
- David Wingate et al., “Bayesian Policy Search with Policy Priors,” in IJCAI, 2011, http://www.aaai.org/ocs/index.php/IJCAI/IJCAI11/paper/download/3306/3478.
- Mohammad Ghavamzadeh et al., “Bayesian Reinforcement Learning: A Survey,” Foundations and Trends in Machine Learning 8, nos. 5-6 (2015): 359–483, https://arxiv.org/abs/1609.04436.
- Ian Osband, John Aslanides, and Albin Cassirer, “Randomized Prior Functions for Deep Reinforcement Learning,” in NeurIPS, 2018, https://arxiv.org/abs/1806.03335.
Structure in RL
- Sebastian Thrun and Anton Schwartz, “Finding Structure in Reinforcement Learning,” in NeurIPS, 1994, https://papers.nips.cc/paper/887-finding-structure-in-reinforcement-learning.
- Richard S Sutton, “TD Models: Modeling the World at a Mixture of Time Scales,” in ICML, 1995, https://www.sciencedirect.com/science/article/pii/B9781558603776500724.
- Michael L. Littman, Richard S. Sutton, and Satinder P. Singh, “Predictive Representations of State,” in NeurIPS, 2001, https://papers.nips.cc/paper/1983-predictive-representations-of-state.
- Marc Ponsen, Matthew E Taylor, and Karl Tuyls, “Abstraction and Generalization in Reinforcement Learning: A Summary and Framework,” in International Workshop on Adaptive and Learning Agents (Springer, 2009), 1–32, https://link.springer.com/chapter/10.1007/978-3-642-11814-2_1.
- Richard S Sutton et al., “Horde: A Scalable Real-Time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction,” in AAMAS, 2011, http://www.ifaamas.org/Proceedings/aamas2011/papers/A6_R70.pdf.
- Tom Schaul et al., “Universal Value Function Approximators,” in ICML, 2015, http://proceedings.mlr.press/v37/schaul15.html.
- Aviv Tamar et al., “Value Iteration Networks,” in NeurIPS, 2016, https://arxiv.org/abs/1602.02867.
- David Silver et al., “The Predictron: End-to-End Learning and Planning,” in ICML, 2017, https://arxiv.org/abs/1612.08810.
- Jungseul Ok, Alexandre Proutiere, and Damianos Tranos, “Exploration in Structured Reinforcement Learning,” in NeurIPS, 2018, https://arxiv.org/abs/1806.00775.
- Yaroslav Ganin et al., “Synthesizing Programs for Images Using Reinforced Adversarial Learning,” in ICML, 2018, https://arxiv.org/abs/1804.01118.
- Alvaro Sanchez-Gonzalez et al., “Graph Networks as Learnable Physics Engines for Inference and Control,” in ICML, 2018, https://arxiv.org/abs/1806.01242.
Transfer, Multi-Task and Lifelong RL
- Matthew E Taylor and Peter Stone, “Transfer Learning for Reinforcement Learning Domains: A Survey,” JMLR 10 (2009): 1633–1685, http://www.jmlr.org/papers/v10/taylor09a.html.
- David Abel et al., “Policy and Value Transfer in Lifelong Reinforcement Learning,” in ICML, 2018, http://proceedings.mlr.press/v80/abel18b.html.