Coherent Inference on Optimal Play in Game Trees
2010
Conference Paper
ei
pn
Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
Author(s): | Hennig, P. and Stern, D. and Graepel, T. |
Book Title: | JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010 |
Pages: | 326-333 |
Year: | 2010 |
Month: | May |
Day: | 0 |
Editors: | Teh, Y.W. , M. Titterington |
Publisher: | JMLR |
Department(s): | Empirical Inference, Probabilistic Numerics |
Bibtex Type: | Conference Paper (inproceedings) |
Event Name: | Thirteenth International Conference on Artificial Intelligence and Statistics |
Event Place: | Chia Laguna Resort, Italy |
Address: | Cambridge, MA, USA |
Digital: | 0 |
Links: |
PDF
Web |
BibTex @inproceedings{HennigSG2010, title = {Coherent Inference on Optimal Play in Game Trees}, author = {Hennig, P. and Stern, D. and Graepel, T.}, booktitle = {JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010}, pages = {326-333}, editors = {Teh, Y.W. , M. Titterington }, publisher = {JMLR}, address = {Cambridge, MA, USA}, month = may, year = {2010}, doi = {}, month_numeric = {5} } |