Related work of LevOpt

Posted 2018. 2. 11. 01:37

Babes, Monica, Marivate, Vukosi N., Subramanian, Kaushik, and Littman, Michael L. Apprenticeship learning about multiple intentions. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011

By clustering the observed trajectories of demonstrators using existing inverse reinforcement learning algorithms. The intention of each demonstrator is inferred by using the clusters in the context of apprenticeship learning [Abbeel & Ng, 2004). 

이 논문이 거의 최초인 것 같은데 'extension of apprenticeship learning in which the learner is provided with unlabeled example trajectories generated from a number of possible reward functions'이다. 제안한 방법을 통해서 각 individual intention을 successfully infer를 하였다! 

제안한 방법이 참 재밌다. EM 알고리즘을 만들어서, alternates between calculating cluster-allocating probabilities of each trajectory in the E step and preforming exiting IRL in the M step. 

이 논문에서는 재가 한 것과 비슷하게 highway driving을 한다. Mode는 safe / student / demolition / nasty 이다. 물론 아주 unrealistic control을 한다. (action이 left, right, center의 세 개 뿐이다.)

이건 그냥 clustering만 하고 마는 것 같은데? 

Choi, Jaedeug and Kim, Kee-Eung. Nonparametric bayesian inverse reinforcement learning for multiple reward functions. In Advances in Neural Information Processing Systems 25 (2012)

Assumes that reward functions of different demonstrations are different from each other. 

기존의 Bayesian IRL을 Dirichlet process mixture model을 이용해서 확장을 한다. Thanks to the nature of the nonparametric prior, the number of reward function is appropriated estimated as well. 

Have to solve a reinforcement problem as a subroutine. 

Based on integrating the Dirichlet process mixture model into Bayesian inverse reinforcement learning where a sample-efficient sampling method is also presented using a Metropolis-Hastings sampling algorithm. 

이것도 clustering을 하는 것 같다. 

Dimitrakakis, Christos and Rothkopf, Constantin A. Bayesian multitask inverse reinforcement learning. In EWRL, volume 7188 of Lecture Notes in Computer Science, pp. 273?284. Springer, 2011

Formalize the problem as statistical preference elicitation using structured priors that model the relatedness of different policies of multiple demonstrators.

Hierarchical Bayesian modeling of a reward and policy function. 

It can be interpreted as inferring the motivations of different experts solving the same task. 

- 이 점이 내가 제안하는 거랑 조금 다른거 같다. 같은 task를 풀려고 하는 것은 아니다. 

"we have a first theoretically sound formalization of the problem of learning from multiple teachers who all try to solve the same problem, but which have different preferences for doing so."

Prior on the reward functions using a Dirichlet distribution. 

이건 애초에 같은 문제를 풀려고 하는데, 다른 'preference'가 있는 상황인 것 같다. 문제는 이 preference가 어떻게 정의되는가 인데... 

그리고 state가  5개인 엄청 간단한 문제에만 적용을 해봤다. 

However, the existing IRL methods that assume the demonstrations are collected from multiple experts with varying reward function. 게다가 정말 좋은거 하나를 '복원'할 방법이 없다. 우리 방법에서는 sparsity constraint가 그런 역할을 해준다. 

Matteo Pirotta and Marcello Restelli. Inverse reinforcement learning through policy gradient minimization. In AAAI, pages 1993-1999, 2016.

Direct RL problem을 풀지 않고, 새로운 IRL 방법론을 제안한다. 중요한 idea는 find the reward function that minimizes the gradient of a parametrized representation of the expert's policy where the reward function is a linear combination of predefined basis functions. 

Jonathan Ho, Jayesh K. Gupta, and Stefano Ermon. Model-free imitation learning with policy optimization. In ICML, volume 48 of JMLR Workshop and Conference Proceedings, pages 2760-2769., 2016.

TRPO에 사용되는 cost function을 조금 바꿨다. 

Proposed a model-free imitation learning algorithm by modifying the cost function of a policy gradient method. In particular, the cost function of an expert is assumed to be a linear combination of predefined feature functions. 

Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In NIPS, pages 4565-4573, 2016.

relationship between inverse reinforcement and generative adversarial networks 


'Enginius > Robotics' 카테고리의 다른 글

Planning and Decision-Making for Autonomous Vehicles  (0) 2018.03.05
RL for real robots  (0) 2018.02.12
Related work of LevOpt  (0) 2018.02.11
Robotics and AI infographics  (0) 2017.12.13
DMPL Reivews  (0) 2017.11.30
Policy Gradient Methods to Actor Critic Model  (0) 2017.04.04
Write your message and submit
« PREV : 1 : ··· : 9 : 10 : 11 : 12 : 13 : 14 : 15 : 16 : 17 : ··· : 64 : NEXT »