

Q Learning for idiots

I've been studying IRL and Bayesian IRL for two days, and honestly I have no idea how they are implemented. So to organize my thoughts, I implemented a basic Q-learning algorithm for a discrete MDP.

(http://mnemstudio.org/path-finding-q-learning-tutorial.htm)


For simplicity, we have only 6 states. 


The possible transitions between states are illustrated above. The transition model of an MDP has size '$|S| \times |A| \times |S|$'; however, in the reinforcement learning setting this is usually not given. Here, we assume that once we take an action, we reach the intended state (no uncertainty in the next state given the current state and action).


The most important part, the REWARD, is given as below.


We can see that rewards between movable states are 0, and -1 otherwise, except for transitions into state 6. This indicates that state 6 is the goal state that we want to reach (or where the TREASURE is!).
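Since the reward figure may not show up here, below is a sketch of the reward matrix R in MATLAB, assuming the room layout and the goal-entry reward of 100 from the linked tutorial (rows are current states, columns are actions, i.e., target states):

% Reward matrix R (6 states x 6 actions); connectivity assumed from the linked tutorial.
% R(s,a) = -1 if the move is impossible, 0 if possible, 100 if it enters the goal state 6.
R = [ -1  -1  -1  -1   0  -1;
      -1  -1  -1   0  -1 100;
      -1  -1  -1   0  -1  -1;
      -1   0   0  -1   0  -1;
       0  -1  -1   0  -1 100;
      -1   0  -1  -1   0 100];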


In Q-learning, we aim to find the Q function, a matrix of size '$|S| \times |A|$'. It basically indicates the value we expect to get at a certain state by taking a certain action.


To update Q, we repeat the following procedure:

1. Select init state

2. Generate trajectory

3. Update Q


And in the update-Q step we do:

 $$Q(s, a) = R(s, a) + \gamma \cdot \max_{a' \in A}[Q(s_{next}, a')].$$


In plain English, Q is updated by adding the reward R to the best discounted Q value we can find at the next state '$s_{next}$'. Simple, right?
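As a minimal sketch, a single such update for a transition from state s via action a to state s_next could look like this in MATLAB (gamma is the discount factor):

% One Q-learning update (learning rate of 1, matching the formula above).
Q(s, a) = R(s, a) + gamma * max(Q(s_next, :));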


Anyway, after several updates, we get the following Q matrix.


In MATLAB, we can implement this in just a few lines.


Codes
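The original code listing does not appear here, so below is a minimal self-contained MATLAB sketch of the whole procedure. It uses the reward matrix R sketched above; the discount factor gamma = 0.8 and the goal reward of 100 are the values used in the linked tutorial, while the episode loop and variable names are my own assumptions.

% Q-learning for the 6-state room example (a sketch; R is the reward matrix above).
nStates  = 6;          % number of states (= number of actions here)
gamma    = 0.8;        % discount factor (value used in the linked tutorial)
goal     = 6;          % goal state
nEpisode = 1000;       % number of training episodes

Q = zeros(nStates, nStates);        % Q(s, a): value of taking action a in state s

for epi = 1:nEpisode
    s = randi(nStates);                         % 1. select a random initial state
    while s ~= goal                              % 2. generate a trajectory until the goal
        actions = find(R(s, :) >= 0);            % actions that are possible in state s
        a = actions(randi(numel(actions)));      % pick one of them at random
        sNext = a;                               % deterministic transition: action = next state
        % 3. update Q with the rule above
        Q(s, a) = R(s, a) + gamma * max(Q(sNext, :));
        s = sNext;
    end
end

disp(Q / max(Q(:)) * 100)           % show the Q matrix, normalized as in the tutorial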

