

Q Learning for idiots

I've been studying IRL and Bayesian IRL for two days, and honestly I have no idea how they are implemented. So to organize my thoughts, I implemented a basic Q-learning algorithm for a discrete MDP.

(http://mnemstudio.org/path-finding-q-learning-tutorial.htm)


For simplicity, we have only 6 states. 


The possible transitions between states are illustrated above. The transition model of an MDP has size '$|S| \times |A| \times |S|$'; however, in the reinforcement learning setting this is usually not given. Here, we assume that once we take an action, we reach the intended state (no uncertainty in the next state given the current state and action).


The most important part, the REWARD, is given as below.


We can see that rewards between movable states are 0, and -1 otherwise, except for transitions into state 6. This indicates that state 6 is the goal state that we want to reach (or where the TREASURE is!).
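Since the reward figure may not show up here, below is a sketch of the reward matrix R in MATLAB, assuming the room layout and the goal-entry reward of 100 from the linked tutorial (rows are current states, columns are actions, i.e., target states):

% Reward matrix R (6 states x 6 actions); connectivity assumed from the linked tutorial.
% R(s,a) = -1 if the move is impossible, 0 if possible, 100 if it enters the goal state 6.
R = [ -1  -1  -1  -1   0  -1;
      -1  -1  -1   0  -1 100;
      -1  -1  -1   0  -1  -1;
      -1   0   0  -1   0  -1;
       0  -1  -1   0  -1 100;
      -1   0  -1  -1   0 100];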


In Q-learning, we aim to find the Q function, a matrix of size '$|S| \times |A|$'. It basically indicates the value we expect to get at a certain state by taking a certain action.


To update Q, we repeat the following procedure:

1. Select init state

2. Generate trajectory

3. Update Q


And in the update-Q step we do:

 $$Q(s, a) = R(s, a) + \gamma \cdot \max_{a' \in A}[Q(s_{next}, a')].$$


In plain English, Q is updated by adding the reward R to the best discounted Q value we can find at the next state '$s_{next}$'. Simple, right?
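As a minimal sketch, a single such update for a transition from state s via action a to state s_next could look like this in MATLAB (gamma is the discount factor):

% One Q-learning update (learning rate of 1, matching the formula above).
Q(s, a) = R(s, a) + gamma * max(Q(s_next, :));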


Anyway, after several updates, we get the following Q matrix.


In MATLAB, we can implement this in just a few lines.


Codes
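The original code listing does not appear here, so below is a minimal self-contained MATLAB sketch of the whole procedure. It uses the reward matrix R sketched above; the discount factor gamma = 0.8 and the goal reward of 100 are the values used in the linked tutorial, while the episode loop and variable names are my own assumptions.

% Q-learning for the 6-state room example (a sketch; R is the reward matrix above).
nStates  = 6;          % number of states (= number of actions here)
gamma    = 0.8;        % discount factor (value used in the linked tutorial)
goal     = 6;          % goal state
nEpisode = 1000;       % number of training episodes

Q = zeros(nStates, nStates);        % Q(s, a): value of taking action a in state s

for epi = 1:nEpisode
    s = randi(nStates);                         % 1. select a random initial state
    while s ~= goal                              % 2. generate a trajectory until the goal
        actions = find(R(s, :) >= 0);            % actions that are possible in state s
        a = actions(randi(numel(actions)));      % pick one of them at random
        sNext = a;                               % deterministic transition: action = next state
        % 3. update Q with the rule above
        Q(s, a) = R(s, a) + gamma * max(Q(sNext, :));
        s = sNext;
    end
end

disp(Q / max(Q(:)) * 100)           % show the Q matrix, normalized as in the tutorial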

