# MuJoCo Ant-V2 + PPO implementation

Posted 2018. 7. 3. 04:04

Some descriptions of Ant-V2 in MuJoCo

Code snippet for random movements of the Ant.

```python
import gym
import numpy as np

env = gym.make('Ant-v2')
obs = env.reset()
obs_dim = env.observation_space.shape[0]  # 111
act_dim = env.action_space.shape[0]       # 8

print(obs_dim, act_dim)

for i in range(1000):
    env.render()
    # Sample a random torque for each of the 8 joints.
    action = np.random.randn(act_dim).astype(np.float32)
    obs, reward, done, _ = env.step(action)
    if done:
        obs = env.reset()
```

You'll see something like this:

The four-legged (quadruped) thing is called an Ant. It has a 111-dim observation space and an 8-dim action space. Very interestingly, the physical meaning of these spaces is hidden inside the ant.xml file, which looks like:

This format should be familiar to roboticists who use ROS often, but that was never the case with me, so I googled again and came up with this nice explanation in https://github.com/openai/gym/issues/585.

To summarize:

111-dim observation space

- z (height) of the torso -> 1
- orientation (quaternion x, y, z, w) of the torso -> 4
- 8 joint angles -> 8
- 3-dim linear velocity and 3-dim angular velocity of the torso -> 3+3=6
- 8 joint velocities -> 8
- External forces (force x, y, z + torque x, y, z) applied to the CoM of each body (the Ant model has 14 bodies: ground + torso + 12 leg links, 3 per leg) -> (3+3)*14=84

=> 1+4+8+6+8+84 = 111
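The arithmetic above can be double-checked with a quick sketch (component names are my own labels; the sizes are copied from the breakdown, with the 14 bodies including the ground and torso):

```python
# Size of each observation component, as listed above.
components = {
    "torso_height_z": 1,
    "torso_orientation_quaternion": 4,
    "joint_angles": 8,
    "torso_linear_and_angular_velocity": 3 + 3,
    "joint_velocities": 8,
    # force (x, y, z) + torque (x, y, z) for each of the 14 bodies
    "external_contact_forces": (3 + 3) * 14,
}

obs_dim = sum(components.values())
print(obs_dim)  # 111
```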

8-dim action space

Torque (value from -1 to +1) applied at each joint
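Since each action dimension is a torque in [-1, +1], raw samples (e.g. from a Gaussian policy) are typically clipped before stepping the environment; a minimal NumPy sketch, assuming the ±1 bounds above:

```python
import numpy as np

act_dim = 8  # one torque command per joint
raw_action = 3.0 * np.random.randn(act_dim)  # may fall outside [-1, 1]
action = np.clip(raw_action, -1.0, 1.0)      # keep torques in range
assert action.min() >= -1.0 and action.max() <= 1.0
```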

Reward structure

Make a four-legged creature walk forward as fast as possible.

Ant-v1 defines "solving" as getting an average reward of 6000.0 over 100 consecutive trials.

```python
xposbefore = self.get_body_com("torso")[0]
self.do_simulation(a, self.frame_skip)
xposafter = self.get_body_com("torso")[0]
forward_reward = (xposafter - xposbefore) / self.dt
ctrl_cost = .5 * np.square(a).sum()
contact_cost = 0.5 * 1e-3 * np.sum(
    np.square(np.clip(self.sim.data.cfrc_ext, -1, 1)))
survive_reward = 1.0
reward = forward_reward - ctrl_cost - contact_cost + survive_reward
```

: x-directional displacement per unit time - sum of squared torques - contact penalty + a constant survival bonus
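As a standalone sketch, the same reward can be reproduced with plain NumPy (the example torso positions and contact forces below are made-up values, and `dt=0.05` assumes the default timestep 0.01 times frame_skip 5):

```python
import numpy as np

def ant_reward(xpos_before, xpos_after, action, cfrc_ext, dt=0.05):
    """Reproduce the Ant-v2 reward terms with plain NumPy.

    dt=0.05 assumes the default timestep (0.01) x frame_skip (5).
    """
    forward_reward = (xpos_after - xpos_before) / dt
    ctrl_cost = 0.5 * np.square(action).sum()
    contact_cost = 0.5 * 1e-3 * np.sum(np.square(np.clip(cfrc_ext, -1, 1)))
    survive_reward = 1.0
    return forward_reward - ctrl_cost - contact_cost + survive_reward

# Made-up example: the ant moved 0.05 m forward in one step,
# with zero torque and zero contact forces.
action = np.zeros(8)          # no torque -> no control cost
cfrc_ext = np.zeros((14, 6))  # no contact -> no contact penalty
r = ant_reward(0.0, 0.05, action, cfrc_ext)
print(r)  # 1.0 (forward) + 1.0 (survive) = 2.0
```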

Random thoughts

- Torque-controlled actuators in this small form factor? I wish I had one too.

- IMUs at every joint plus force sensors on every link are unlikely to be feasible on real hardware.

- The planar position (x, y) is NOT used. Interesting.

- Do we really need this huge state representation?

- The contact model can never be perfectly accurate.

- What happens if we modify the friction coefficients?
