Another Reject from ICRA 2017

Another counter-example collected.. 



Review 1 (BAD, BAD, BAD)

This paper proposes intention aware apprenticeship learning (IAAL) for planning actions in cooperative human-robot settings. This addresses the dual problem of understanding human intention, and then planning actions based on this. A Gaussian process model is used for intention inference, once the different demonstrations have been clustered, and then the robot reward function factors in this intention for optimisation. The method is demonstrated in simulation and on a real robot.


This paper is generally not well structured, and difficult to read. The main difficulty is that a number of different contributions are made, but there is no clear indication of the individual value of any of them. 

: Focus on fewer contributions !!

For example, I would like to have seen comparisons of the proposed similarity metric to others, to evaluate whether or not this is necessary. Similarly, how would the reward learning algorithm work when combined with a different intention inference step (e.g. just naively matching to the best imitation trajectory), and vice versa? Experiments would have been useful in this regard to better motivate the use of each component. The paper also suffers from awkward grammar throughout, and could do with some additional proofreading.


The definition of intention as a flow function/velocity seems very simplistic. Why would this be intention inference, rather than activity recognition? See, e.g., Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1725-1732).
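
To make the "intention as a flow function" reading concrete: the idea amounts to fitting a regressor from position to velocity over an observed trajectory, and treating that flow field as the intention. A minimal sketch of that idea, using scikit-learn's GP regressor and a toy 2-D trajectory (purely illustrative, not the paper's actual implementation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 2-D demonstration: positions sampled along a single reaching motion.
t = np.linspace(0.0, 1.0, 50)
positions = np.stack([t, np.sin(np.pi * t)], axis=1)   # shape (50, 2)
velocities = np.gradient(positions, t, axis=0)         # finite-difference velocities

# "Intention" as a flow field: a GP mapping position -> velocity.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-3)
gp_flow = GaussianProcessRegressor(kernel=kernel).fit(positions, velocities)

# Query the flow field at a new position to predict where the hand is heading.
predicted_velocity = gp_flow.predict(np.array([[0.5, 1.0]]))
```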


A major argument for the proposed method is that it can add constraints such as path smoothing and obstacle avoidance to the demonstrated trajectories. The provided example of this involves specifying an ad hoc obstacle avoidance function, which seems rather unsatisfying. It isn't clear that other methods can't do this either. For example, traditional apprenticeship learning would learn a policy that automatically avoids obstacles, using inverse  reinforcement learning, e.g., Abbeel, P., & Ng, A. Y. (2004, July). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning (p. 1). ACM. Ranchod, P., Rosman, B., & Konidaris, G. (2015, September). Nonparametric Bayesian reward segmentation for skill discovery using inverse reinforcement learning. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on (pp. 471-477). IEEE.
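
For readers unfamiliar with the point being made: the "ad hoc" criticism is about bolting a hand-specified obstacle-penalty term onto the learned objective, rather than letting inverse reinforcement learning recover avoidance behaviour from the demonstrations themselves. A hedged sketch of what such a hand-specified augmentation typically looks like (the function names, weights, and geometry are hypothetical, not taken from the paper):

```python
import numpy as np

def learned_reward(state):
    """Stand-in for the reward recovered by apprenticeship/IRL (hypothetical)."""
    goal = np.array([0.6, 0.2, 0.4])
    return -np.linalg.norm(state - goal)

def obstacle_penalty(state, obstacle_center, radius, weight=10.0):
    """Ad hoc avoidance term: penalize states inside an inflated sphere."""
    dist = np.linalg.norm(state - obstacle_center)
    return -weight * max(0.0, radius - dist)

def total_reward(state, obstacle_center=np.array([0.4, 0.0, 0.4]), radius=0.15):
    # Hand-tuned combination -- exactly the kind of ad hoc term the reviewer flags,
    # as opposed to avoidance that is learned directly from demonstrations.
    return learned_reward(state) + obstacle_penalty(state, obstacle_center, radius)
```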


The GPIM is only trained on the human motions, but these may be contingent on the robot motions (This part definitely needs more explanation!), in that they may adapt based on what the robot is doing. How would you take this into account? See, e.g., Dragan, A. D., Bauman, S., Forlizzi, J., & Srinivasa, S. S. (2015, March). Effects of robot motion on human-robot collaboration. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (pp. 51-58). ACM.
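
One way to read the reviewer's question is: why not condition the human-motion model on the robot's state as well, so the learned flow can reflect adaptation? A sketch of that idea (entirely hypothetical, with random placeholder data standing in for recorded interactions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Placeholder data standing in for recorded interaction snippets.
human_pos = rng.random((100, 3))   # human hand positions
robot_pos = rng.random((100, 3))   # robot end-effector positions at the same instants
human_vel = rng.random((100, 3))   # human hand velocities (the regression target)

# Condition the human-motion model on the robot's state as well,
# so the learned flow can reflect adaptation to what the robot is doing.
joint_input = np.hstack([human_pos, robot_pos])   # 6-D input instead of 3-D
gp_joint = GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(joint_input, human_vel)
```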


The training in IV-B (and indeed the description of the method) refers specifically to the example interaction. Please emphasise that this is an example, and rather focus on a generalised presentation of the main ideas (e.g. not referring to a specific number of clusters).


Downscaling demonstrations, manifold mapping: Ndivhuwo


In the experiments, the manipulator is described as replanning every two seconds. As a comparison, how would the manipulator do at the task if it merely replanned every two seconds without taking the human intention into account? This could provide a convincing motivation for the proposed method.


Minor points:

In Fig 2b, the dark diagonal seems odd. Shouldn't these be the best matches, with the "blocks" around them being a similar colour? Also, a colour bar should be provided to indicate values.

Are the plots in Fig 3b just a slice of the full (x,y,z) volume for one particular value of z? Say so, and give that value.

In Fig 4, one cannot make out if these trajectories are good or bad. This only seems to indicate that the robot can reach the desired end point.

Fig 9 is unclear, as the subplots are too small to make out the trajectories.

The paper mentions that dealing with whole-body motion would be infeasible. How might you extend to these cases?



Review 2 (BAD, BAD, BAD)

The paper presents an apprenticeship learning approach which combines dynamics-matching reward learning, a Gaussian random paths representation, a state-dependent similarity metric, a Gaussian process learning method, and Gaussian random path based optimisation. The above methods are presented clearly and the overall quality of the writing is very high.


The motivation and applicability of this work are not clear. All experiments and discussions consider only end-effector trajectories (position only). This is a well-studied problem and there exist a large number of techniques for learning and clustering trajectories in 3D. It is not clear how IAAL (or MIP) performs with respect to other 3D trajectory optimisation techniques. But more importantly, this problem is not very relevant for robotics applications with 6+ DoF. How are redundancies being modelled and exploited? How would this technique scale to higher-DoF systems? Where are the bottlenecks and limitations? The way IAAL is presented here only shows that it can be used for learning reaching trajectories. Is this really its only application area?


IAAL does not require time alignment of trajectories because a state-dependent similarity metric is used (as opposed to a time-dependent one). This may be viewed as an advantage, especially when dealing with the 3D end-effector reaching trajectories shown in the experimental section. This is a very limited set of representable trajectories, though. This similarity function cannot be used for learning trajectories that are inherently time dependent (e.g. when different velocities are required for the same configuration at different points in time, such as the crossing point of a figure-eight motion). What is the limit on complexity when learning generic tasks? What is the scope of tasks you can learn? How will this affect tasks in higher-dimensional state spaces?
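
The figure-eight objection can be made concrete in a few lines: at the crossing point, a state-dependent model sees (nearly) the same input with two incompatible target velocities, so any regression from position to velocity has to average them away. A tiny illustration of that conflict (illustrative only, not the paper's code):

```python
import numpy as np

# A figure-eight path passes through the origin twice with different headings.
t = np.linspace(0.0, 2.0 * np.pi, 400)
positions = np.stack([np.sin(t), np.sin(t) * np.cos(t)], axis=1)
velocities = np.gradient(positions, t, axis=0)

# Find the passes through the crossing point (0, 0) and compare their velocities.
at_origin = np.where(np.linalg.norm(positions, axis=1) < 1e-2)[0]
print(velocities[at_origin])
# The visits demand different velocity directions at (almost) the same position,
# so a state-dependent map position -> velocity cannot represent both.
```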


What is the goodness criterion for the clustering technique described in III.B? Figure 5.a suggests that parameter tuning is required. How do you tune the parameters? Is expert knowledge required, or can this be automated?


In Figure 5.a, the accuracy of the proposed method decreases with the number of clusters but the accuracy of MIP increases. Why? Shouldn't MIP be affected by the same problem of sparsity of data per cluster/parameter?


IAAL performs better with a smaller amount of observed human trajectory (Figure 5.b). When does this break down? How much ambiguity is there in the first 20% of the trajectory (w.r.t. the similarity metric used)? Will this trend persist when the number of clusters increases (assuming each cluster will be trained on enough data)? In other words, when do the trajectories start looking so similar that they can't be reliably classified?


In the 1st paragraph of section V.B, the last sentence in the left text column does not make sense grammatically.



Review 3 (BAD, BAD, BAD)

Summary

----------------------------------------------------------------------

This paper describes an apprenticeship learning method that can take advantage of hierarchical structure in the demonstrator's actions (e.g., learning to recognize the difference between a place action and a pick action). They define intention as the policy that maps position to velocity. Given an observed trajectory, they fit a GP to the position--velocity pairs, and use likelihood under this GP to define a similarity metric. The authors cluster trajectories with this metric, then fit a GP to each cluster and use that to do classification online. They use IRL to infer a reward function per cluster and use a sample-based optimization to optimize robot trajectories online. They evaluate this to show that their approach outperforms mixtures of interacting primitives w.r.t. RMS error. They show that their approach is able to respond to a changing intention in a robot handover experiment.
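
As a rough reading of the pipeline this reviewer summarizes (fit a flow GP per trajectory or cluster, score new trajectories by how well the GP explains them, and classify online against the cluster models), here is a hedged sketch. The helper names are invented for illustration, scikit-learn stands in for whatever GP machinery the paper uses, and a squared prediction error stands in for the GP likelihood mentioned in the summary:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_flow_gp(trajectory):
    """Fit a GP mapping position -> velocity for one demonstrated trajectory."""
    pos = trajectory[:-1]
    vel = np.diff(trajectory, axis=0)
    kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-3)
    return GaussianProcessRegressor(kernel=kernel).fit(pos, vel)

def similarity(gp, trajectory):
    """Score a trajectory by how well the GP flow predicts its velocities (higher = more similar)."""
    pos = trajectory[:-1]
    vel = np.diff(trajectory, axis=0)
    pred = gp.predict(pos)
    return -np.mean(np.sum((vel - pred) ** 2, axis=1))

def classify(partial_trajectory, cluster_gps):
    """Pick the intention cluster whose flow model best explains the observed motion."""
    scores = [similarity(gp, partial_trajectory) for gp in cluster_gps]
    return int(np.argmax(scores))
```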


Overall Comments

----------------------------------------------------------------------

I liked the modelling approach taken in this paper. The key step beyond prior work in my view is that the similarity metric accounts for a policy on the human's part, and so trajectory similarity is defined as policy similarity. I also like that the authors explicitly extract a reward function for each action. However, there are some issues that the authors should address before this work is ready for publication.


Clarify the contributions being claimed.


In particular, the authors need to make it clear that VB is not a contribution of this work --- or the authors need to compare their algorithm with alternative trajectory optimization approaches. Either way, there are several optimization approaches that are very similar to this (for example, how does this compare with cross-entropy maximization [1]?).
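
For context on the baseline the reviewer names: cross-entropy maximization [1] optimizes a trajectory by repeatedly sampling candidates from a Gaussian, scoring them, and refitting the Gaussian to the elite samples. A minimal, generic sketch under that reading (not the paper's Gaussian-random-path optimizer; the cost function is a made-up placeholder):

```python
import numpy as np

def cross_entropy_optimize(cost, dim, iters=50, samples=200, elite_frac=0.1, seed=0):
    """Generic cross-entropy method over a flattened trajectory of length `dim`."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * samples))
    for _ in range(iters):
        candidates = rng.normal(mean, std, size=(samples, dim))
        costs = np.array([cost(c) for c in candidates])
        elite = candidates[np.argsort(costs)[:n_elite]]   # keep the lowest-cost samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

def toy_cost(traj, goal=1.0, obstacle=0.4):
    # Placeholder: reach a 1-D goal while staying away from one obstacle value.
    return abs(traj[-1] - goal) + np.sum(np.maximum(0.0, 0.1 - np.abs(traj - obstacle)))

best = cross_entropy_optimize(toy_cost, dim=10)
```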


Include better coverage of related works.


The paper does not include a related works section. This is important for readers to be able to place the contributions claimed and the approaches used in context. For example, this approach is very similar to other work in IRL and behavioral cloning (e.g., [2, 3, 4]), but that work is uncited.


More detailed and rigorous experimental evaluation.


The experimental validation here can be strengthened significantly. First, I think there could be a better description of the procedure used to collect trajectories; I don't think there is enough here for someone to replicate the work. For example, what is the environment used? All I found was that "a human and a robot are collaboratively arranging the space in a close proximity." What objects were used? What are the types of obstructions? What tools were used to collect trajectories?


The results presented do not support the conclusions reached. For example, the authors claim "IAAL shows superior performance when the number of clusters is less than 4", but inspection of 5a shows that the error bars for RMS performance overlap in each condition.


The comparison to MIP on obstacle avoidance is not very strong, as it compares against an algorithm that is oblivious to obstacles. If the authors want to claim that their approach improves on the state of the art, they need to compare to a reasonable baseline; no one would try to use MIP to do obstacle avoidance, so the evidence here only supports the uncontroversial claim that "it is a good idea to account for obstacle avoidance."


[1] Rubinstein, R.Y., Kroese, D.P. (2004). The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, Springer-Verlag, New York.


[2] Levine, Sergey, Zoran Popovic, and Vladlen Koltun. "Nonlinear inverse reinforcement learning with Gaussian processes." Advances in Neural Information Processing Systems. 2011.


[3] Englert, Peter, et al. "Model-based imitation learning by probabilistic trajectory matching." Robotics and Automation (ICRA), 2013 IEEE International Conference on. IEEE, 2013.


[4] Cederborg, Thomas, et al. "Incremental local online Gaussian mixture regression for imitation learning of multiple tasks." Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on. IEEE, 2010.

 




