# NIPS review

Posted 2018.07.31 04:53

**Review 1**

This paper proposes a generative mixture model that seeks to robustly capture the true data-generating distribution from noisy data. The crux of the approach is to enforce a correlation among the weights of different density networks by exploiting the idea of the Cholesky transformation. On both synthetic and real datasets, the paper shows that the proposed approach can robustly learn the target distribution even when the ratio of noisy labels becomes very high.
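For context on the Cholesky idea the review refers to (this is my own illustrative sketch, not code from the paper): given a desired correlation matrix Σ, independent Gaussian draws can be turned into correlated ones by multiplying with the Cholesky factor L, where Σ = L Lᵀ. The component count `K` and the correlation `rho` below are arbitrary placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3      # number of mixture components (illustrative)
rho = 0.9  # desired pairwise correlation (illustrative)

# Correlation matrix with unit diagonal and off-diagonal rho
corr = np.full((K, K), rho)
np.fill_diagonal(corr, 1.0)

L = np.linalg.cholesky(corr)          # corr = L @ L.T
z = rng.standard_normal((K, 10000))   # K independent Gaussian rows
x = L @ z                             # rows of x are now correlated

emp = np.corrcoef(x)                  # empirical correlation, close to corr
```

The same linear map can be applied to the outputs of parallel density networks, which is one way to read the paper's mechanism for correlating component weights.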

**Strengths**

The paper is mostly clear.

The overall idea is interesting and well motivated.

The discussions with related works are thorough. The experiments in noisy data look promising.

**Weaknesses**

Section 3.1 looks a little verbose and its practical motivation is weak. It would be clearer if it first briefly introduced the background of the Cholesky transformation. After all, the paper mainly exploits this theory, and a neural network is only one way to parameterize the density component.

While many related works were surveyed, the most recent works that explicitly seek to tackle the noisy-label issue should be compared. For example, references 31 and 35 also maintain strong performance when labels are very noisy. Without such a direct comparison, it is not clear to me how the proposed approach stands out.

Furthermore, more ablation analyses are needed to investigate the effect of the hyperparameters K and \tau.

Typo on page 2: “each demonstrations” → “each demonstration”.

**Review 2**

This paper introduces a new model architecture dubbed ChoiceNet, which aims to robustly learn a target distribution given noisy training data. On top of a base network, the architecture utilizes a so-called mixture of correlated density network block, which predicts correlated mean vectors. The architecture is evaluated on several tasks with varying types of data corruption and it outperforms relevant baselines.

Real-world data is always noisy, but we generally do not have a full understanding of the noise sources. As a result, generic methods that can assess data quality are important for building robust models, and for this reason I think this paper is salient and a valuable contribution to the NIPS community.

The main technical innovation relative to prior work seems to be the MCDN block, which generates correlated mean vectors. While the underlying mechanism is simple, the result seems powerful, given the empirical results from Section 4. The details of the derivation seem correct, but I did not check them carefully. As a minor comment regarding presentation, I would recommend presenting a single "Theorem", i.e. Theorem 3, and renaming Theorems 1 and 2 to Lemmas or incorporating them into Theorem 3 itself. Theorems 1 and 2 are pretty trivial and may not even need to be stated formally.

Some additional commentary in Section 3.3 about the loss functions for regression and classification could be useful. For example, why is it that only \mu_1 appears in eqn. (4)?

The experiments seem fairly comprehensive with regard to the performance of ChoiceNet relative to other baselines. I would have additionally appreciated some experiments or analysis into how the model is performing well, i.e. delving into the internal mechanics that yield its good performance. Is it working like it was designed, and how could this be tested?

Overall, I think this is a nice paper and I recommend acceptance.

**Review 3**

The paper addresses the problem of training robust supervised models in the presence of target output noise. Both regression and classification are considered. The proposed model estimates the target distribution using a mixture model designed specifically for capturing correlations between the components. Experiments are performed on regression and classification problems.

The objective is clearly described but the method is not. No intuition is provided, and the technical description (Sections 3.2 and 3.3) is not understandable. The experiments show good results compared to several baselines, so there may be some interesting features in the proposed method, but they are not decipherable from this paper. Besides, I think not all of the comparisons are meaningful.
