Existing papers on interpretation

Blog: http://blog.qure.ai/notes/deep-learning-visualization-gradient-based-methods

1. Deep inside convolutional networks: Visualising image classification models and saliency maps, ArXiv, 2013

2. Learning Important Features Through Propagating Activation Differences, ArXiv, 2017

3. CAM: Learning deep features for discriminative localization, CVPR 2016

4. GradCAM: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, ICCV 2017

5. Interpretable Explanations of Black Boxes by Meaningful Perturbation, ArXiv, 2017

The Interpretable ML workshop held at NIPS 2017

http://www.interpretable-ml.org/nips2017workshop/

When am I ever going to get through all of these..

__1. Exploring the definition of art through deep net visualization__

http://www.interpretable-ml.org/nips2017workshop/papers/01.pdf

We propose that a broad subset of visual art can be defined as **patterns that are exciting to a visual brain**. Resting on the finding that artificial neural networks trained on visual tasks can provide predictive models of processing in the visual cortex, our definition is operationalized by using a trained deep net as a surrogate “**visual brain**”, where “**exciting**” is defined as the activation energy of particular layers of this net.

__2. A unified view of gradient-based attribution methods for Deep Neural Networks__

http://www.interpretable-ml.org/nips2017workshop/papers/02.pdf

Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, only a few attempts to analyze them from a theoretical perspective have been made in the past. In this work, we analyze **various state-of-the-art attribution methods** and prove **unexplored connections between them**. We also show how some methods can be reformulated and more conveniently implemented. Finally, we perform an empirical evaluation with six attribution methods on a variety of tasks and architectures and discuss their strengths and limitations.
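
To make "gradient-based attribution" concrete for myself, here is a minimal sketch and not the paper's implementation. It compares two common attributions, plain saliency (absolute gradient) and Gradient * Input, on a tiny bias-free ReLU network. The weights, the input, and all function names are made up for illustration. Because a bias-free ReLU network is piecewise linear and homogeneous, the Gradient * Input scores in this toy sum to the network output.

```python
# Tiny bias-free ReLU network: f(x) = w2 . relu(W1 x).
# All weights below are invented for illustration only.
W1 = [[1.0, -2.0, 0.5],
      [0.5, 1.0, -1.0]]
w2 = [2.0, -1.0]


def pre_activations(x):
    """Hidden-layer inputs W1 x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]


def forward(x):
    """Network output f(x)."""
    hidden = [max(0.0, p) for p in pre_activations(x)]
    return sum(v * h for v, h in zip(w2, hidden))


def gradient(x):
    """Analytic gradient df/dx; a ReLU passes gradient only where its input is positive."""
    gates = [1.0 if p > 0 else 0.0 for p in pre_activations(x)]
    return [
        sum(w2[j] * gates[j] * W1[j][i] for j in range(len(W1)))
        for i in range(len(x))
    ]


def saliency(x):
    """Saliency map: absolute value of the gradient."""
    return [abs(g) for g in gradient(x)]


def gradient_times_input(x):
    """Gradient * Input attribution."""
    return [g * xi for g, xi in zip(gradient(x), x)]


x = [1.0, 0.2, 0.5]
print("output              :", forward(x))
print("saliency            :", saliency(x))
print("gradient * input    :", gradient_times_input(x))
print("sum of grad * input :", sum(gradient_times_input(x)))
```

The two maps rank the features differently: saliency only says how sensitive the output is to each input, while Gradient * Input also weights that sensitivity by the input value.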

__3. An Introduction to Deep Visual Explanation__

http://www.interpretable-ml.org/nips2017workshop/papers/03.pdf

While we recognize that explanations will have many different representations (e.g., image components, language segments, speech segments, etc.), our demonstration here is intended to be simple and preliminary, to illustrate the idea. Our immediate goal is to create an explanation about the outcome of a DCNN, i.e., **to identify which discriminative pixels in the image influence the final prediction** (see Figure 1).

__4. Learning Explainable Embeddings for Deep Networks__

http://www.interpretable-ml.org/nips2017workshop/papers/04.pdf

We propose **a novel explanation module** to explain the predictions made by deep learning. The **explanation module** works by __embedding a high-dimensional deep network layer nonlinearly into a low-dimensional explanation space__ while retaining faithfulness, so that the original deep learning predictions can be constructed from the few concepts extracted by the explanation module. We then **visualize such concepts** for humans to learn about the high-level concepts that deep learning is using to make decisions. We propose a **Sparse Reconstruction Autoencoder (SRAE)** for learning the embedding to the explanation space. SRAE aims to reconstruct part of the original feature space while retaining faithfulness. The proposed method is applied to explain CNN models in image classification tasks, and several novel metrics are introduced to evaluate the performance of explanations quantitatively without human involvement. Experiments show that the proposed approach could generate better explanations of the mechanisms CNNs use for making predictions in the task.

__5. Separable explanations of neural network decisions__

http://www.interpretable-ml.org/nips2017workshop/papers/05.pdf

**Deep Taylor Decomposition** is a method used to explain neural network decisions. When applying this method to non-dominant classifications, the resulting explanation does not reflect important features for the chosen classification. We propose that this is caused by the dense layers and propose a method to alleviate the effect by applying regularization. We assess the result by measuring the quality of the resulting explanations objectively and subjectively.

__6. Visualizing and Understanding Atari Agents__

http://www.interpretable-ml.org/nips2017workshop/papers/06.pdf

In this paper, we addressed the growing need for **human-interpretable explanations of deep RL agents** by introducing a new saliency method and using it to visualize and understand Atari agents. We found that **our saliency method** can yield effective visualizations for a variety of Atari agents. We also found that these visualizations can help non-experts understand what deep RL agents are doing. Finally, we obtained preliminary results for the role of memory in these policies. Understanding deep RL agents is difficult because they are black boxes that can learn nuanced and unexpected strategies. To produce explanations that satisfy human users, researchers will need to use not one, but many techniques for extracting the "how" and "why" from these agents. This work complements previous efforts, taking the field a step closer to producing truly satisfying explanations.

__7. Grammatically-Interpretable Learned Representations in Deep NLP Models__

http://www.interpretable-ml.org/nips2017workshop/papers/07.pdf

We introduced two architectures inspired by Tensor Product Representations (TPRs), TPRN and TPGN, and evaluated these models on the important NLP tasks of machine reading comprehension and image-to-language generation. Our results show that, compared to the widely adopted LSTM-based architecture, the proposed models demonstrate significant grammatical interpretability, with on-par or better performance on these challenging tasks.

__8. Neural Interaction Detection__

http://www.interpretable-ml.org/nips2017workshop/papers/09.pdf

We develop a method of detecting statistical interactions in data by interpreting the trained weights of a feedforward multilayer neural network. With sparsity regularization applied to the weights, our method can achieve high interaction detection performance without searching an exponential solution space of possible interactions. We obtain our computational savings by first observing that interactions between input features are created by the non-additive effect of nonlinear activation functions, and that interacting paths are encoded in weight matrices. We use these observations to develop a way of identifying both pairwise and higher-order interactions with a simple traversal over the input weight matrix. In experiments on simulated and real-world data, we demonstrate the performance of our method and the importance of discovered interactions.

__9. The (Un)reliability of saliency methods__

http://www.interpretable-ml.org/nips2017workshop/papers/10.pdf

Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step —adding a constant shift to the input data— to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
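
The constant-shift argument is easy to reproduce on a toy linear model. This is a minimal sketch under my own assumptions (made-up weights, a hypothetical `make_linear` helper, and Gradient * Input as the example attribution), not the paper's experiment. Two models see inputs that differ by a constant shift, and the second model's bias is adjusted so that both produce exactly the same output. The gradient is identical for both, but Gradient * Input changes with the shift.

```python
def make_linear(w, b):
    """Return f(x) = w . x + b as a plain Python function."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


# Invented weights, bias, and input for illustration.
w = [2.0, -1.0, 0.5]
b = 0.1
x = [1.0, 2.0, -1.0]

# Pre-processing variant: add a constant to every input dimension.
shift = 3.0
x_shifted = [xi + shift for xi in x]

# Absorb the shift into the bias so the second model computes the same value.
b_shifted = b - shift * sum(w)

f = make_linear(w, b)
f_shifted = make_linear(w, b_shifted)

# For a linear model the input gradient is just w, for both models.
grad = list(w)
grad_shifted = list(w)

# Gradient * Input depends on the raw input values, so it changes with the shift.
gxi = [g * xi for g, xi in zip(grad, x)]
gxi_shifted = [g * xi for g, xi in zip(grad_shifted, x_shifted)]

print("outputs          :", f(x), f_shifted(x_shifted))
print("gradients        :", grad, grad_shifted)
print("grad * input     :", gxi)
print("grad * input (s) :", gxi_shifted)
```

In this toy, the plain gradient satisfies input invariance while Gradient * Input does not, even though the two models make identical predictions.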

__11. Relating Input Concepts to Convolutional Neural Network Decisions__

http://www.interpretable-ml.org/nips2017workshop/papers/11.pdf

Many current methods to interpret convolutional neural networks (CNNs) use visualization techniques and words to highlight concepts of the input seemingly relevant to a CNN’s decision. The methods hypothesize that the recognition of these concepts is instrumental in the decision a CNN reaches, but the nature of this relationship has not been well explored. To address this gap, this paper examines the **quality of a concept’s recognition by a CNN and the degree to which the recognitions are associated with CNN decisions**. The study considers a CNN trained for scene recognition over the ADE20k dataset. It uses a novel approach to find and score the strength of minimally distributed representations of input concepts (defined by objects in scene images) across late stage feature maps. Subsequent analysis finds evidence that concept recognition impacts decision making. Strong recognition of concepts that occur frequently in few scenes is indicative of correct decisions, but recognizing concepts common to many scenes may mislead the network.

__13. Interpreting Neural Network Classifications with Variational Dropout Saliency Maps__

http://www.interpretable-ml.org/nips2017workshop/papers/13.pdf

Deep neural networks are effective at classification across many domains, but they are also **opaque in the sense of being seen as “black boxes”** even when the training data and model architecture are available. Saliency maps are a tool for interpreting neural network classification that, given a particular input example and output class, score the relevance of each input dimension to the resulting classification. Recent work defined salient inputs as those yielding the greatest change in the classification output when replaced with some reference value; this value is chosen heuristically (e.g., background color, Gaussian noise, or blurred version of the input) and thus biases the saliency computation. We generalize this approach by extending the notions of ‘replacement’ and ‘reference’: first we cast the input replacement in a dropout framework and use variational inference to learn a distribution over dropped-out (replaced) inputs from the data; then we express the reference value as the output of a generative model, which can be learned from data to mitigate the effect of the reference value biasing the saliency map. We then propose a new model-agnostic saliency map that uses both extensions in tandem. We show the resulting saliency maps for a digit classification network pre-trained on MNIST and compare our results against other methods both qualitatively and quantitatively.
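
The "replace with a reference value" baseline that this abstract starts from can be sketched in a few lines. This is only the heuristic baseline, not the variational dropout method the paper proposes. The toy `model`, the input, and both reference vectors are assumptions chosen for illustration. Each input dimension is replaced by its reference value in turn, and the drop in the model output is used as that dimension's saliency.

```python
def replacement_saliency(model, x, reference):
    """Score each input dimension by how much the output drops
    when that dimension alone is replaced by its reference value."""
    base = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = reference[i]
        scores.append(base - model(perturbed))
    return scores


def model(x):
    """Toy scoring function standing in for a class score (invented)."""
    return 3.0 * x[0] + x[1] * x[2]


x = [1.0, 2.0, 0.5]

# Two heuristic reference choices: an all-zero input and an all-ones input.
zeros_reference = [0.0, 0.0, 0.0]
ones_reference = [1.0, 1.0, 1.0]

scores_zeros = replacement_saliency(model, x, zeros_reference)
scores_ones = replacement_saliency(model, x, ones_reference)

print("saliency with zero reference :", scores_zeros)
print("saliency with ones reference :", scores_ones)
```

With the zero reference the first feature looks most important, while with the all-ones reference it gets a score of zero, simply because its value already equals the reference. That is the reference-value bias the abstract is talking about.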
