
Total variational distance between two empirical distributions (KDE)

1. Computing empirical distribution 


For simplicity, we will assume that the data lies in a two-dimensional space with axes x1 and x2.

% Generate two synthetic datasets: ndata1 points around [1 2] and
% ndata2 points around [2 1], each with unit-variance Gaussian noise.
ndata1 = 100;
ndata2 = 200;
data1 = repmat([1 2], ndata1, 1) + randn(ndata1, 2);
data2 = repmat([2 1], ndata2, 1) + randn(ndata2, 2);

% Scatter plot of the two datasets
figure(1); hold on;
plot(data1(:, 1), data1(:, 2), 'ro');
plot(data2(:, 1), data2(:, 2), 'bx');
axis equal;



2. Kernel Density Estimation


To run KDE, you need a kernel function. 

function [K, dK] = kernel_se(X1, X2, L1, L2, hyp)

% Normalize X (this is optional)
% X1 = normr(X1);
% X2 = normr(X2);

n1 = size(X1, 1);
n2 = size(X2, 1);
d1 = size(X1, 2);
d2 = size(X2, 2);
if d1 ~= d2, fprintf(2, 'Data dimension mismatch!\n'); end
d  = d1;

beta  = hyp(end); % Gain
gamma = hyp(1:d); % Inverse length parameters

% Make sure 'gamma' stays positive
gamma = max(gamma, 1E-12*ones(size(gamma)));

% Compute kernel matrix
Gamma = diag(gamma);
temp1 = pdist2(X1, X2, 'mahalanobis', inv(Gamma)).^2;

K = beta*exp(-1/2*temp1) ...
    .*cos(pi/2*pdist2(L1, L2, 'cityblock'));

% Limit condition number
if n1 == n2 && n1 > 1
    sig = 1E-8*beta;
    K = K + sig*eye(size(K));
end

% Compute kernel derivatives (only when the caller requests dK)
if nargout > 1
    dK = zeros(n1, n2, d+1);
    % w.r.t. gamma
    for i = 1:d
        A = -0.5*pdist2(X1(:, i), X2(:, i), 'euclidean').^2;
        dK(:, :, i) = A.*K;
    end
    % w.r.t. gain (beta)
    dK(:, :, d+1) = K/beta;
end



The code above is the leveraged kernel function I will use; for the purposes of this post you can ignore the leverage part (the cosine term over L1 and L2) and the derivative part.
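For reference, here is a minimal sketch of what remains once the leverage term and the derivatives are dropped. The name kernel_se_plain is hypothetical (not from the original post); the hyp layout ([inverse length parameters, gain]) matches the function above.

function K = kernel_se_plain(X1, X2, hyp)
% Plain squared-exponential kernel: no leverage term, no derivatives.
d     = size(X1, 2);
beta  = hyp(end);                % gain
gamma = max(hyp(1:d), 1E-12);    % inverse length parameters, kept positive
Gamma = diag(gamma);
temp  = pdist2(X1, X2, 'mahalanobis', inv(Gamma)).^2;
K     = beta*exp(-1/2*temp);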


We will further assume that the space is discretized, mainly for computing the total variational distance: the integral in its definition can hardly be computed analytically, so we instead discretize the space and compute the sum of absolute differences, which is exactly the total variational distance between the discretized distributions.
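For reference, the total variational distance between two densities p and q, and its discretized counterpart over grid points indexed by i, is

TV(p, q) = \frac{1}{2} \int |p(x) - q(x)| \, dx \approx \frac{1}{2} \sum_i |p_i - q_i|,

where p_i and q_i are the normalized density values at the grid points (so that \sum_i p_i = \sum_i q_i = 1). This is what the code below computes.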


To discretize the space, we use the meshgrid function to generate evenly spaced points over the region of interest. Then we evaluate the kernel function at those points to compute the KDE.

% Discretize the region of interest into a 50-by-50 grid
x1min = -2; x1max = 5;
x2min = -2; x2max = 5;
x1res = 50; x2res = 50;
x1grid = linspace(x1min, x1max, x1res);
x2grid = linspace(x2min, x2max, x2res);
[x1mesh, x2mesh] = meshgrid(x1grid, x2grid);
xpnts = [x1mesh(:) x2mesh(:)];
npnts = x1res*x2res;

% KDE: sum the kernel responses at every grid point, then normalize
% so that each density sums to one over the grid.
invlen = 1;
K1 = kernel_se(xpnts, data1(:, 1:2) ...
    , ones(npnts, 1), ones(ndata1, 1), [invlen invlen 1]);
z1 = sum(K1, 2); z1 = z1 / sum(z1);
Z1 = reshape(z1, x2res, x1res);
K2 = kernel_se(xpnts, data2(:, 1:2) ...
    , ones(npnts, 1), ones(ndata2, 1), [invlen invlen 1]);
z2 = sum(K2, 2); z2 = z2 / sum(z2);
Z2 = reshape(z2, x2res, x1res);

 

% Plot the two KDEs side by side
% (subaxes is a third-party helper; subplot(1, 2, k) is a built-in fallback)
fig = figure(2); clf;
subaxes(fig, 1, 2, 1, 0.15, 0.1); hold on;
pcolor(x1grid, x2grid, Z1);
plot(data1(:, 1), data1(:, 2), 'ro');
colormap gray; axis equal; axis([x1min x1max x2min x2max]);
title('Data1', 'FontSize', 15);
subaxes(fig, 1, 2, 2, 0.15, 0.1); hold on;
pcolor(x1grid, x2grid, Z2);
plot(data2(:, 1), data2(:, 2), 'bx');
colormap gray; axis equal; axis([x1min x1max x2min x2max]);
title('Data2', 'FontSize', 15);
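As an optional cross-check (not part of the original post): if you have the Statistics and Machine Learning Toolbox, the built-in ksdensity function also accepts bivariate data in recent releases and can evaluate a Gaussian KDE on the same grid, which should give a qualitatively similar density map.

% Optional cross-check with MATLAB's built-in KDE (Statistics Toolbox)
f1 = ksdensity(data1, xpnts);   % density of data1 evaluated at the grid
f1 = f1 / sum(f1);              % normalize over the grid, like z1 above
F1 = reshape(f1, x2res, x1res);
figure(3); clf; hold on;
pcolor(x1grid, x2grid, F1);
plot(data1(:, 1), data1(:, 2), 'ro');
colormap gray; axis equal; axis([x1min x1max x2min x2max]);
title('Data1 via ksdensity', 'FontSize', 15);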




3. Compute total variational distance


The total variational distance measures how far apart two distributions are.

Using the definition above, we can compute the total variational distance between the two KDEs.


% Total variational distance between the two normalized, discretized KDEs
tvd = 0.5*sum(abs(z1-z2));
fprintf('totalVariationalDistance is %.3f \n', tvd)
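As a quick extension (a sketch reusing kernel_se, xpnts, and the datasets defined above; the particular invlen values are arbitrary), you can check how sensitive the estimated distance is to the kernel length parameter:

% Sensitivity of the distance to the inverse length parameter
for invlen_test = [0.5 1 2 5]
    Ka = kernel_se(xpnts, data1(:, 1:2) ...
        , ones(npnts, 1), ones(ndata1, 1), [invlen_test invlen_test 1]);
    Kb = kernel_se(xpnts, data2(:, 1:2) ...
        , ones(npnts, 1), ones(ndata2, 1), [invlen_test invlen_test 1]);
    za = sum(Ka, 2); za = za / sum(za);
    zb = sum(Kb, 2); zb = zb / sum(zb);
    fprintf('invlen = %.1f: tvd = %.3f \n', invlen_test, 0.5*sum(abs(za-zb)));
end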


