
Total variation distance between two empirical distributions (KDE)

해리 · 2016. 6. 6. 11:55

1. Computing the empirical distribution


For simplicity, we will assume that the data lies in a two-dimensional space with axes x1 and x2.

ndata1 = 100;
ndata2 = 200;
% Two Gaussian clusters centered at (1, 2) and (2, 1)
data1 = repmat([1 2], ndata1, 1) + randn(ndata1, 2);
data2 = repmat([2 1], ndata2, 1) + randn(ndata2, 2);

figure(1); hold on;
plot(data1(:, 1), data1(:, 2), 'ro');
plot(data2(:, 1), data2(:, 2), 'bx');
axis equal;



2. Kernel Density Estimation


To run KDE, you need a kernel function. 

function [K, dK] = kernel_se(X1, X2, L1, L2, hyp)

% Normalize X (this is optional)
% X1 = normr(X1);
% X2 = normr(X2);

n1 = size(X1, 1);
n2 = size(X2, 1);
d1 = size(X1, 2);
d2 = size(X2, 2);
if d1 ~= d2, fprintf(2, 'Data dimension mismatch! \n'); end
d  = d1;

beta  = hyp(end);  % gain
gamma = hyp(1:d);  % inverse length-scale parameters

% Make sure 'gamma' stays positive
gamma = max(gamma, 1E-12*ones(size(gamma)));

% Compute the kernel matrix: a squared-exponential part times a
% leverage part (the cosine term on L1 and L2)
Gamma = diag(gamma);
temp1 = pdist2(X1, X2, 'mahalanobis', inv(Gamma)).^2;

K = beta*exp(-1/2*temp1) ...
    .*cos(pi/2*pdist2(L1, L2, 'cityblock'));

% Limit the condition number
if n1 == n2 && n1 > 1
    sig = 1E-8*beta;
    K = K + sig*eye(size(K));
end

% Compute kernel derivatives (only when the second output is requested)
if nargout > 1
    dK = zeros(n1, n2, d+1);
    % w.r.t. gamma
    for i = 1:d
        A = -0.5*pdist2(X1(:, i), X2(:, i), 'euclidean').^2;
        dK(:, :, i) = A.*K;
    end
    % w.r.t. gain (beta)
    dK(:, :, d+1) = K/beta;
end



The code above is the leveraged squared-exponential kernel function that I will use. For this post, you can simply ignore the leverage part (the cosine term on L1 and L2) and the derivative part.
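In fact, since all leverage values are set to one in the calls below, the cosine factor equals one and the kernel reduces to a plain squared-exponential kernel. For reference, here is a minimal sketch of that reduced kernel (kernel_se_plain is a hypothetical helper, not part of the original code):

function K = kernel_se_plain(X1, X2, hyp)
% Plain squared-exponential kernel: what kernel_se computes when all
% leverage values coincide (the cosine factor is then exactly one).
d     = size(X1, 2);
beta  = hyp(end);           % gain
gamma = hyp(1:d);           % inverse length-scale parameters
D2    = pdist2(X1, X2, 'mahalanobis', inv(diag(gamma))).^2;
K     = beta*exp(-0.5*D2);  % n1 x n2 kernel matrix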


We will further assume that the space is discretized, mainly for computing the total variation distance: we can hardly compute the integral analytically, so instead we discretize the space and compute the (halved) sum of absolute differences, which is basically the total variation distance between the discretized distributions.
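Written out (this is the standard definition, with p and q the two densities and \Delta the area of one grid cell):

\delta(p, q) = \frac{1}{2} \int |p(x) - q(x)| \, dx \approx \frac{1}{2} \sum_i |p(x_i) - q(x_i)| \, \Delta

Since each KDE below is normalized so that its grid weights sum to one, the weight z_i plays the role of p(x_i)*\Delta, and the approximation reduces to 0.5*sum(abs(z1 - z2)).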


To discretize the space, we will use the meshgrid function to compute evenly spaced points over the region of interest. Then we use the kernel function to compute the KDE.

x1min = -2; x1max = 5;
x2min = -2; x2max = 5;
x1res = 50; x2res = 50;
x1grid = linspace(x1min, x1max, x1res);
x2grid = linspace(x2min, x2max, x2res);
[x1mesh, x2mesh] = meshgrid(x1grid, x2grid);
xpnts = [x1mesh(:) x2mesh(:)];
npnts = x1res*x2res;

% Evaluate the KDE of each dataset on the grid and normalize so that
% the weights sum to one over the grid points.
invlen = 1;
K1 = kernel_se(xpnts, data1(:, 1:2) ...
    , ones(npnts, 1), ones(ndata1, 1), [invlen invlen 1]);
z1 = sum(K1, 2); z1 = z1 / sum(z1);
Z1 = reshape(z1, x2res, x1res);
K2 = kernel_se(xpnts, data2(:, 1:2) ...
    , ones(npnts, 1), ones(ndata2, 1), [invlen invlen 1]);
z2 = sum(K2, 2); z2 = z2 / sum(z2);
Z2 = reshape(z2, x2res, x1res);

 

% Plot (subaxes is a custom subplot helper, not a built-in function)
fig = figure(2); clf;
subaxes(fig, 1, 2, 1, 0.15, 0.1); hold on;
pcolor(x1grid, x2grid, Z1);
plot(data1(:, 1), data1(:, 2), 'ro');
colormap gray; axis equal; axis([x1min x1max x2min x2max]);
title('Data1', 'FontSize', 15);
subaxes(fig, 1, 2, 2, 0.15, 0.1); hold on;
pcolor(x1grid, x2grid, Z2);
plot(data2(:, 1), data2(:, 2), 'bx');
colormap gray; axis equal; axis([x1min x1max x2min x2max]);
title('Data2', 'FontSize', 15);
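Note that invlen acts as the inverse (squared) length-scale of the kernel, i.e., the KDE bandwidth: larger values give spikier estimates and smaller values give smoother ones. Here is a quick sketch for eyeballing its effect, reusing the variables above (the three invlen values are arbitrary choices for illustration):

% Compare KDEs of data1 under a few inverse length-scales
figure(3); clf;
invlens = [0.2 1 5];
for i = 1:length(invlens)
    il = invlens(i);
    Ki = kernel_se(xpnts, data1(:, 1:2) ...
        , ones(npnts, 1), ones(ndata1, 1), [il il 1]);
    zi = sum(Ki, 2); zi = zi / sum(zi);
    subplot(1, 3, i);
    pcolor(x1grid, x2grid, reshape(zi, x2res, x1res));
    colormap gray; axis equal;
    title(sprintf('invlen = %.1f', il), 'FontSize', 12);
end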




3. Computing the total variation distance


The total variation distance can be used to measure the distance between two distributions. Using the definition above, we can compute the total variation distance between the two KDEs.


tvd = 0.5*sum(abs(z1 - z2));
fprintf('Total variation distance is %.3f \n', tvd);
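As a quick sanity check (a sketch reusing the variables above), the distance of a distribution to itself is zero, and the value always lies in [0, 1]:

% Sanity checks on the computed total variation distance
tvd_self = 0.5*sum(abs(z1 - z1)); % exactly zero
fprintf('Self-distance is %.3f (should be 0) \n', tvd_self);
assert(tvd >= 0 && tvd <= 1, 'TVD must lie in [0, 1]');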