Total variational distance between two empirical distributions (KDE)
1. Computing empirical distribution
For simplicity, we will assume that the data lies in two dimensional space with x1 and x2 axes.
ndata1 = 100;
ndata2 = 200;
data1 = repmat([1 2], ndata1, 1) + randn(ndata1, 2);
data2 = repmat([2 1], ndata2, 1) + randn(ndata2, 2);
figure(1); hold on;
plot(data1(:, 1), data1(:, 2), 'ro');
plot(data2(:, 1), data2(:, 2), 'bx');
axis equal;
2. Kernel Density Estimation
To run KDE, you need a kernel function.
function [K, dK] = kernel_se(X1, X2, L1, L2, hyp)
% Normalize X (This is optional)
% X1 = normr(X1);
% X2 = normr(X2);
n1 = size(X1, 1);
n2 = size(X2, 1);
d1 = size(X1, 2);
d2 = size(X2, 2);
if d1 ~= d2, fpinrtf(2, 'Data dimension missmatch! \n'); end
d = d1;
beta = hyp(end); % Gain
gamma = hyp(1:d); % Inverse Length Parameters
% make sure 'gamma' stays positive
gamma = max(gamma, 1E-12*ones(size(gamma)));
% Compute Kernel Matrix
Gamma = diag(gamma);
temp1 = pdist2(X1, X2, 'mahalanobis', inv(Gamma)).^2;
K = beta*exp(-1/2*temp1) ...
.*cos(pi/2*pdist2(L1, L2, 'cityblock')) ;
% Limit condition number
if n1 == n2 && n1 > 1
sig = 1E-8*beta;
K = K + sig*eye(size(K));
end
% Compute Kernel Derivates
if nargin > 1
dK = zeros(n1, n2, d+1);
% w.r.t. gamma
for i = 1:d
A = -0.5*pdist2(X1(:, i), X2(:, i), 'euclidean').^2;
dK(:, :, i) = A.*K;
end
% w.r.t. gain (beta)
dK(:, :, d+1) = K/beta;
end
The code above is a leveraged kernel function that I will use. And, just ignore about the leverage part and the derivative part.
We will further assume that the space will be discretized mainly for computing the total variational distance (we can hardly compute the integral analytically, so instead we discretize the space and compute the sum of absolute differences which is basically a total variational distance)
To discretize the space, we will use meshgrid function to compute evenly spaced points in the space of our interest. Then we use the kernel function to compute KDE.
x1min = -2; x1max = 5;
x2min = -2; x2max = 5;
x1res = 50; x2res = 50;
x1grid = linspace(x1min, x1max, x1res);
x2grid = linspace(x2min, x2max, x2res);
[x1mesh, x2mesh] = meshgrid(x1grid, x2grid);
xpnts = [x1mesh(:) x2mesh(:)];
npnts = x1res*x2res;
invlen = 1;
K1 = kernel_se(xpnts, data1(:, 1:2) ...
, ones(npnts, 1), ones(ndata1, 1), [invlen invlen 1]);
z1 = sum(K1, 2); z1 = z1 / sum(z1);
Z1 = reshape(z1, x2res, x1res);
K2 = kernel_se(xpnts, data2(:, 1:2) ...
, ones(npnts, 1), ones(ndata2, 1), [invlen invlen 1]);
z2 = sum(K2, 2); z2 = z2 / sum(z2);
Z2 = reshape(z2, x2res, x1res);
% Plot
fig = figure(2); clf;
subaxes(fig, 1, 2, 1, 0.15, 0.1); hold on;
pcolor(x1grid, x2grid, Z1);
plot(data1(:, 1), data1(:, 2), 'ro');
colormap gray; axis equal; axis([x1min x1max x2min x2max]);
title('Data1', 'FontSize', 15);
subaxes(fig, 1, 2, 2, 0.15, 0.1); hold on;
pcolor(x1grid, x2grid, Z2);
plot(data2(:, 1), data2(:, 2), 'bx');
colormap gray; axis equal; axis([x1min x1max x2min x2max]);
title('Data2', 'FontSize', 15);
3. Compute total variational distance
Total variational distance can be used to compute the distance between two distributions.
Using the definition above, we can compute the variational distribution between to KDEs.
tvd = 0.5*sum(abs(z1-z2));
fprintf('totalVariationalDistance is %.3f \n', tvd)