Information Theoretic Machine Learning

What follows is a minimal introduction to the mathematical (statistical) ideas that appear frequently in information-theoretic machine learning and deep learning papers.

Important/Common Concepts and Equations

Shannon Entropy

$$ H(X) = -\sum_x{p(x) \log_2{p(x)}} $$
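As a quick sketch, the entropy sum can be computed directly from a probability vector (a minimal NumPy version; `shannon_entropy` is just an illustrative name):

```python
import numpy as np

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution p; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A fair coin carries exactly 1 bit of entropy.
print(shannon_entropy([0.5, 0.5]))  # 1.0
```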

Mutual Information

$$ I(X;Y) = \sum_{x,y}{p(x,y) \log_2{\frac{p(x,y)}{p(x)p(y)}}} $$

$$ I(X;Y) = H(X) - H(X | Y) $$
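The first formula translates directly into code given a joint probability table (a minimal sketch; `mutual_information` is an illustrative name):

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint table p_xy[i, j] = p(x_i, y_j)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))

# X = Y = fair coin (perfectly dependent): I(X;Y) = H(X) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # 1.0
```

For independent variables the joint factorizes as p(x)p(y), the log ratio is 0 everywhere, and the mutual information is 0.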

Kullback-Leibler (KL) Divergence

$$ D_{KL}(P||Q) = \sum_x{P(x) \log_2{\frac{P(x)}{Q(x)}}} $$
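A minimal implementation of the sum above (assuming, as the formula requires, that Q(x) > 0 wherever P(x) > 0):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P||Q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Divergence of a fair coin from a biased one; zero iff P = Q.
print(kl_divergence([0.5, 0.5], [0.75, 0.25]))
```

Note that KL divergence is not symmetric: swapping the arguments generally gives a different value.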

Multivariate Gaussian

$$ p(x) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\left(- \frac{1}{2} (x - \mu)^T \Sigma^{-1}(x - \mu)\right) $$
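The density can be evaluated directly from the formula (a minimal sketch; `mvn_pdf` is an illustrative name, and a linear solve stands in for the explicit inverse of Σ):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of a k-dimensional Gaussian with mean mu and covariance sigma."""
    x, mu = np.asarray(x, dtype=float), np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    k = mu.size
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(sigma))
    # solve(sigma, diff) computes sigma^{-1} diff without forming the inverse.
    return float(np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm)

# Standard 1-D normal at its mean: 1/sqrt(2*pi) ≈ 0.3989.
print(mvn_pdf([0.0], [0.0], [[1.0]]))
```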

Expectation (E)

$$ \mathbb{E}_{P(X)}[f(X)] = \int P(x)f(x) dx $$
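In practice the integral is often estimated by Monte Carlo: draw samples from P(X) and average f over them. A small sketch with an assumed choice of P = N(0, 1) and f(x) = x²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of E[f(X)] for X ~ N(0, 1) with f(x) = x^2.
# For a standard normal, E[X^2] = Var(X) = 1, so the average should be near 1.
samples = rng.standard_normal(100_000)
estimate = float(np.mean(samples ** 2))
print(estimate)  # close to 1.0
```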

Markov Chain

A Markov chain is a sequence of random variables in which each variable depends only on the one immediately before it. This can be viewed from the perspective of a neural network, in which each layer's outputs depend only on the previous layer's outputs.

$$ X \rightarrow Z \rightarrow Y $$
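The memoryless property is easy to see in simulation: to sample the next state, only the current state is needed. A minimal sketch with an assumed two-state transition matrix:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sample a two-state Markov chain: the next state depends only on the
# current one, via the transition matrix T[i, j] = p(next = j | current = i).
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

state = 0
path = [state]
for _ in range(10):
    state = int(rng.choice(2, p=T[state]))
    path.append(state)
print(path)
```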

Data Processing Inequality (DPI)

The data processing inequality states that in a Markov chain, information about the original variable can only be preserved or lost, never gained: mutual information with the input can only decrease as one moves along the chain.

$$ \text{For Markov chain } X \rightarrow Z \rightarrow Y \text{ : } I(X;Z) \geq I(X;Y) $$
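The inequality can be checked numerically on a small discrete chain: build p(x, z) from p(x) and p(z|x), marginalize through p(y|z) to get p(x, y), and compare the two mutual informations (a self-contained sketch; alphabet sizes and the random distributions are arbitrary choices):

```python
import numpy as np

def mi(p_joint):
    """Mutual information in bits from a joint probability table."""
    p = np.asarray(p_joint, dtype=float)
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / (px * py)[m])))

rng = np.random.default_rng(1)

# Random chain X -> Z -> Y over small alphabets (each row of a
# conditional table sums to 1).
p_x = rng.dirichlet(np.ones(3))              # p(x)
p_z_given_x = rng.dirichlet(np.ones(4), 3)   # p(z|x), shape (3, 4)
p_y_given_z = rng.dirichlet(np.ones(2), 4)   # p(y|z), shape (4, 2)

p_xz = p_x[:, None] * p_z_given_x            # p(x, z)
p_xy = p_xz @ p_y_given_z                    # p(x, y) = sum_z p(x, z) p(y|z)

print(mi(p_xz) >= mi(p_xy))  # True, by the DPI
```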

Reparameterization Invariance [1]

For any two invertible functions ϕ, ψ, the mutual information is unchanged:

$$ I(X;Y) = I(\phi(X);\psi(Y)). $$

For deep neural networks, this means that shuffling the weights of a given layer does not change the mutual information between that layer and all the others. This matters when considering computational complexity, as done in the V-Information paper: the mutual information between the two random variables does not change, but the computational cost can increase heavily depending on how hard the functions ϕ, ψ are to invert. The paper introduces a new type of information that takes computational constraints into consideration in this case.
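For discrete variables, an invertible map is just a relabeling of outcomes, so the invariance can be checked by permuting the rows (ϕ on X) and columns (ψ on Y) of a joint table and recomputing the mutual information (a self-contained numeric sketch):

```python
import numpy as np

def mi(p_joint):
    """Mutual information in bits from a joint probability table."""
    p = np.asarray(p_joint, dtype=float)
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / (px * py)[m])))

rng = np.random.default_rng(2)
p_xy = rng.dirichlet(np.ones(12)).reshape(3, 4)  # random joint p(x, y)

# An invertible map on a discrete variable is a relabeling: apply a
# permutation phi to the rows (X) and psi to the columns (Y).
phi = rng.permutation(3)
psi = rng.permutation(4)
p_permuted = p_xy[phi][:, psi]

print(np.isclose(mi(p_xy), mi(p_permuted)))  # True
```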
