Update (2016-02-09): It has been brought to my attention that one of the decision trees is not being generated correctly, so please don’t use these diagrams. I’m too lazy to debug this code though.

Decision trees are commonly used in machine learning because they work surprisingly well. A simple algorithm for building one is ID3. At each node, we choose the feature X that maximizes the information gain IG(X) about the target label Y, defined as:

\begin{align}IG(X) \equiv H(Y) - H(Y|X) \tag{1}\end{align}

where H is the entropy given by

\begin{align}H(X) = \sum_i -p(X=i) \log p(X=i) \tag{2}\end{align}

and the conditional entropy is given by

\begin{align}H(Y|X) = \sum_i p(X=i) H(Y|X=i) \tag{3}\end{align}
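
To make the three definitions concrete, here is a minimal Python sketch that estimates them from empirical counts. It is not the code that generated the diagrams in this post. The function names (`entropy`, `conditional_entropy`, `information_gain`, `best_feature`) are made up for illustration, and the choice of base-2 logarithms (so entropy is in bits) is an assumption; the formulas above leave the base unspecified, and it does not change which feature wins.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical entropy H = sum_i -p(i) log2 p(i) of a sequence of discrete values."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """Empirical H(Y|X) = sum_i p(X=i) H(Y|X=i) for paired sequences xs, ys."""
    n = len(xs)
    # Group the labels by the feature value they occur with.
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    # Weight each group's label entropy by the group's relative frequency.
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """IG(X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)


def best_feature(features, ys):
    """Return the name of the feature with the largest information gain.

    `features` is assumed to be a dict mapping a feature name to its column
    of values, aligned with the labels `ys`.
    """
    return max(features, key=lambda name: information_gain(features[name], ys))
```

For example, with labels `ys = [0, 0, 1, 1]`, a feature column `["a", "a", "b", "b"]` separates the labels perfectly and so has a gain equal to H(Y), while `["a", "b", "a", "b"]` tells us nothing about the label and has zero gain. `best_feature` would pick the first one, which is the choice ID3 makes at a single node.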

