Daily Archives: August 27, 2011


An Easy Reading Tutorial for Bayesian Non-parametric Models

The “god-father” of LDA, David Blei, recently published a tutorial on Bayesian Non-parametric Models, with one of his student. The whole tutorial is easy-reading and provides very clear overview of Bayesian Non-parametric Models. In particular, Chinese Restaurant Process (CRP) and Indian Buffet Process are discussed in a very intuitive way. For those who are interests in technical details about these models, this tutorial may be just a starting point and the Appendix points out several ways to discuss models more formally, including inference algorithms.

One specific interesting property shown in this tutorial is the “exchangeable” property for CRP, which I wish to re-state as below.

Let c_{n} be the table assignment of the nth customer. A draw from CPR can be generated by sequentially assigning observations to classes with probability:

    \[P(c_{n} = k | \mathbf{c}_{1:n-1}) = \begin{cases}\frac{m_{k}}{n-1+\alpha}, & \mbox{if } k \leq \mathbf{K}_{+} \mbox{ (i.e., $k$ is a previously occupied table)} \\\frac{\alpha}{n-1+\alpha}, & \mbox{otherwise (i.e., $k$ is the next unoccupied table)}\end{cases}\]

where m_{k} is the number of customers sitting at table k, and \mathbf{K}_{+} is the number of tables for which m_{k} > 0. The parameter \alpha is called the concentration parameter. The CRP exhibits an important invariance property: The cluster assignments under this distribution are exchangeable. This means p(\mathbf{c}) is unchanged if the order of customers is shuffled.

Consider the joint distribution of a set of customer assignments c_{1:N}. It decomposes according to the chain rule:

    \[p(c_{1}, c_{2}, \cdots , c_{N}) = p(c_{1}) p(c_{2} | c_{1}) \cdots p(c_{N} | c_{1}, c_{2}, \cdots , c_{N-1}) \]

where each terms comes from above equation. To show that this distribution is exchangeable, we will introduce some new notation. Let \mathbf{K}(c_{1:N}) be the number of groups in which these assignments place the customers, which is a number between 1 and N. Let I_{k} be the set of indices of customers assigned to the kth group, and let N_{k} be the number of customers assigned to that group. Now, for a particular group k the joint probability of all assignments in this group is:

    \[ \frac{\alpha}{I_{k,1}-1+\alpha} \frac{1}{I_{k,2}-1+\alpha} \frac{2}{I_{k,3}-1+\alpha} \cdots \frac{N_{k}-1}{I_{k,N}-1+\alpha} \]

where each term in the equation represents a customer. The numerator can be re-written as \alpha (N_{k}-1)!. Therefore, we have:

    \[ p(c_{1}, c_{2}, \cdots , c_{N}) = \prod_{k=1}^{K} \frac{\alpha (N_{k}-1)!}{(I_{k,1}-1+\alpha)(I_{k,2}-1+\alpha)\cdots (I_{k,N_{k}}-1+\alpha)} \]

Finally, notice that the union of \mathbf{I}_{k} across all groups k identifies each index once, because each customer is assigned to exactly one group. This simplifies the denominator and let us write the joint as:

    \[ p(c_{1}, c_{2}, \cdots , c_{N}) = \frac{\alpha^{K} \prod_{k=1}^{K} (N_{k}-1)! }{\prod_{i=1}^{N} (i-1+\alpha)} \]

This equation only depends on the number of groups \mathbf{K} and the size of each group \mathbf{N}_{k}.