The “godfather” of LDA, David Blei, recently published a tutorial on Bayesian Nonparametric Models with one of his students. The whole tutorial is easy reading and provides a very clear overview of Bayesian nonparametric models. In particular, the Chinese Restaurant Process (CRP) and the Indian Buffet Process are discussed in a very intuitive way. For those who are interested in the technical details of these models, this tutorial may be just a starting point, and the appendix points out several ways to treat the models more formally, including inference algorithms.
One particularly interesting property shown in this tutorial is the exchangeability of the CRP, which I wish to restate below.
Let $c_{n}$ be the table assignment of the $n$th customer. A draw from the CRP can be generated by sequentially assigning observations to classes with probability:
\[
P(c_{n} = k \mid \mathbf{c}_{1:n-1}) =
\begin{cases}
\dfrac{m_{k}}{n-1+\alpha}, & \text{if } k \leq K_{+} \text{ (i.e., $k$ is a previously occupied table)} \\[6pt]
\dfrac{\alpha}{n-1+\alpha}, & \text{otherwise (i.e., $k$ is the next unoccupied table)}
\end{cases}
\]
where $m_{k}$ is the number of customers sitting at table $k$, and $K_{+}$ is the number of tables for which $m_{k} > 0$. The parameter $\alpha$ is called the concentration parameter. The CRP exhibits an important invariance property: the cluster assignments under this distribution are exchangeable. This means $p(c_{1}, c_{2}, \cdots, c_{N})$ is unchanged if the order of the customers is shuffled.
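To make the sampling rule concrete, here is a minimal Python sketch (my own illustration, not code from the tutorial) that draws a single table-assignment vector by applying the conditional probabilities above customer by customer; the function name sample_crp and the use of NumPy are my own choices:

```python
import numpy as np

def sample_crp(n_customers, alpha, seed=None):
    """Draw one table-assignment vector from a CRP with concentration alpha."""
    rng = np.random.default_rng(seed)
    assignments = []       # c_1, ..., c_N (tables indexed from 0)
    table_counts = []      # m_k for each occupied table k
    for n in range(n_customers):
        # Customer n+1 joins occupied table k with probability m_k / (n + alpha)
        # and opens a new table with probability alpha / (n + alpha).
        probs = np.array(table_counts + [alpha]) / (n + alpha)
        table = rng.choice(len(table_counts) + 1, p=probs)
        if table == len(table_counts):     # next unoccupied table
            table_counts.append(1)
        else:                              # previously occupied table
            table_counts[table] += 1
        assignments.append(int(table))
    return assignments

print(sample_crp(10, alpha=1.0, seed=0))
```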
Consider the joint distribution of a set of customer assignments $c_{1}, c_{2}, \cdots, c_{N}$. It decomposes according to the chain rule:
\[
p(c_{1}, c_{2}, \cdots, c_{N}) = p(c_{1}) \, p(c_{2} \mid c_{1}) \cdots p(c_{N} \mid c_{1}, c_{2}, \cdots, c_{N-1})
\]
where each term comes from the CRP conditional above.
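As a small sanity check (my own sketch, continuing the hypothetical Python setup from above), the following helper evaluates this chain-rule product term by term for a given assignment vector:

```python
from collections import Counter

def crp_joint_chain_rule(assignments, alpha):
    """Evaluate p(c_1, ..., c_N) as the product of sequential CRP conditionals."""
    prob = 1.0
    counts = Counter()                      # m_k for the tables seen so far
    for n, c in enumerate(assignments):     # n = 0, ..., N-1, so denominator is n + alpha
        if counts[c] > 0:
            prob *= counts[c] / (n + alpha)    # join a previously occupied table
        else:
            prob *= alpha / (n + alpha)        # open a new table
        counts[c] += 1
    return prob

print(crp_joint_chain_rule([0, 0, 1, 0, 2, 1], alpha=1.5))
```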
To show that this distribution is exchangeable, we introduce some new notation. Let $K$ be the number of groups (occupied tables) into which these assignments place the customers, which is a number between 1 and $N$. Let $I_{k}$ be the set of indices of customers assigned to the $k$th group, and let $N_{k}$ be the number of customers assigned to that group. Now, for a particular group $k$, the joint probability of all assignments in this group is:
\[
\frac{\alpha}{I_{k,1}-1+\alpha} \cdot \frac{1}{I_{k,2}-1+\alpha} \cdot \frac{2}{I_{k,3}-1+\alpha} \cdots \frac{N_{k}-1}{I_{k,N_{k}}-1+\alpha}
\]
where $I_{k,j}$ denotes the index of the $j$th customer assigned to group $k$, so each factor in the product corresponds to one customer. The product of the numerators can be rewritten as $\alpha (N_{k}-1)!$. Therefore, multiplying over all $K$ groups, we have:
\[
p(c_{1}, c_{2}, \cdots, c_{N}) = \prod_{k=1}^{K} \frac{\alpha \, (N_{k}-1)!}{(I_{k,1}-1+\alpha)(I_{k,2}-1+\alpha)\cdots (I_{k,N_{k}}-1+\alpha)}
\]
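Here is the same quantity computed directly from this per-group factorization (again a hypothetical sketch of my own; crp_joint_by_groups is an illustrative name). For any assignment vector it should agree with the chain-rule product from crp_joint_chain_rule above:

```python
def crp_joint_by_groups(assignments, alpha):
    """Joint probability via the per-group product over arrival positions I_{k,j}."""
    groups = {}
    for i, c in enumerate(assignments, start=1):    # 1-indexed arrival position i
        groups.setdefault(c, []).append(i)          # I_k: positions of group k's customers
    prob = 1.0
    for positions in groups.values():
        for j, i in enumerate(positions):           # j-th customer to join this group
            numerator = alpha if j == 0 else j      # numerators alpha, 1, 2, ..., N_k - 1
            prob *= numerator / (i - 1 + alpha)
    return prob

print(crp_joint_by_groups([0, 0, 1, 0, 2, 1], alpha=1.5))   # matches the chain-rule value
```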
Finally, notice that the union of $I_{k}$ across all groups $k$ identifies each index $i \in \{1, \cdots, N\}$ exactly once, because each customer is assigned to exactly one group. This simplifies the denominator and lets us write the joint distribution as:
\[
p(c_{1}, c_{2}, \cdots, c_{N}) = \frac{\alpha^{K} \prod_{k=1}^{K} (N_{k}-1)!}{\prod_{i=1}^{N} (i-1+\alpha)}
\]
This equation depends only on the number of groups $K$ and the size of each group $N_{k}$, not on the order in which the customers arrived. The joint probability is therefore invariant to any permutation of the customers, which is exactly the exchangeability property stated above.
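To close the loop, here is a short numerical check of my own (not from the tutorial): the closed form above matches the chain-rule product for every permutation of a small assignment vector, which is the exchangeability claim in action. It reuses crp_joint_chain_rule from the earlier sketch; math.prod requires Python 3.8+.

```python
import itertools
import math
from collections import Counter

def crp_joint_closed_form(assignments, alpha):
    """Closed form: alpha^K * prod_k (N_k - 1)! / prod_{i=1}^{N} (i - 1 + alpha)."""
    sizes = Counter(assignments).values()            # group sizes N_k
    numerator = alpha ** len(sizes) * math.prod(math.factorial(s - 1) for s in sizes)
    denominator = math.prod(i + alpha for i in range(len(assignments)))
    return numerator / denominator

# Every reordering of the customers yields the same joint probability,
# and it equals the sequential chain-rule product computed earlier.
c, alpha = [0, 0, 1, 0, 2, 1], 1.5
target = crp_joint_closed_form(c, alpha)
for perm in itertools.permutations(c):
    assert abs(crp_joint_chain_rule(list(perm), alpha) - target) < 1e-12
```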