In this post, I would like to review some probabilistic models for user profiling. More specifically, I’m looking at the models that taking users’ preferences into account and try to predict certain quantities based on these preferences, which is a normal scenario for collaborative filtering.
- “Latent semantic models for collaborative filtering” by Thomas Hofmann, ACM Transactions on Information Systems, 2004
The proposed model is based on pLSA. In order to incorporate ratings, the authors propose the following decomposition scheme:
\( p(v|u,y) = \sum_{z}p(z|u)p(v|z,y) \) where \( p(v|z,y) \) follows Gaussian distribution. The paper also introduced practical techniques to normalize user ratings. The model is learned through (tempered) EM. - “Modeling User Rating Profiles For Collaborative Filtering” by Benjamin Marlin, NIPS 2003
The model proposed in the paper is essentially LDA in the context of collaborative filtering. The original document-term matrix was replaced by a user-item (user-rating) matrix. Unlike this pLSA model for collaborative filtering, this model introduced the decomposition scheme as:\( p(r|u)=\sum_{z} p(r|z)p(z|u) \)where no “dummy” variable \( y \) gets involved. The model is learned through variational inference. - “Flexible Mixture Model for Collaborative Filtering” by Luo Si and Rong Jin, ICML 2003
The model proposed in the paper is an extension of two-side clustering model of pLSA. It assumes that users are belong to multiple clusters and items are also belong to multiple clusters. The rating of a particular item is based on the user clusters and item clusters. Therefore, \( p(x,y,r) = \sum_{z_{x}} \sum_{z_{y}} p(z_{x})p(y_{x})p(x|z_{x})p(y|z_{y})p(r|z_{x},z_{y}) \) where \( z_{x} \) are latent factors for users and \( z_{y}\) are latent factors for items. All distributions here are multinomial distributions. The model is learned through EM.
A full Bayesian treatment of the model is introduced in “Latent Grouping Models for User Preference Prediction” by Eerika Savia, Kai Puolamaki and Samuel Kaski in Machine Learning 2009, which is learned through Gibbs Sampling. - “The Multiple Multiplicative Factor Model For Collaborative Filtering” by Benjamin Marlin and Richard S. Zemel, ICML 2004
Rather than using a same set of latent factors to “explain” all ratings, the Multiple Multiplicative Factor Model (MMF) tries to use different latent factors to “explain” different ratings. Therefore, for each user, the model has a \( K\)-dimensional binary vector \( Z \) where each element \( z_{k} \) represents whether \( k \)-th factor is “active” or not. For rating \( X \), the authors introduced to use softmax function to map arbitrary real values to simplex. The model is learned through variational inference. - “Efficient Bayesian Hierarchical User Modeling For Recommendation Systems” by Yi Zhang and Jonanthan Koren, SIGIR 2007
The model introduced in this paper is similar to the pLSA version of user profiles. The rating \( r \) of a item \( y \) by user \( u \) is decomposed as:\( p(r|y,u)=p(u)p(r|u,y) \) where \( p(u) \) is a Gaussian distribution and \( p(r|u,y) \) is modeled through a Generalized Linear Model \( r=u^{T}y+\epsilon \), essentially another Gaussian distribution in the paper. The novel part of the model is that \( y \) is the document representation of an item. Therefore, the authors assume that the rating is weighted sum of terms of documents where the weights are user specific. The model is learned through EM.
A study about several variants of pLSA on collaborative filtering is done by Rong Jin, Luo Si and Chengxiang Zhai.
- “A Study of Mixture Models for Collaborative Filtering” R. Jin, L. Si and C. X. Zhai, Information Retrieval Journal, 9(3), pp. 357-382, 2006
A study about how to normalize ratings is done by Rong Jin and Luo Si:
- “A Study of Methods for Normalizing User Ratings in Collaborative Filtering” R. Jin and L. Si, The 27th Annual International ACM SIGIR Conference (SIGIR 2004), pp. 568-569, 2004