Notes on “A comparison of logistic regression and naive Bayes”
Andrew Y. Ng and Michael I. Jordan had a classic paper on the comparison between logistic regression and naive Bayes. The main contribution of the paper is a theoretical analysis of how logistic regression and naive Bayes might perform and an experimental comparison to support this idea.
Several important points made by the paper. The first is that:
- The asymptotic error made by logistic regression is no more than that made by naive Bayes (Proposition 1 in the paper).
This conclusion provides a basis for what seems to be the widely held belief that discriminative classifiers are better than generative ones. The main conclusions of the paper is about the sample complexity of both classifiers. Sample complexity is the number of examples needed to approach the asymptotic error. For logistic regression, such sample complexity is:
\[
m = \Omega(n)
\]which means that the sample complexity is linear in \( n \) (Proposition 2 in the paper). For naive Bayes, we have:
\[
m = O(\log n)
\]which means that the sample complexity is logarithmic in \( n \) (Lemma 3 and Corollary 6 in the paper). All these conclusions imply that even though naive Bayes converges to a higher asymptotic error compared to logistic regression, it may also approach it significantly faster — after \( O(\log n) \), rather than \( O(n) \), training examples.
These ideas are discussed in [1]. Are these theoretical analysis always hold in practice? Not very reliable as far as [2] suggested. Note that these conclusions coming after strong assumptions. Nevertheless, they provide some general idea between logistic regression and naive Bayes.
References:
[1] On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, Andrew Y. Ng and Michael Jordan. In NIPS 14, 2002. [ps, pdf]
[2] Comment on “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes”. Jing-Hao Xue and D. Michael Titterington. 2008. Neural Processing Letters 28, 3 (December 2008), 169-187. [pdf]