There are two forms of Logistic Regression used in the literature. In this post, I will build a bridge between these two forms and show that they are equivalent.
Logistic Function & Logistic Regression
The common definition of the Logistic Function is as follows:

\[
P(x) = \frac{1}{1+\exp(-x)} \tag{1}
\]

where \(x \in \mathbb{R}\) and \(P(x) \in [0,1]\). The first form of Logistic Regression is obtained by plugging a linear function of a feature vector \(\mathbf{x}\) into (1). With a weight vector \(\mathbf{w}\) and a label \(y \in \{0,1\}\):

\[
P(y=1 \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1+\exp(-\mathbf{w}^{T}\mathbf{x})} \tag{2}
\]

\[
P(y=0 \mid \mathbf{x}; \mathbf{w}) = 1 - P(y=1 \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1+\exp(\mathbf{w}^{T}\mathbf{x})} \tag{3}
\]

The second form, which is common in the machine-learning literature [3], uses labels \(y \in \{-1,+1\}\) and writes both cases as a single expression:

\[
P(y \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1+\exp(-y\,\mathbf{w}^{T}\mathbf{x})} \tag{4}
\]
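As a quick numerical illustration of the first form, here is a minimal Python sketch; the value of `score` is a made-up stand-in for \(\mathbf{w}^{T}\mathbf{x}\), chosen only for illustration:

```python
import math

def sigmoid(t):
    # the logistic function: maps any real t into (0, 1)
    return 1.0 / (1.0 + math.exp(-t))

# a toy score w^T x (hypothetical value, for illustration only)
score = 0.7
p1 = sigmoid(score)   # P(y = 1 | x; w), the first form
p0 = 1.0 - p1         # P(y = 0 | x; w) = 1 / (1 + exp(w^T x))

print(p1, p0)         # the two class probabilities sum to 1
```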
The Equivalence of Two Forms of Logistic Regression
At first glance, the forms (3) and (4) look very different. However, the equivalence between these two forms can be easily established. Starting from the form (3), together with (2), we have:

\[
P(y=0 \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1+\exp(\mathbf{w}^{T}\mathbf{x})}, \qquad P(y=1 \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1+\exp(-\mathbf{w}^{T}\mathbf{x})}
\]

The two cases differ only in the sign of \(\mathbf{w}^{T}\mathbf{x}\) inside the exponential. If we relabel the classes so that \(y=0\) becomes \(y=-1\) while \(y=1\) stays \(y=+1\), the label itself supplies that sign, and both cases collapse into the single expression

\[
P(y \mid \mathbf{x}; \mathbf{w}) = \frac{1}{1+\exp(-y\,\mathbf{w}^{T}\mathbf{x})},
\]

which is exactly the form (4).
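The relabeling argument can also be checked numerically. The following self-contained sketch draws random stand-ins for \(\mathbf{w}^{T}\mathbf{x}\) and asserts that the two parameterizations assign identical probabilities:

```python
import math
import random

def sigmoid(t):
    # the logistic function
    return 1.0 / (1.0 + math.exp(-t))

def p_first_form(y01, score):
    # labels in {0, 1}: P(y=1) = sigmoid(score), P(y=0) = 1 - sigmoid(score)
    return sigmoid(score) if y01 == 1 else 1.0 - sigmoid(score)

def p_second_form(ypm, score):
    # labels in {-1, +1}: P(y) = sigmoid(y * score)
    return sigmoid(ypm * score)

random.seed(0)
for _ in range(1000):
    score = random.uniform(-10.0, 10.0)   # stands in for w^T x
    # y = 1 in the first form corresponds to y = +1 in the second,
    # and y = 0 corresponds to y = -1
    assert math.isclose(p_first_form(1, score), p_second_form(+1, score))
    assert math.isclose(p_first_form(0, score), p_second_form(-1, score))
```

The second assertion relies on the identity \(1 - \sigma(t) = \sigma(-t)\) for the logistic function, which is the algebraic heart of the equivalence.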
Logistic Loss
Since we have established the equivalence of the two forms of Logistic Regression, it is convenient to use the second form, as it can be explained by a general classification framework [1]. Here, we assume \(y \in \{-1,+1\}\) is the label of a data point and \(\mathbf{x}\) is its feature vector. The classification framework can be formalized as follows:

\[
\min_{\mathbf{w}} \sum_{i=1}^{N} L\bigl(y_{i}, \mathbf{w}^{T}\mathbf{x}_{i}\bigr)
\]

where \(L\) is a loss function and \(N\) is the number of training examples. Taking the negative logarithm of the form (4) yields the Logistic Loss:

\[
L\bigl(y, \mathbf{w}^{T}\mathbf{x}\bigr) = \log\bigl(1+\exp(-y\,\mathbf{w}^{T}\mathbf{x})\bigr)
\]

so minimizing the logistic loss over the training set is equivalent to maximizing the likelihood under the second form of Logistic Regression.
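The identity behind this step, that the logistic loss equals the negative log of the probability the model assigns to the observed label, can be verified with a short sketch (scores are arbitrary stand-ins for \(\mathbf{w}^{T}\mathbf{x}\)):

```python
import math

def sigmoid(t):
    # the logistic function
    return 1.0 / (1.0 + math.exp(-t))

def logistic_loss(y, score):
    # logistic loss for y in {-1, +1} and score = w^T x
    return math.log(1.0 + math.exp(-y * score))

def neg_log_likelihood(y, score):
    # negative log of P(y | x; w) = sigmoid(y * score)
    return -math.log(sigmoid(y * score))

# the two quantities agree for every label/score pair,
# so minimizing one minimizes the other
for y in (-1, +1):
    for score in (-2.0, -0.5, 0.0, 1.3, 4.0):
        assert math.isclose(logistic_loss(y, score), neg_log_likelihood(y, score))
```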
References
[1] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
[2] Tom M. Mitchell. Machine Learning. McGraw-Hill Series in Computer Science. McGraw-Hill, 1997.
[3] Jason D. M. Rennie. Logistic Regression. http://people.csail.mit.edu/jrennie/writing, April 2003.