Two Forms of Logistic Regression


There are two forms of Logistic Regression used in the literature. In this post, I will build a bridge between these two forms and show that they are equivalent.

Logistic Function & Logistic Regression

The common definition of the Logistic Function is as follows:

    \[P(x) = \frac{1}{1+\exp(-x)} \qquad (1)\]

where \(x \in \mathbb{R}\) is the input variable and \(P(x) \in (0,1)\). One important property of Equation (1) is that:

    \[\begin{aligned}
    P(-x) &= \frac{1}{1+\exp(x)} \\
    &= \frac{1}{1+\frac{1}{\exp(-x)}} \\
    &= \frac{\exp(-x)}{1+\exp(-x)} \\
    &= 1 - \frac{1}{1+\exp(-x)} \\
    &= 1 - P(x) \qquad (2)
    \end{aligned}\]
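As a quick numerical sanity check, here is a minimal NumPy sketch of property (2); the function name logistic is my own choice, not taken from the references:

    import numpy as np

    def logistic(x):
        # Logistic function from Equation (1): P(x) = 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-x))

    # Property (2): P(-x) = 1 - P(x), checked on a grid of points
    x = np.linspace(-10.0, 10.0, 101)
    assert np.allclose(logistic(-x), 1.0 - logistic(x))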

The form of Equation (2) is widely used as the form of Logistic Regression (e.g., [1,2,3]):

    \[\begin{aligned}
    P(y = 1 \,|\, \boldsymbol{\beta}, \mathbf{x}) &= \frac{\exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} \\
    P(y = 0 \,|\, \boldsymbol{\beta}, \mathbf{x}) &= \frac{1}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} \qquad (3)
    \end{aligned}\]

where \(\mathbf{x}\) is a feature vector and \(\boldsymbol{\beta}\) is a coefficient vector. By using Equation (2), we also have:

    \[P(y=1 \,|\, \boldsymbol{\beta}, \mathbf{x}) = 1 - P(y=0 \,|\, \boldsymbol{\beta}, \mathbf{x})\]
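A small sketch of form (3), with coefficient and feature values that are made up purely for illustration:

    import numpy as np

    def prob_form3(beta, x):
        # Form (3): returns (P(y=1 | beta, x), P(y=0 | beta, x))
        z = np.exp(beta @ x)               # exp(beta^T x)
        return z / (1.0 + z), 1.0 / (1.0 + z)

    beta = np.array([0.5, -1.2, 0.3])      # hypothetical coefficient vector
    x = np.array([1.0, 0.4, 2.0])          # hypothetical feature vector
    p1, p0 = prob_form3(beta, x)
    assert np.isclose(p1, 1.0 - p0)        # P(y=1) = 1 - P(y=0), as above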

This formalism of Logistic Regression is used in [1,2], where labels \(y \in \{0,1\}\) and the probability takes a different functional form for each label. Another formalism, introduced in [3], unifies the two cases into a single equation by folding the label into the prediction:

    \[P(g = \pm 1 \,|\, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + \exp(-g \boldsymbol{\beta}^{T} \mathbf{x})} \qquad (4)\]

where \(g \in \{\pm 1\}\) is the label for data item \(\mathbf{x}\). It is also easy to verify that \(P(g=1 \,|\, \boldsymbol{\beta}, \mathbf{x}) = 1 - P(g=-1 \,|\, \boldsymbol{\beta}, \mathbf{x})\).
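The unified form (4) is equally short to write down; again a sketch with the same hypothetical values:

    import numpy as np

    def prob_form4(beta, x, g):
        # Form (4): P(g | beta, x) for labels g in {+1, -1}
        return 1.0 / (1.0 + np.exp(-g * (beta @ x)))

    beta = np.array([0.5, -1.2, 0.3])      # hypothetical coefficient vector
    x = np.array([1.0, 0.4, 2.0])          # hypothetical feature vector
    # The property stated above: P(g=1) = 1 - P(g=-1)
    assert np.isclose(prob_form4(beta, x, +1), 1.0 - prob_form4(beta, x, -1))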

The Equivalence of Two Forms of Logistic Regression

At first glance, forms (3) and (4) look very different. However, the equivalence between them is easily established. Starting from form (3), we have:

    \[\begin{aligned}
    P(y = 1 \,|\, \boldsymbol{\beta}, \mathbf{x}) &= \frac{\exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})} \\
    &= \frac{1}{\frac{1}{\exp(\boldsymbol{\beta}^{T} \mathbf{x})} + 1} \\
    &= \frac{1}{\exp(-\boldsymbol{\beta}^{T} \mathbf{x}) + 1} \\
    &= P(g = 1 \,|\, \boldsymbol{\beta}, \mathbf{x})
    \end{aligned}\]

We can also establish the equivalence between \(P(y=0 \,|\, \boldsymbol{\beta}, \mathbf{x})\) and \(P(g=-1 \,|\, \boldsymbol{\beta}, \mathbf{x})\) easily by using property (2). Another way to establish the equivalence is through the classification rule. For form (3), we have the following classification rule:

    \[\begin{aligned}
    \frac{\frac{\exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})}}{\frac{1}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})}} &> 1 \;\;\rightarrow\;\; y = 1 \\
    \exp(\boldsymbol{\beta}^{T} \mathbf{x}) &> 1 \\
    \boldsymbol{\beta}^{T} \mathbf{x} &> 0
    \end{aligned}\]

Exactly the same classification rule can be obtained for form (4):

    \[\begin{aligned}
    \frac{\frac{1}{1 + \exp(-\boldsymbol{\beta}^{T} \mathbf{x})}}{\frac{1}{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})}} &> 1 \;\;\rightarrow\;\; g = 1 \\
    \frac{1 + \exp(\boldsymbol{\beta}^{T} \mathbf{x})}{1 + \exp(-\boldsymbol{\beta}^{T} \mathbf{x})} &> 1 \\
    \exp(\boldsymbol{\beta}^{T} \mathbf{x}) &> 1 \\
    \boldsymbol{\beta}^{T} \mathbf{x} &> 0
    \end{aligned}\]

Therefore, the two forms learn exactly the same classification boundary.
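To see the shared boundary concretely, here is a sketch that compares the two decision rules on random data; the sampling setup is arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    beta = rng.normal(size=5)              # arbitrary coefficient vector
    X = rng.normal(size=(1000, 5))         # arbitrary feature vectors
    z = X @ beta                           # beta^T x for every data item

    # Form (3): predict y=1 when P(y=1) > P(y=0)
    pred3 = np.exp(z) / (1.0 + np.exp(z)) > 1.0 / (1.0 + np.exp(z))
    # Form (4): predict g=1 when P(g=1) > P(g=-1)
    pred4 = 1.0 / (1.0 + np.exp(-z)) > 1.0 / (1.0 + np.exp(z))

    # Both reduce to the rule derived above: beta^T x > 0
    assert np.array_equal(pred3, pred4)
    assert np.array_equal(pred3, z > 0)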

Logistic Loss

Now that we have established the equivalence of the two forms of Logistic Regression, it is convenient to use the second form, since it fits into a general classification framework. Here, we assume \(y\) is the label of a data item and \(\mathbf{x}\) is its feature vector. The framework can be formalized as follows:

    \[\arg\min \sum_{i} L\Bigl(y_{i}, f(\mathbf{x}_{i})\Bigr)\]

where \(f\) is a hypothesis function and \(L\) is a loss function. For Logistic Regression, we have the following instantiation:

    \[\begin{aligned}
    f(\mathbf{x}) &= \boldsymbol{\beta}^{T} \mathbf{x} \\
    L\bigl(y, f(\mathbf{x})\bigr) &= \log\Bigl(1 + \exp\bigl(-y f(\mathbf{x})\bigr)\Bigr)
    \end{aligned}\]

where \(y \in \{\pm 1\}\).
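Putting the pieces together, here is a minimal gradient-descent sketch of this instantiation on synthetic data; the step size, iteration count, and data-generation scheme are arbitrary choices of mine, not prescriptions from the references:

    import numpy as np

    def logistic_loss(beta, X, y):
        # Sum of log(1 + exp(-y_i * beta^T x_i)) for y_i in {+1, -1},
        # computed stably via logaddexp(0, -m) = log(1 + exp(-m))
        margins = y * (X @ beta)
        return np.sum(np.logaddexp(0.0, -margins))

    def logistic_loss_grad(beta, X, y):
        # Gradient: -sum_i y_i x_i / (1 + exp(y_i * beta^T x_i))
        margins = y * (X @ beta)
        return -(X.T @ (y / (1.0 + np.exp(margins))))

    # Synthetic data: labels from the sign of a noisy linear score
    rng = np.random.default_rng(0)
    true_beta = np.array([1.0, -2.0])
    X = rng.normal(size=(200, 2))
    y = np.where(X @ true_beta + 0.1 * rng.normal(size=200) > 0, 1.0, -1.0)

    # Plain gradient descent with a fixed step size
    beta = np.zeros(2)
    for _ in range(500):
        beta -= 0.01 * logistic_loss_grad(beta, X, y)

    print(logistic_loss(beta, X, y), beta)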

References

[1] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
[2] Tom M. Mitchell. Machine Learning. McGraw-Hill Series in Computer Science. McGraw-Hill, 1997.
[3] Jason D. M. Rennie. Logistic Regression. http://people.csail.mit.edu/jrennie/writing, April 2003.
