AMPL Modeling Tips #7: Logistic Regression with Exponential Cones

Given a sequence of training examples x_i \in \mathbf{R}^m, each labelled with its class y_i\in \{0,1\}, we seek the weights \theta \in \mathbf{R}^m that maximize the function:

\sum_{i:y_i=1} \log(S(\theta^Tx_i))+\sum_{i:y_i=0}\log(1-S(\theta^Tx_i))

where S(x) = \frac{1}{1+e^{-x}} is the logistic function, which turns the score \theta^Tx_i into the estimated probability that x_i belongs to class 1.

This function can be efficiently optimized using exponential cones with MOSEK!

[Figure: the sigmoid (logistic) function]

:point_down: Check our interactive application on Streamlit
Streamlit App

:point_down: Check our notebook on Google Colab
Open in Colab

1.1. Convex optimization model for logistic regression

Define the logistic function

S(x)=\frac{1}{1+e^{-x}}.

Next, given an observation x\in\mathbf{R}^d and weights \theta\in\mathbf{R}^d we set

h_\theta(x)=S(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}.

The weight vector \theta is part of the setup of the classifier, and the expression h_\theta(x) is interpreted as the probability that x belongs to class 1.

When asked to classify x, the returned answer is

x\mapsto \begin{cases}1, & h_\theta(x)\geq 1/2, \\ 0, & h_\theta(x) < 1/2.\end{cases}
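As a quick illustration, here is a minimal NumPy sketch of this decision rule (the names sigmoid and predict are just illustrative and are not part of the model files below):

import numpy as np

def sigmoid(z):
    # S(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # h_theta(x) = S(theta^T x); classify as 1 when the probability is >= 1/2
    return int(sigmoid(np.dot(theta, x)) >= 0.5)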

When training a logistic regression algorithm, we are given a sequence of training examples x_i,
each labelled with its class y_i\in \{0,1\}, and we seek the weights \theta that
maximize the likelihood function \textstyle \prod_i h_\theta(x_i)^{y_i}(1-h_\theta(x_i))^{1-y_i}.

Since every y_i equals either 0 or 1, only one factor appears in the product for each training data point:

\max_\theta \textstyle \prod_{i:y_i=1} h_\theta(x_i) \prod_{i:y_i=0} (1-h_\theta(x_i)).

By taking logarithms we obtain a sum that is easier to optimize:

\max_\theta \sum_{i:y_i=1} \log(h_\theta(x_i))+\sum_{i:y_i=0}\log(1-h_\theta(x_i)).

Note that by negating we obtain the logarithmic loss function:

J(\theta) = -\sum_{i:y_i=1} \log(h_\theta(x_i))-\sum_{i:y_i=0}\log(1-h_\theta(x_i)).

The training problem with regularization (a standard technique to prevent overfitting) is now equivalent to

\min_\theta J(\theta) + \lambda\|\theta\|_2.

This formulation can be solved with a general nonlinear solver, such as IPOPT.
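For reference, here is a minimal NumPy sketch of the regularized objective being minimized, assuming X is the matrix whose rows are the training points x_i and y is the vector of 0/1 labels (names are illustrative):

import numpy as np

def objective(theta, X, y, lambd):
    # J(theta) + lambda * ||theta||_2
    h = 1.0 / (1.0 + np.exp(-X @ theta))             # h_theta(x_i) for every point
    J = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return J + lambd * np.linalg.norm(theta)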

Convex optimization model for Logistic Regression

Model file “logistic_regression.mod”:

set POINTS;
set DIMS;                  # Dimensionality of x

param y{POINTS} binary;    # Points' classes
param x{POINTS, DIMS};
param lambda;              # Regularization parameter

var theta{DIMS};           # Regression parameter vector
var hTheta{i in POINTS}
    = 1 / (1 + exp( - sum{d in DIMS} theta[d]*x[i, d] ));

minimize LogisticReg:      # General nonlinear formulation
    - sum {i in POINTS: y[i] >0.5} log( hTheta[i] )
    - sum {i in POINTS: y[i]<=0.5} log( 1.0 - hTheta[i] )
    + lambda * sqrt( sum {d in DIMS} theta[d]^2 );
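To solve this model directly from an AMPL session, a minimal sketch (the data file name "logistic_regression.dat" is hypothetical; supply your own data for POINTS, DIMS, x, y, and lambda):

model logistic_regression.mod;
data logistic_regression.dat;   # hypothetical data file
option solver ipopt;
solve;
display theta;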

Usage from Python with amplpy:

def logistic_regression(label, data, lambd, solver):
    from amplpy import AMPL

    # load the model
    ampl = AMPL()
    ampl.read("logistic_regression.mod")

    # load the data
    ampl.set["POINTS"] = data.index
    ampl.set["DIMS"] = data.columns
    ampl.param["x"] = data
    ampl.param["y"] = label
    ampl.param["lambda"] = lambd

    # solve
    ampl.option["solver"] = solver
    ampl.solve()
    assert ampl.get_value("solve_result") == "solved"

    # return solution
    return ampl.var["theta"].to_pandas()
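A minimal usage sketch with synthetic data (purely illustrative), assuming amplpy and IPOPT are installed; data is a pandas DataFrame with one row per point and label a matching 0/1 series:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
label = (data["x1"] + data["x2"] > 0).astype(int)    # synthetic 0/1 labels

theta = logistic_regression(label, data, lambd=1.0, solver="ipopt")
print(theta)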

1.2. Conic optimization model for logistic regression

For a conic solver such as MOSEK, we need to reformulate the problem.

The objective function can equivalently be phrased as

\begin{split}\begin{array}{lll} \min & \sum_i t_i +\lambda r \\ \text{s.t.} & t_i \geq - \log(h_\theta(x_i)) = \log(1+e^{-\theta^Tx_i}) & \text{if}\ y_i=1, \\ & t_i \geq - \log(1-h_\theta(x_i)) = \log(1+e^{\theta^Tx_i}) & \text{if}\ y_i=0, \\ & r \geq \|\theta\|_2. \end{array}\end{split}

The key point is to implement the softplus bound t\geq \log(1+e^u), which is the simplest example of a log-sum-exp constraint for two terms. Here t is a scalar variable and u will be an affine expression of the form \pm \theta^Tx_i. Exponentiating both sides and dividing by e^t shows that the bound is equivalent to

e^{u-t} + e^{-t} \leq 1

and further to

\begin{split}\begin{array}{rcl} z_1 & \geq & e^{u-t}, \\ z_2 & \geq & e^{-t}, \\ z_1+z_2 & \leq & 1. \end{array}\end{split}
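Each of these exponential inequalities is a membership in MOSEK's (primal) exponential cone, the closure of K_{\exp}=\{(x_1,x_2,x_3): x_1\geq x_2 e^{x_3/x_2},\ x_2>0\}:

z_1 \geq e^{u-t} \iff (z_1,\,1,\,u-t)\in K_{\exp}, \qquad z_2 \geq e^{-t} \iff (z_2,\,1,\,-t)\in K_{\exp}.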

Conic optimization model for Logistic Regression

Model file “logistic_regression_conic.mod”:

set POINTS;
set DIMS;                  # Dimensionality of x

param y{POINTS} binary;    # Points' classes
param x{POINTS, DIMS};
param lambda;              # Regularization parameter

var theta{DIMS};           # Regression parameter vector
var t{POINTS};
var u{POINTS};
var v{POINTS};
var r >= 0;

minimize LogisticRegConic:
    sum {i in POINTS} t[i] + lambda * r;

s.t. Softplus1{i in POINTS}:  # reformulation of softplus
    u[i] + v[i] <= 1;
s.t. Softplus2{i in POINTS}:
    u[i] >= exp(
        (if y[i]>0.5 then -1 else 1)
        * (sum {d in DIMS} theta[d] * x[i, d])
        - t[i]
    );
s.t. Softplus3{i in POINTS}:
    v[i] >= exp(-t[i]);

s.t. NormTheta:              # Quadratic cone for regularizer
    r^2 >= sum {d in DIMS} theta[d]^2;

Usage from Python with amplpy:

def logistic_regression_conic(label, data, lambd, solver):
    from amplpy import AMPL

    # load the model
    ampl = AMPL()
    ampl.read("logistic_regression_conic.mod")

    # load the data
    ampl.set["POINTS"] = data.index
    ampl.set["DIMS"] = data.columns
    ampl.param["x"] = data
    ampl.param["y"] = label
    ampl.param["lambda"] = lambd

    # solve
    ampl.option["solver"] = solver
    ampl.solve()
    assert ampl.get_value("solve_result") == "solved"

    # return solution
    return ampl.var["theta"].to_pandas()
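A minimal sketch comparing the two formulations, reusing the illustrative data and label objects from the earlier example (and assuming IPOPT and MOSEK are both available):

theta_nl    = logistic_regression(label, data, lambd=1.0, solver="ipopt")
theta_conic = logistic_regression_conic(label, data, lambd=1.0, solver="mosek")

print(theta_nl)       # solution from the general nonlinear formulation
print(theta_conic)    # solution from the conic formulation; both solve the same convex problem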

