AMPL Modeling Tips #7: Logistic Regression with Exponential Cones

Given a sequence of training examples x_i \in \mathbf{R}^m, each labelled with its class y_i\in \{0,1\}, we seek the weights \theta \in \mathbf{R}^m that maximize the function:

\sum_{i:y_i=1} \log(S(\theta^Tx_i))+\sum_{i:y_i=0}\log(1-S(\theta^Tx_i))

where S(x) = \frac{1}{1+e^{-x}} is the logistic function, which turns the score \theta^Tx_i into the estimated probability that x_i belongs to class 1.

This function can be efficiently optimized using exponential cones with MOSEK!

[Figure: the sigmoid (logistic) function]

:point_down: Check our interactive application on Streamlit
Streamlit App

:point_down: Check our notebook on Google Colab
Open in Colab

1.1. Convex optimization model for logistic regression

Define the logistic function

S(x)=\frac{1}{1+e^{-x}}.

Next, given an observation x\in\mathbf{R}^d and weights \theta\in\mathbf{R}^d we set

h_\theta(x)=S(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}.

The weight vector \theta is part of the setup of the classifier, and the expression h_\theta(x) is interpreted as the probability that x belongs to class 1.

When asked to classify x, the returned answer is

x\mapsto \begin{cases}1, & h_\theta(x)\geq 1/2, \\ 0, & h_\theta(x) < 1/2.\end{cases}
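As a quick illustration, here is a minimal NumPy sketch of this decision rule (the names sigmoid and predict are just illustrative and are not part of the model files below):

import numpy as np

def sigmoid(z):
    # S(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # h_theta(x) = S(theta^T x); classify as 1 when the probability is >= 1/2
    return int(sigmoid(np.dot(theta, x)) >= 0.5)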

When training a logistic regression algorithm, we are given a sequence of training examples x_i,
each labelled with its class y_i\in \{0,1\}, and we seek the weights \theta that
maximize the likelihood function \textstyle \prod_i h_\theta(x_i)^{y_i}(1-h_\theta(x_i))^{1-y_i}.

Since every y_i equals either 0 or 1, only one factor appears in the product for each training data point:

\max_\theta \textstyle \prod_{i:y_i=1} h_\theta(x_i) \prod_{i:y_i=0} (1-h_\theta(x_i)).

By taking logarithms we obtain a sum that is easier to optimize:

\max_\theta \sum_{i:y_i=1} \log(h_\theta(x_i))+\sum_{i:y_i=0}\log(1-h_\theta(x_i)).

Note that by negating we obtain the logarithmic loss function:

J(\theta) = -\sum_{i:y_i=1} \log(h_\theta(x_i))-\sum_{i:y_i=0}\log(1-h_\theta(x_i)).

The training problem with regularization (a standard technique to prevent overfitting) is now equivalent to

\min_\theta J(\theta) + \lambda\|\theta\|_2.

This formulation can be solved with a general nonlinear solver, such as IPOPT.
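For reference, here is a minimal NumPy sketch of the regularized objective being minimized, assuming X is the matrix whose rows are the training points x_i and y is the vector of 0/1 labels (names are illustrative):

import numpy as np

def objective(theta, X, y, lambd):
    # J(theta) + lambda * ||theta||_2
    h = 1.0 / (1.0 + np.exp(-X @ theta))             # h_theta(x_i) for every point
    J = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return J + lambd * np.linalg.norm(theta)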

Convex optimization model for Logistic Regression

Model file “logistic_regression.mod”:

set POINTS;
set DIMS;                  # Dimensionality of x

param y{POINTS} binary;    # Points' classes
param x{POINTS, DIMS};
param lambda;              # Regularization parameter

var theta{DIMS};           # Regression parameter vector
var hTheta{i in POINTS}
    = 1 / (1 + exp( - sum{d in DIMS} theta[d]*x[i, d] ));

minimize LogisticReg:      # General nonlinear formulation
    - sum {i in POINTS: y[i] >0.5} log( hTheta[i] )
    - sum {i in POINTS: y[i]<=0.5} log( 1.0 - hTheta[i] )
    + lambda * sqrt( sum {d in DIMS} theta[d]^2 );
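To solve this model directly from an AMPL session, a minimal sketch (the data file name "logistic_regression.dat" is hypothetical; supply your own data for POINTS, DIMS, x, y, and lambda):

model logistic_regression.mod;
data logistic_regression.dat;   # hypothetical data file
option solver ipopt;
solve;
display theta;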

Usage from Python with amplpy:

def logistic_regression(label, data, lambd, solver):
    from amplpy import AMPL

    # load the model
    ampl = AMPL()
    ampl.read("logistic_regression.mod")

    # load the data
    ampl.set["POINTS"] = data.index
    ampl.set["DIMS"] = data.columns
    ampl.param["x"] = data
    ampl.param["y"] = label
    ampl.param["lambda"] = lambd

    # solve
    ampl.option["solver"] = solver
    ampl.solve()
    assert ampl.get_value("solve_result") == "solved"

    # return solution
    return ampl.var["theta"].to_pandas()
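A minimal usage sketch with synthetic data (purely illustrative), assuming amplpy and IPOPT are installed; data is a pandas DataFrame with one row per point and label a matching 0/1 series:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
label = (data["x1"] + data["x2"] > 0).astype(int)    # synthetic 0/1 labels

theta = logistic_regression(label, data, lambd=1.0, solver="ipopt")
print(theta)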

1.2. Conic optimization model for logistic regression

For a conic solver such as MOSEK, we need to reformulate the problem.

The objective function can equivalently be phrased as

\begin{split}\begin{array}{lll} \min & \sum_i t_i +\lambda r \\ \text{s.t.} & t_i \geq - \log(h_\theta(x_i)) = \log(1+e^{-\theta^Tx_i}) & \text{if}\ y_i=1, \\ & t_i \geq - \log(1-h_\theta(x_i)) = \log(1+e^{\theta^Tx_i}) & \text{if}\ y_i=0, \\ & r \geq \|\theta\|_2. \end{array}\end{split}

The key point is to implement the softplus bound t\geq \log(1+e^u), which is the simplest example of a log-sum-exp constraint for two terms. Here t is a scalar variable and u will be an affine expression of the form \pm \theta^Tx_i. Exponentiating both sides and dividing by e^t shows that the bound is equivalent to

e^{u-t} + e^{-t} \leq 1

and further to

\begin{split}\begin{array}{rcl} z_1 & \geq & e^{u-t}, \\ z_2 & \geq & e^{-t}, \\ z_1+z_2 & \leq & 1. \end{array}\end{split}
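Each of these exponential inequalities is a membership in MOSEK's (primal) exponential cone, the closure of K_{\exp}=\{(x_1,x_2,x_3): x_1\geq x_2 e^{x_3/x_2},\ x_2>0\}:

z_1 \geq e^{u-t} \iff (z_1,\,1,\,u-t)\in K_{\exp}, \qquad z_2 \geq e^{-t} \iff (z_2,\,1,\,-t)\in K_{\exp}.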

Conic optimization model for Logistic Regression

Model file “logistic_regression_conic.mod”:

set POINTS;
set DIMS;                  # Dimensionality of x

param y{POINTS} binary;    # Points' classes
param x{POINTS, DIMS};
param lambda;              # Regularization parameter

var theta{DIMS};           # Regression parameter vector
var t{POINTS};
var u{POINTS};
var v{POINTS};
var r >= 0;

minimize LogisticRegConic:
    sum {i in POINTS} t[i] + lambda * r;

s.t. Softplus1{i in POINTS}:  # reformulation of softplus
    u[i] + v[i] <= 1;
s.t. Softplus2{i in POINTS}:
    u[i] >= exp(
        (if y[i]>0.5 then -1 else 1)
        * (sum {d in DIMS} theta[d] * x[i, d])
        - t[i]
    );
s.t. Softplus3{i in POINTS}:
    v[i] >= exp(-t[i]);

s.t. NormTheta:              # Quadratic cone for regularizer
    r^2 >= sum {d in DIMS} theta[d]^2;

Usage from Python with amplpy:

def logistic_regression_conic(label, data, lambd, solver):
    from amplpy import AMPL

    # load the model
    ampl = AMPL()
    ampl.read("logistic_regression_conic.mod")

    # load the data
    ampl.set["POINTS"] = data.index
    ampl.set["DIMS"] = data.columns
    ampl.param["x"] = data
    ampl.param["y"] = label
    ampl.param["lambda"] = lambd

    # solve
    ampl.option["solver"] = solver
    ampl.solve()
    assert ampl.get_value("solve_result") == "solved"

    # return solution
    return ampl.var["theta"].to_pandas()
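A minimal sketch comparing the two formulations, reusing the illustrative data and label objects from the earlier example (and assuming IPOPT and MOSEK are both available):

theta_nl    = logistic_regression(label, data, lambd=1.0, solver="ipopt")
theta_conic = logistic_regression_conic(label, data, lambd=1.0, solver="mosek")

print(theta_nl)       # solution from the general nonlinear formulation
print(theta_conic)    # solution from the conic formulation; both solve the same convex problem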

