Given a sequence of training examples x_i \in \mathbf{R}^m, each labelled with its class y_i\in \{0,1\}, we seek to find the weights \theta \in \mathbf{R}^m which maximize the function:

\sum_{i:\, y_i=1} \log\big(S(\theta^T x_i)\big) + \sum_{i:\, y_i=0} \log\big(1-S(\theta^T x_i)\big),

where S is the logistic function S(x) = \frac{1}{1+e^{-x}}, which turns a real-valued score into the probability that the binary outcome equals 1.
This function can be efficiently optimized using exponential cones with MOSEK!
Check our interactive application on Streamlit
Check our notebook on Google Colab
1.1. Convex optimization model for logistic regression
Define the logistic function

S(x) = \frac{1}{1+e^{-x}}.

Next, given an observation x\in\mathbf{R}^d and weights \theta\in\mathbf{R}^d we set

h_\theta(x) = S(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}.
The weights vector \theta is part of the setup of the classifier. The expression
h_\theta(x) is interpreted as the probability that x belongs to class 1.
When asked to classify x the returned answer is

\begin{cases} 1 & \text{if } h_\theta(x) \geq 1/2, \\ 0 & \text{otherwise}. \end{cases}
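For concreteness, here is a minimal NumPy sketch of the model and the classification rule (the function names are illustrative and not part of the models below):

import numpy as np

def h_theta(theta, x):
    # Probability that observation x belongs to class 1.
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

def classify(theta, x):
    # Return class 1 when h_theta(x) >= 1/2, otherwise class 0.
    return int(h_theta(theta, x) >= 0.5)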
When training a logistic regression algorithm we are given a sequence of training examples x_i, each labelled with its class y_i\in \{0,1\}, and we seek to find the weights \theta which maximize the likelihood function \textstyle \prod_i h_\theta(x_i)^{y_i}(1-h_\theta(x_i))^{1-y_i}.
Of course every single y_i equals 0 or 1, so only one of the two factors contributes for each training data point:

\prod_{i:\, y_i=1} h_\theta(x_i) \cdot \prod_{i:\, y_i=0} \big(1-h_\theta(x_i)\big).
By taking logarithms we obtain a sum that is easier to optimize:

\sum_{i:\, y_i=1} \log\big(h_\theta(x_i)\big) + \sum_{i:\, y_i=0} \log\big(1-h_\theta(x_i)\big).
Note that by negating we obtain the logarithmic loss function:

J(\theta) = -\sum_{i:\, y_i=1} \log\big(h_\theta(x_i)\big) - \sum_{i:\, y_i=0} \log\big(1-h_\theta(x_i)\big).
The training problem with regularization (a standard technique to prevent overfitting) is now equivalent to

\min_\theta \; J(\theta) + \lambda \|\theta\|_2,

where \lambda \geq 0 is the regularization parameter.
This formulation can be solved with a general nonlinear solver, such as IPOPT.
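To make the objective concrete, here is a small NumPy sketch of the regularized log loss (illustrative only; the AMPL models below hand the same objective to a solver):

import numpy as np

def regularized_log_loss(theta, X, y, lambd):
    # X has shape (n, d), y is a 0/1 vector of length n.
    p = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x_i) for every point
    log_loss = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return log_loss + lambd * np.linalg.norm(theta)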
Convex optimization model for Logistic Regression
Model file “logistic_regression.mod”:
set POINTS;
set DIMS; # Dimensionality of x
param y{POINTS} binary; # Points' classes
param x{POINTS, DIMS};
param lambda; # Regularization parameter
var theta{DIMS}; # Regression parameter vector
var hTheta{i in POINTS}
= 1 / (1 + exp( - sum{d in DIMS} theta[d]*x[i, d] ));
minimize LogisticReg: # General nonlinear formulation
- sum {i in POINTS: y[i] >0.5} log( hTheta[i] )
- sum {i in POINTS: y[i]<=0.5} log( 1.0 - hTheta[i] )
+ lambda * sqrt( sum {d in DIMS} theta[d]^2 );
Using the model from Python with amplpy:
def logistic_regression(label, data, lambd, solver):
    from amplpy import AMPL

    # load the model
    ampl = AMPL()
    ampl.read("logistic_regression.mod")
    # load the data
    ampl.set["POINTS"] = data.index
    ampl.set["DIMS"] = data.columns
    ampl.param["x"] = data
    ampl.param["y"] = label
    ampl.param["lambda"] = lambd
    # solve
    ampl.option["solver"] = solver
    ampl.solve()
    assert ampl.get_value("solve_result") == "solved"
    # return solution
    return ampl.var["theta"].to_pandas()
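As a quick illustration, the helper above can be called on a small synthetic data set. This is only a sketch: the data below is made up, and it assumes an AMPL installation with IPOPT (or another nonlinear solver) available locally:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
label = pd.Series((data["f1"] + data["f2"] > 0).astype(int), index=data.index)

theta = logistic_regression(label, data, lambd=1.0, solver="ipopt")
print(theta)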
1.2. Conic optimization model for logistic regression
For a conic solver such as MOSEK, we need to reformulate the problem.
The objective function can equivalently be phrased as

J(\theta) = \sum_{i:\, y_i=1} \log\big(1+e^{-\theta^T x_i}\big) + \sum_{i:\, y_i=0} \log\big(1+e^{\theta^T x_i}\big),

since -\log S(u) = \log(1+e^{-u}) and -\log(1-S(u)) = \log(1+e^{u}).
The key point is to implement the softplus bound t\geq \log(1+e^u), which is the simplest example of a log-sum-exp constraint for two terms. Here t is a scalar variable and u will be the affine expression of the form \pm \theta^Tx_i. This bound is equivalent to

e^{u-t} + e^{-t} \leq 1

and further, after introducing auxiliary variables z_1 and z_2 (called u[i] and v[i] in the model below), to the exponential-cone constraints

z_1 \geq e^{u-t}, \quad z_2 \geq e^{-t}, \quad z_1 + z_2 \leq 1.
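A quick numerical sanity check of this equivalence (an illustrative sketch, not part of the model): for a given u, the smallest feasible t is exactly \log(1+e^u):

import numpy as np

for u in [-3.0, 0.0, 2.5]:
    t = np.log1p(np.exp(u))  # softplus(u)
    assert np.exp(u - t) + np.exp(-t) <= 1 + 1e-12  # t = softplus(u) is feasible
    t_small = t - 1e-3
    assert np.exp(u - t_small) + np.exp(-t_small) > 1  # any smaller t is infeasible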
Conic optimization model for Logistic Regression
Model file “logistic_regression_conic.mod”:
set POINTS;
set DIMS; # Dimensionality of x
param y{POINTS} binary; # Points' classes
param x{POINTS, DIMS};
param lambda; # Regularization parameter
var theta{DIMS}; # Regression parameter vector
var t{POINTS};
var u{POINTS};
var v{POINTS};
var r >= 0;
minimize LogisticRegConic:
sum {i in POINTS} t[i] + lambda * r;
s.t. Softplus1{i in POINTS}: # reformulation of softplus
u[i] + v[i] <= 1;
s.t. Softplus2{i in POINTS}:
u[i] >= exp(
(if y[i]>0.5 then -1 else 1)
* (sum {d in DIMS} theta[d] * x[i, d])
- t[i]
);
s.t. Softplus3{i in POINTS}:
v[i] >= exp(-t[i]);
s.t. NormTheta: # Quadratic cone for regularizer
r^2 >= sum {d in DIMS} theta[d]^2;
Using the model from Python with amplpy:
def logistic_regression_conic(label, data, lambd, solver):
    from amplpy import AMPL

    # load the model
    ampl = AMPL()
    ampl.read("logistic_regression_conic.mod")
    # load the data
    ampl.set["POINTS"] = data.index
    ampl.set["DIMS"] = data.columns
    ampl.param["x"] = data
    ampl.param["y"] = label
    ampl.param["lambda"] = lambd
    # solve
    ampl.option["solver"] = solver
    ampl.solve()
    assert ampl.get_value("solve_result") == "solved"
    # return solution
    return ampl.var["theta"].to_pandas()
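Both formulations solve the same problem, so on the same data they should recover essentially the same weights. A minimal comparison sketch, reusing label and data from the earlier example and assuming MOSEK is installed and licensed locally:

theta_nlp = logistic_regression(label, data, lambd=1.0, solver="ipopt")
theta_conic = logistic_regression_conic(label, data, lambd=1.0, solver="mosek")
print((theta_nlp - theta_conic).abs().max())  # expected to be close to zero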