I’m running a modeling system written in Python that throws the entire huge problem (MIP model with >400,000 rows and columns) into a solver. I just moved operations from an old dual Xeon (8 cores total) machine to a machine with an AMD Ryzen 9 5950X. OS is Gentoo Linux. Without changing anything about the HiGHS or mp binaries, the move alone brought the model run time from just over three days to just under fourteen hours but I’m looking for ways to speed things up even more.
I’ve figured out how to hack CMakeLists.txt to get it to add -march=native and -flto (link time optimization) to both the HiGHS and the mp compile and link commands (clang++ is the compiler for HiGHS; GNU gcc c++ for mp). Using -march=native shaved about 20 minutes off the model run time. I’ve tried -flto on a much-abbreviated version of the dataset and that cut the run time by 1%; if that proportion holds then it’ll save only 7m24s off the run time.
Here is how clang++ is getting invoked for HiGHS currently:
/usr/lib/llvm/15/bin/clang++ -I/home/me/HiGHS/build -I/home/me/HiGHS/app -I/home/me/HiGHS/extern -I/home/me/HiGHS/extern/zstr -I/home/me/HiGHS/src -I/ho
me/me/HiGHS/src/io -I/home/me/HiGHS/src/ipm/ipx -I/home/me/HiGHS/src/ipm/basiclu -I/home/me/HiGHS/src/lp_data -I/home/me/HiGHS/src/mip -I/home/me/HiGHS/s
rc/model -I/home/me/HiGHS/src/presolve -I/home/me/HiGHS/src/qpsolver -I/home/me/HiGHS/src/simplex -I/home/me/HiGHS/src/test -I/home/me/HiGHS/src/util -I/
home/me/HiGHS/src/extern/catch -Wall -Wextra -Wno-unused-parameter -pedantic -mpopcnt -march=native -flto -O3 -DNDEBUG -Wno-unused-variable -Wno-unused-c
onst-variable -std=gnu++11 -MD -MT check/CMakeFiles/unit_tests.dir/TestSort.cpp.o -MF CMakeFiles/unit_tests.dir/TestSort.cpp.o.d -o CMakeFiles/unit_tests
.dir/TestSort.cpp.o -c /home/me/HiGHS/check/TestSort.cpp
And for mp it’s like this:
/usr/bin/ccache /usr/bin/c++ -DMP_DATE=20221228 -DMP_SYSINFO="Linux x86_64" -DMP_USE_ATOMIC -DMP_USE_HASH -DMP_USE_UNIQUE_PTR -I/home/me/mp/include -I/ho
me/me/mp/solvers -I/home/me/mp/src -isystem /usr/local/include/highs -Wall -Wextra -pedantic -march=native -O3 -flto -O3 -DNDEBUG -fPIC -std=c++11 -MD -M
T solvers/CMakeFiles/amplhighsmp-static.dir/highsmp/highsmpcommon.cc.o -MF CMakeFiles/amplhighsmp-static.dir/highsmp/highsmpcommon.cc.o.d -o CMakeFiles/a
mplhighsmp-static.dir/highsmp/highsmpcommon.cc.o -c /home/me/mp/solvers/highsmp/highsmpcommon.cc
In addition to that, I’ve tried adding “alg:parallel=on” to the options I’m already using (previously, “mip_rel_gap=0.005 tech:outlev=1 tech:timing=1”) that the modeling system is sending to highsmp and once the solver engages it is acknowledging that option in the solver output. However, as I watch the machine it’s not parallelizing to any significant extent; I briefly see it engaging a second and a third core near the start of the run but almost all of the time, best I can tell, it’s only using one. On a 16-core/32-thread machine I’d love to be able to take advantage of parallel processing if at all possible.
Any suggestions on how I might speed this up? At this stage I’m only looking for huge performance jumps; I don’t want to burn many more hours getting a percent here or there.
UPDATE: Adding “mip:mincliquetable=1” seems to light up the whole box occasionally. I’m letting this one continue to run with the full dataset to see what kind of run time I get.
For a problem of that size most open-source solvers will struggle. You can achieve some performance improvements by trying differ solver options but in most cases the gains will still be small.
In order a solve a MIP model of that size in a substantially smaller amount of time you will likely need a commercial solver. It is not easy to parallelize the solve process and commercial solvers have more algorithms that they can run in parallel to take advantage of additional threads. However, even commercial solvers stop benefiting much of additional threads at around 25 threads.
In order to use commercial solvers for free, you can use the kestrel driver to solve your model on the NEOS Server. You can use for instance the commercial solver CPLEX and see how it performs.
Highs seems to not have parallel MIP. Even exporting an instance as MPS and running with standalone highs executable with --parallel on and threads=8, it only uses 1 core.
We are clarifying this with the Highs team but for the moment this is most probable.
Yep that was confirmed: no parallel MIP. SCIP has concurrency but not in the default configuration.
If you have to stay with free solvers, you can look at model formulation (and ask here). Playing with solver options, such as presolve and branching strategy, can help sometimes.
Finally, in some cases it may be possible to split the problem. For example, if the problem has several distinct scenarios. It could be as simple as fixing some choices. Then, run all scenarios in parallel.
I really appreciate everyone’s responses; thank you.
These last three options (bold) that are getting sent to highsmp were added by me to try to speed things up; the first three were told to me by the developer of the modeling system: “mip_rel_gap=0.005 tech:outlev=1 tech:timing=1 alg:parallel=on mip:mincliquetable=1 tech:threads=8”. This has been sufficient to get my model run time to just under 13 hours. All I have ever seen highsmp do during a run is occasionally go to two cores or, less often, three.
Changes to the model formulation is something that I’d consider out of scope because it’s really outside my domain expertise. A run only really corresponds to a single real-world scenario but it’s my understanding that I may be expected to run more than one scenario at once and obviously I have plenty of compute and memory for that.
I understand I have the option of building an AMPL version of more open-source solvers than just HiGHS (glpk is right out, however - vastly slower than HiGHS) to see if any of them handle this particular modeling problem better and whereas I’m interested in the kestrel/NEOS option, I am worried about the INFOSEC exposure.
Updated Highs docu: Parallel · HiGHS Documentation
You can play with other options, such as presolve effort, cuts, heuristics etc. For example, is the most time spent on finding/ improving feasible solutions, or improving dual bound? If feasibility is an issue, you can program your own heuristic and supply its solutions as warm start. In the other case you might just stop after a 20% gap.
If you only need to solve the model a few times, obtain trial licences for commercial solvers - can be added to community edition. Otherwise, solvers have flexible cloud licenses, and some solvers are cheaper than others.
I could indeed play with other options but without understanding the underlying modeling any better than I do - which is not very well - I’d be just flailing blindly. Yes, deviation from “mip_rel_gap=0.005” might make a difference but I don’t think I’m free to make that sort of change that may effect confidence in the result.
Early on in this process I experimented with free versions of this solver and that solver; CPLEX community edition (or whatever IBM calls it) for example is crippled such that it won’t solve a problem this big. And a problem I’ve been facing (and that has guided my course of action thus far) is that my client didn’t budget for either of the two commercial solvers that the modeling system is set up to use and therefore there’s not a lot of point in my burning hours toward using either.
Right now I’m trying to get Cbc/AMPL to build, just trying to see what another Open Source solver can do.
To the extent that AMPL abstracts away the solver interface such that it makes other commercial solvers usable, then I know from other work that things like AIMMS or XPress can be had for less money than those other two commercial solvers.
Inasmuch as I would like to try some sort of remote/cloud solver operation, it could not be the only thing I depend on and even at that, I’m not happy about the INFOSEC/COMSEC risks.
Commercial MIP solvers supported by AMPL include not just CPLEX and Gurobi, but also Xpress and COPT.
I do have some experience with running XPress and I know it can be had for much less than CPLEX/Gurobi money so that may be a test-worthy option. I won’t be using COPT in any capacity (INFOSEC problem, a la Tik Tok).
Martin here, another of the AMPL team - out of curiosity, have you tried implementing this model in AMPL or are you still running in PYOMO?
AMPL, for a large MILP, might be faster regardless of solver, but you’ll have to try it to know for sure.
Hi, Martin -
I don’t have much experience with any of this stuff; I’m dealing with a Python-based modeling system that I’m basically inheriting. I’ve been in touch with the modeling system’s original developer and it was he who first tried getting AMPL/HiGHS to work with Pyomo within the modeling system. I had tried and failed to get HiGHS to work directly; it’s my understanding that the incorporation of AMPL closes a data I/O gap. So there’s a boundary with the modeling system and its use of Pyomo on one side and solver on the other, and in this instance “solver” consists of AMPL/HiGHS (i.e., the “highsmp” binary made by linking mp with the HiGHS library file(s) that are created at HiGHS build time. I hope that explains the mechanisms sufficiently to be understood; again, I’m a n00b.
I don’t really have the ability to tear out Pyomo from the modeling system and replace it with calls to AMPL instead, and it’s not my understanding of how all this works that the CPU is sitting there burning cycles specifically on Pyomo so I’m not sure that there’s much value-add from trying to do so unless I’m mistaken (acknowledging that I may well be).