Very high memory use

Hi,

I am developing a big linear optimization model for energy system modelling. I am running it on a machine with 126 GB of RAM and am reaching the limit of its capacity (solving my problem uses up to 100 GB, and I still need to increase the size of my model).

However, when I look at your FAQ, I find the following:
“AMPL’s memory use for a linear program can be estimated crudely as 1000000 + 260 (m + n) + 50 nz bytes, where m is the number of constraints, n the number of variables, and nz the number of nonzeroes.”
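Concretely, the estimate can be evaluated like this (a small sketch; the m, n, and nz values below are placeholders, not the actual statistics from my log):

    # FAQ memory estimate for AMPL; m, n, and nz are placeholder values
    # to be replaced with the statistics reported in the log file.
    m = 10_000_000    # number of constraints (placeholder)
    n = 12_000_000    # number of variables (placeholder)
    nz = 150_000_000  # number of nonzeroes (placeholder)

    estimate_bytes = 1_000_000 + 260 * (m + n) + 50 * nz
    print(f"estimated AMPL memory: {estimate_bytes / 1e9:.2f} GB")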

When I try to compute my case based on the information in the log file, I get 11.35 GB, which is roughly ten times smaller than what I observe in practice. Can you tell me why, and whether I can change that behaviour?
Also, can you tell me how I could estimate the additional memory needed if I increase the size of my model?

Your screenshot shows about 22 GB of memory used by the genmod, merge, and collect phases, which process the model and data to create a representation of the complete optimization problem in memory. However, from this information alone, it is not possible to tell which variables and constraints are most responsible for the memory use, or how much memory was used for the presolve, output, and possibly other phases. Thus it is not yet clear what is causing up to 100 GB of RAM to be used.

Can you post the entire output from AMPL (and from the solver, if any)? That will be very helpful in determining the cause of the very high memory use. If the listing is very long, you can store it in a file and then upload the file.

Thank you for your reply.
You can find attached the log file of this run.
log.dat (24.2 KB)

The log indicates a successful optimization run.

The listing for AMPL shows that it uses about 34 GB. Over 60% of the variables and constraints are eliminated by AMPL’s presolve phase; almost 12 GB is associated with presolved variables and constraints, and that might be reduced significantly by changes to the formulation.

But since AMPL is using much less than the 100 GB that you are observing, the greater memory use must be in CPLEX. Indeed, the message “Total non-zeros in factor = 6116241945” already suggests that memory use will be high, since somewhat more than 12 bytes are needed for each nonzero; about 6.1 billion nonzeroes at 12 bytes each is already over 70 GB for the factor alone. Thus, I would recommend first trying to reduce CPLEX’s memory requirements. Here are four things that you can try:

  1. Add the option memoryemphasis=1 to the cplex_options string. This tells CPLEX to compress some data to reduce the memory used. (You might see some increase in computation time, however.) A sketch of how these option tests can be scripted appears after this list.

  2. Try ordering=1 and also ordering=2. Each of these tells CPLEX to choose a different method for factoring the main linear system solved at each barrier iteration, and there is a chance one of them will produce a sparser factor — using less memory than CPLEX’s current choice, which corresponds to ordering=3.

  3. Add the option aggregate=0. This tells CPLEX not to eliminate equality constraints by substituting variables out of the problem. As a result a larger problem will be solved, but possibly the factorization will be sparser.

  4. Run CPLEX separately from AMPL. Start your AMPL session as you did before, but instead of typing “solve”, type this command:

    write benergy;

    Then quit out of AMPL, and in your command window, use a command like this (all on one line) to run CPLEX:

    cplex energy -AMPL baropt predual=-1 barstart=4 comptol=1e-5 crossover=0 timelimit=64800 bardisplay=1 prestats=1 display=2

    (If you previously added any options to your cplex_options string, you also need to add them to this command.) After CPLEX finishes, start AMPL again and type all of the commands you entered before; but when you get to where the write command was, enter this command instead:

    solution energy.sol;

    Then you can proceed with commands to display or save results.

Try each of these ideas separately. If more than one appears helpful, then you can consider using two or more together.
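
If you are scripting these runs, here is a minimal sketch of how tests 1 through 3 might be set up through amplpy (this assumes the standard amplpy calls; model.mod and data.dat are placeholder file names, and the option strings should include whatever you already have in cplex_options):

    from amplpy import AMPL

    ampl = AMPL()
    ampl.read("model.mod")      # placeholder: your model file
    ampl.read_data("data.dat")  # placeholder: your data file
    ampl.eval('option solver cplex;')

    # Test one option at a time; replace the string with your full
    # existing cplex_options list plus the new keyword.
    ampl.eval('option cplex_options "memoryemphasis=1";')  # test 1
    # ampl.eval('option cplex_options "ordering=1";')      # test 2.1
    # ampl.eval('option cplex_options "ordering=2";')      # test 2.2
    # ampl.eval('option cplex_options "aggregate=0";')     # test 3
    ampl.solve()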

Thanks for your detailed answer.

Indeed, the log indicates a successful optimization run; this is a case where solving still works. But I need to increase the accuracy of my model, which increases the size of the problem, and then it becomes too large for my machine.

I tested the different options you proposed on a less accurate version of my model, but one that has basically the same structure. Here are the results:

  1. with memoryemphasis=1: 4.110832 GB
  2. ordering option:
    2.1 with ordering=1: 4.115116 GB
    2.2 with ordering=2: 4.11908 GB
  3. with aggregate=0: 4.155408 GB
  4. with generation of an energy.nl file: 3.037204 GB (barrier time = 410.42)

For comparison, the reference run uses 4.155492 GB.

As you can see, none of those options gives a significant reduction in memory usage, except perhaps the last one; but to the best of my knowledge, that approach is not feasible with the amplpy interface, which I am using. (I had to try it “manually”.)
I didn’t try combining those options, since none of them gave good results individually.

Do you have any other advice to reduce the memory usage of my model?

I cannot think of any other option settings that would reduce CPLEX’s memory use in this case. For every problem, there is a minimum amount of memory that CPLEX needs to solve it, and you may already be at that minimum.

There might be a way to get the benefits of running CPLEX outside of AMPL (as in test 4) even when using amplpy; I could ask an expert on amplpy about that.
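
For what it is worth, here is an untested sketch of what that could look like: drive the write and solution steps through amplpy’s eval command, and run CPLEX in between through the operating system. The file names and options mirror test 4; note that amplpy may run AMPL in a temporary working directory, so the locations of the .nl and .sol files may need attention:

    import subprocess
    from amplpy import AMPL

    ampl = AMPL()
    ampl.read("model.mod")      # placeholder: load model and data as usual
    ampl.read_data("data.dat")

    # Write the problem as a binary energy.nl file instead of calling solve()
    ampl.eval("write benergy;")

    # Run CPLEX outside of AMPL, exactly as in test 4
    subprocess.run(
        ["cplex", "energy", "-AMPL", "baropt", "predual=-1", "barstart=4",
         "comptol=1e-5", "crossover=0", "timelimit=64800",
         "bardisplay=1", "prestats=1", "display=2"],
        check=True,
    )

    # Read the solution back into the same AMPL session
    ampl.eval("solution energy.sol;")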

But first: you mention that you have run these tests on a different version of your problem. You originally mentioned needing 100 GB, but this version needs only 4 GB (or 3 GB in test 4). That is quite manageable. So now you can test an instance of the same version that is twice as big; then, by comparing this new test to the smaller one that takes 4 GB, you can get an idea of how the memory needs will scale up. To get some help with that, you could post the complete logs (like the one in your log.dat file previously) for both the test that takes 4 GB and the new, larger test.
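
Once you have both measurements, a rough way to extrapolate is to fit a power law through the two points. Here is a back-of-the-envelope sketch; the second memory number is a made-up placeholder standing in for the result of your twice-as-big test:

    import math

    # Placeholder measurements: (relative problem size, peak memory in GB)
    size1, mem1 = 1.0, 4.16   # current test
    size2, mem2 = 2.0, 9.50   # hypothetical twice-as-big test

    # Fit mem ~ c * size**p through the two points
    p = math.log(mem2 / mem1) / math.log(size2 / size1)
    c = mem1 / size1 ** p
    print(f"scaling exponent p = {p:.2f}")
    print(f"predicted memory at 4x size: {c * 4.0 ** p:.1f} GB")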