I’m trying to determine whether AMPL is a good choice for solving the following kind of problem. If it is feasible, a pointer to an example that follows a similar format would be really appreciated.

A national political party is trying to determine which zip codes in the country should be targeted with national grant funding to elect their party’s candidates. Covering each of the 100,000 zip codes over each of the last ten relevant elections, they have a spreadsheet of around a million rows of data, like this:

Outcome  Date (as int)  Zip    F1    F2    F3
0        21914          90210  7534  3453  5534
1        23340          90210  7854  3456  5463
…

Here F1, F2 and F3 are factors such as the number of registered voters in the zip, the number of homes in the zip, the number of existing politicians from that party in the area, etc. In practice there are more like 50 factors that they believe might be important, all of which are positive integers.

The Outcome column is either 0 (lost that election) or 1 (won that election). What I am trying to do is ingest this large amount of data and use it to find a minimum and maximum value for each factor (F1min, F1max, F2min, F2max, F3min, F3max) such that, when applied as constraints on the rows, they select the maximum number of 1s in the Outcome column and, EQUALLY IMPORTANT, the minimum number of 0s.
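To make the objective concrete, here is a minimal sketch in plain Python of how I would score one candidate set of bounds. The function names, the row layout, and the choice to weight wins and losses equally (wins selected minus losses selected) are my own assumptions for illustration, not something I have working in AMPL.

```python
# Hypothetical sketch: score one candidate set of bounds against the data.
# Each row is (outcome, factors): outcome is 0 or 1, factors is a tuple of
# positive integers (F1, F2, F3). All names here are illustrative only.

def in_box(factors, lows, highs):
    """Return True if every factor lies within its [min, max] bound."""
    return all(lo <= f <= hi for f, lo, hi in zip(factors, lows, highs))

def score(rows, lows, highs):
    """Wins selected minus losses selected (equal weighting is an assumption)."""
    wins = sum(1 for outcome, f in rows if outcome == 1 and in_box(f, lows, highs))
    losses = sum(1 for outcome, f in rows if outcome == 0 and in_box(f, lows, highs))
    return wins - losses

# The two sample rows from the table above (Date and Zip columns omitted).
rows = [
    (0, (7534, 3453, 5534)),
    (1, (7854, 3456, 5463)),
]

# These bounds keep the winning row and exclude the losing one, so this prints 1.
print(score(rows, lows=(7600, 3000, 5000), highs=(8000, 4000, 6000)))
```

What I am asking the solver to do is search over the lows and highs so that this score is as large as possible. The same in_box check would then be what gets applied to a new zip.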

The idea is that once these values are found, they can be used to evaluate new areas that might hold elections in the future. Depending on where a zip falls relative to the min/max values of F1 to F3, the party will either apply funding in the hope of a win, or decide that the zip is not competitive for them and not spend money. The worst case is applying funding and then losing.

I appreciate that this is a big problem and would be open to reducing the number of rows, by random sampling, or just throwing away older data. My hope is to get this model working with a smaller number of rows and then submit the whole thing to an online solver like NEOS which could hopefully handle more data. I’m using the Python interface at the moment, but would be happy to do it differently if that was a better approach. This is a hobby project, so I can’t pay for a commercial solver. Complete novice here, so I apologize if the answer should be obvious.
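For the row reduction, this is roughly what I had in mind: a minimal sketch using only the Python standard library. The function name sample_rows, the fixed seed, and the decision to sample wins and losses separately (so the win/loss ratio stays roughly the same) are assumptions on my part.

```python
import random

def sample_rows(rows, fraction, seed=0):
    """Return a random subset of rows, sampled separately for each outcome
    so the proportion of wins to losses is roughly preserved.

    rows is a list of (outcome, factors) tuples; fraction is between 0 and 1.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sampled = []
    for outcome in (0, 1):
        group = [r for r in rows if r[0] == outcome]
        # Keep at least one row from each non-empty group.
        k = max(1, round(len(group) * fraction)) if group else 0
        sampled.extend(rng.sample(group, k))
    return sampled
```

Calling sample_rows(rows, 0.01) on the full data set would leave about 10,000 rows to experiment with before trying the whole thing.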

Thanks for any help,

Dan.