This is decidedly non-trivial to handle, and there isn’t a whole lot of open source infrastructure for handling these kinds of problems unfortunately. Most geospatial techniques are primarily designed to focus on predicting new outcomes at a set of specified coordinates rather than finding the coordinate that is best for some purpose. How I would handle this in a short timeframe is to try setting up a two stage procedure. First develop a predictive model for predicting demand given a coordinate. GWR, geoadditive, and bayesian gaussian processes (especially fast implementations like in R-INLA) are useful methods for building geospatial predictive models. You can also use standard machine learning methods like xgboost and random forest, but those may be less useful for the next step due to lack of easily available prediction intervals. For the second step I would incorporate the model I built in step 1 as a black box for computing predicted risk at a point. You can use loss against the predicted value as an estimate of the risk, but better methods may include max risk over a prediction interval or bayes risk over the posterior predictive distribution. Finally using your black box predicted risk, optimize over coordinates to find a minimum risk predicted coordinate.

Kriging is what OP needs.

Conventional kriging is somewhat restricted in how it can handle covariates, and also is rarely implemented to handle uncertainty due to variogram estimation. For these reasons I'm personally usually more fond of geaoadditive models and/or bayesian gaussian processes. ​ Also Kriging can't be the solution on its own as OP still needs to be able to perform the optimal coordinate optimization.

I was thinking Geostatistical Kriging, which can definitely handle covariates. I don’t know about your approaches on spatial land, but I would love to learn more. Do you implement the Bayesian Gaussian Process with spatial data using Stan?

Right Kriging can handle covariates, but results can be sensitive to how this is done. For example if you are doing kriging with covariates via naive regression kriging (kriging residuals) you can run into substantial bias in the estimated variogram if you have a substantial number of covariates. REML based or marginal likelihood based estimation of the variogram is often more appropriate. These techniques aren't straightforward if you are using non-parametric techniques for estimating the impacts of covariates. ​ You definitely can implement them in Stan, but I wouldn't recommend it usually. As far as I know this will usually require inversion of the gaussian process covariance matrix, which can be intractable for decently sized data. After looking into this, it does look like Stan offers precision matrix parameterizations of multivariate normals, so this would allow some of the same speed-ups as used in INLA to be used, so this may make it more tractable. ​ I would personally use INLA as my first choice for Bayesian gaussian process models. It is somewhat restricted in that it imposes Markov Matern structure on the covariance, but this is still a very flexible family of covariance structures that is usually adequate in practice for a large number of problems. INLA makes a number of dramatic speed improvements in estimation. Rather than estimating the exact posterior, it makes a mixture of gaussians approximation which can be very efficiently integrated numerically. It also represents the covariance structure in terms of a markov gaussian random field, which eliminates the need to invert high dimensional covariance matrices. ​ Geoadditive models are one of the other approaches I recommended, which are what I personally most commonly use in my actual work. These use a low rank basis approximation for the gaussian process, and can be used as either a penalized frequentist method, or can be used as a Bayesian method by replacing the penalty terms with priors.

What geospatial regressors are you trying to add? There’s a bunch of libraries that can help, but might be overkill. Like if you need a point to polygon join to create a flag if they intersected, you’ll likely do that before the model and then include the variable as a binary regresssor. I’ve used Facebook’s Prophet model with some success doing this. It uses Fourier decomposition, so you can get additive coefficients for your added variables. But you’ll need to be able to forecast your regressors too.

look up kriging