by samvher 1325 days ago
This seems to be a thing with weather models more generally. Somewhat relatedly, I've spent quite a bit of time evaluating weather models for use in India and Africa, and while predictions are easy to find, validation results for the predictions are very hard to find. And when you do find them, the results are pretty poor, with many models performing worse than a naive baseline such as "predict the temperature on date X to be the average observed temperature on the same date over the past 10 years". But people still sell (and buy) these predictions!
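To make the baseline comparison concrete, here is a minimal sketch of scoring a model against that same-date climatological average using a mean-squared-error skill score. All numbers are made up for illustration, and the function names are hypothetical, not taken from any particular verification package.

```python
from statistics import mean

def climatology_forecast(past_years_obs):
    """Baseline: average of observations on the same calendar date in past years."""
    return mean(past_years_obs)

def mse(forecasts, observations):
    """Mean squared error between paired forecasts and observations."""
    return mean((f - o) ** 2 for f, o in zip(forecasts, observations))

def skill_score(model_fc, baseline_fc, observations):
    """1 = perfect, 0 = no better than the baseline, negative = worse than it."""
    return 1.0 - mse(model_fc, observations) / mse(baseline_fc, observations)

# Illustrative (invented) temperatures in deg C: 10 past years for each of 3 dates.
history = [
    [30.0, 31.0, 29.0, 30.0, 30.0, 31.0, 29.0, 30.0, 30.0, 30.0],
    [31.0, 33.0, 32.0, 32.0, 31.0, 33.0, 32.0, 32.0, 32.0, 32.0],
    [28.0, 28.0, 29.0, 27.0, 28.0, 28.0, 29.0, 27.0, 28.0, 28.0],
]
observed = [31.0, 32.0, 27.0]   # what actually happened on those dates
model = [33.0, 30.0, 29.0]      # hypothetical model forecasts

baseline = [climatology_forecast(h) for h in history]
score = skill_score(model, baseline, observed)
print(f"skill vs climatology: {score:.2f}")
```

A negative score here means the hypothetical model did worse than simply quoting the 10-year average, which is the situation described above.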

Weather predictions seem to be accepted quite uncritically. Perhaps people have a lot of confidence in the smart people that built these predictions (a bit like how AI predictions can sometimes be accepted uncritically).

3 comments

100% agree. Scientists and engineers all know that you must provide validation results, accuracy/uncertainty calculations, etc. or your data is just a pretty guess. I think weather forecast models are so commoditized and useful for laypersons that we've UX'd all of the complexity (scrutinizing the data) out of the product. The most scrutiny I ever see is people discussing what "Probability of Precipitation" values really mean.

My grad thesis advisor encouraged me to actually get the Environment Canada models and learn how to run them (they're in FORTRAN). I could never make them spit out data consistent with what EC publishes. That's probably on me, but it was a real eye-opener to this whole domain's complexity.

I've been working with weather models for 10 years and I often get asked "How accurate is X?" or "Which model is more accurate?" Many people think "accuracy" is a single number or a single thing - it is more complex than this and depends on your needs.

This chapter on Numerical Weather Predictions [0] is great, especially the section on "Forecast Quality and Verification" (p777). The eye-opener for me was "Binary/Categorical Event". An example of a binary event is rain: one model could predict rain correctly, while a second model might not predict rain at that location at all. This doesn't mean the second model was completely wrong: it may still have predicted the rain, just passing further to the south.

[0] https://www.eoas.ubc.ca/books/Practical_Meteorology/mse3/Ch2...
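As a rough illustration of binary-event verification, here is a minimal sketch that builds the standard 2x2 contingency table for rain/no-rain at a single point and derives three common scores from it. The data and function names are invented for the example. Note that point-wise scoring like this counts a rain band displaced to the south as a miss at the original location, which is exactly why a single number can be misleading.

```python
def contingency_table(forecast, observed):
    """Count hits, misses, false alarms and correct negatives for a yes/no event."""
    hits = sum(f and o for f, o in zip(forecast, observed))
    misses = sum((not f) and o for f, o in zip(forecast, observed))
    false_alarms = sum(f and (not o) for f, o in zip(forecast, observed))
    correct_negatives = sum((not f) and (not o) for f, o in zip(forecast, observed))
    return hits, misses, false_alarms, correct_negatives

def ratio(numerator, denominator):
    """Safe division: undefined scores are reported as NaN."""
    return numerator / denominator if denominator else float("nan")

def binary_scores(forecast, observed):
    hits, misses, false_alarms, _ = contingency_table(forecast, observed)
    return {
        # Probability of detection: fraction of observed events that were forecast.
        "POD": ratio(hits, hits + misses),
        # False alarm ratio: fraction of forecast events that did not occur.
        "FAR": ratio(false_alarms, hits + false_alarms),
        # Critical success index: hits over all cases where rain was forecast or observed.
        "CSI": ratio(hits, hits + misses + false_alarms),
    }

# Invented example: did it rain at one station on six days?
rain_forecast = [True, True, False, False, True, False]
rain_observed = [True, False, False, True, True, False]

table = contingency_table(rain_forecast, rain_observed)
scores = binary_scores(rain_forecast, rain_observed)
print(table, scores)
```

Which of these scores matters most depends on the use case: someone planning a picnic may care more about false alarms, someone issuing flood warnings more about misses.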

I've also noticed some models are better than others at predicting one phenomenon, while other models might be better in certain regions. For example, many people report that Canada's GDPS is better at higher latitudes whereas NOAA's GFS is better in equatorial regions.

One final note: just because someone is solving a WRF model without verifying the results doesn't mean it's wrong. Many numerical techniques and physical models within WRF have been validated against analytical and experimental models. But it is also true that someone can naively set up a WRF model that gives bad results.

I use a 900 m WRF model that predicts the wind shadow around an island, and we use it to find the best beach for a picnic - and it works. This same model predicts the general pattern of rain, but it doesn't get the start and stop times of the rain correct.

People get fixated on accuracy as a single thing and use it as the sole basis for argument, but to quote the chapter [0] above: "One of the least useful measures of quality is forecast accuracy" (ref. p777, Forecast Quality and Verification, third paragraph).

> other models might be better in certain regions

The US Navy's COAMPS model is good for littoral regions.

Meteoblue was dramatically more accurate in Chamonix last spring than the GFS.

You have to be careful you aren't comparing apples to oranges. You might be looking at the Meteoblue MOS (statistically corrected) predictions which might be based on their regional weather simulation. This regional simulation might be nested in a larger global model, probably from ECMWF. If you compare this ECMWF model to GFS, then you are comparing apples with apples.

I find global models like GFS are great for understanding the large scale weather systems. The regional high-resolution models, which are usually nested in a global model, give better definition of local weather phenomena like wind shadows or cooler temperatures in valleys.

Due to averaging, weather simulations usually have a bias error in temperature predictions. These errors are corrected using statistics (look up Model-Output-Statistics), but the correction is hyper-local, i.e., you lose the big picture. This is probably what you're looking at with Meteoblue.
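To show the idea behind this kind of statistical correction, here is a minimal single-predictor sketch: fit a straight line from past raw model temperatures to the observations at one station, then apply it to a new forecast. Operational MOS typically uses many predictors and much longer training records; the data and function names below are invented, and this is not Meteoblue's method.

```python
def fit_linear_correction(model_temps, observed_temps):
    """Ordinary least squares fit of observed = intercept + slope * model."""
    n = len(model_temps)
    mean_x = sum(model_temps) / n
    mean_y = sum(observed_temps) / n
    sxx = sum((x - mean_x) ** 2 for x in model_temps)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(model_temps, observed_temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def apply_correction(raw_forecast, intercept, slope):
    """Map a raw model temperature to a station-corrected temperature."""
    return intercept + slope * raw_forecast

# Invented training data for ONE station (deg C): raw model output vs observations.
past_model = [10.0, 12.0, 14.0, 16.0]
past_observed = [11.5, 13.0, 14.5, 16.0]

intercept, slope = fit_linear_correction(past_model, past_observed)
corrected = apply_correction(20.0, intercept, slope)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, corrected={corrected:.2f}")
```

The fitted coefficients only apply to the station they were trained on, which is the hyper-local trade-off mentioned above.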

Given this is in the SF Bay, there are a number of high-quality observations that you can use to validate the forecast skill (unlike India and Africa). I have not bothered doing this here since… well that’s too much like my day job.

I’m always excited to see new forecast products, generally. If I were to guess (as an above comment did) it looks like they are applying some dynamic downscaling on top of either a custom WRF model (expensive and complicated) or more likely already available weather model data like the HRRR, which still would represent a 10x resolution increase.

I’m more curious what the refresh rate is. Anyone can get a super accurate forecast for the next 3 hours that takes 10 hours to run, but at that point it’s no longer a forecast by the time the data is available.

I still think that Windy has set the standard as far as modern weather visualization goes. Not saying everything has to be particles, but other things (like the inclusion of isobars) are really clean and not trivial to execute.

Either way, this has definitely piqued my interest and I will be keeping an eye on it. Their advisory board looks legit (at least on the meteorology end).

The website claims to be using DL which may mean less of a model-centric approach? The expertise of the people at the top of the organization, on this problem, seems a little thin, TBH. And, no stated validation results at all? Without such details, this is just marketing.

It would be interesting to see how this behaves for longer prediction times and across a range of difficult forcing conditions off the ocean in the BA.

I agree, this generally left me feeling skeptical. I know of Luca Delle Monache on the advisory team, through colleagues who have researched under him at Scripps and they spoke highly of him. But yes, there is a lot left to the imagination here.

With regard to the SF Bay specifically, I used to work with a fairly high resolution wind model for the bay (this was a more traditional dynamics-based simulation) and it worked pretty well overall, but every time a storm blew through it would crash. This ultimately had to do with the relatively steep terrain in the bay specifically (and the physics configurations we were using in the actual model).

Even if they are using DL they still need initial and boundary conditions. As I said, there are a ton of weather stations around, so I could imagine a DL-type approach that looked at terrain elevation and recent + historical observations to initialize a forecast, but I still imagine that boundary conditions would have to be provided by nesting this in a larger model somehow. Then again, I'm not a DL expert at all, so there is probably some newer stuff in this field that I'm just out of date on.

It's really expensive to run your own dynamic forecast model, at a refresh rate acceptable for an actual forecast, at this resolution. That's why I suspected it's taking existing weather models and downscaling them with DL techniques, but I can't really know just by looking.

(For clarity, I was referring to the company leadership proper, not the advisory team.)

We are currently integrating with Forecast Watch, a third party that analyzes and compares various forecasting systems [1]. Please stay tuned until we integrate our APIs. I will update this thread when it is ready.

[1] https://www.forecastwatch.com/