Hacker News
by rabbits77 1857 days ago
There is nothing in that press release that could not have been done in the 1980s with Prolog.

Yeah, it’d have been more code but you would not have needed to destroy a forest to train the thing.

This is the NLP trade off of the 21st century. The code is easier to write but the model is completely opaque, and you need to really burn a lot of electricity to make it work.

This is totally false. I dare you to write anything close to, e.g., BERT in Prolog.
> This is the NLP trade off of the 21st century. The code is easier to write but the model is completely opaque, and you need to really burn a lot of electricity to make it work.

This is basically a meme now. We actually have a pretty good understanding of how the models work. In fact, that understanding is how you can do things like build chatbots that don't spew hate.

Also, the electricity cost of training large language models is indeed high (e.g., GPT-3 has 175B parameters and is estimated to have taken 190,000 kWh to train on GPUs). But the folks who pay that cost (basically OpenAI, Google, MSFT, Facebook, Amazon) are incentivized to bring it down (TPUs are way more efficient than GPUs), and they are incentivized to train infrequently because it costs $$$.

FWIW, Google's datacenters are also technically carbon neutral. I know that's not great, because carbon credits don't have the impact that folks think they have, but there is definitely a difference in ecological impact between datacenter electricity and other kinds of energy usage (e.g., cars all burning fossil fuels).

Okay, let's also compare to Bitcoin, which is the real ecological disaster if we want to talk about inefficient software: ~387,096,774 kWh PER DAY, _and_ it incentivizes things like cheap coal, and miners are definitely not using their crypto wealth to purchase carbon offset credits :(
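
A quick back-of-the-envelope check of the two energy figures quoted in this thread (the ~190,000 kWh GPT-3 training estimate and the ~387,096,774 kWh/day Bitcoin figure). This sketch takes both numbers at face value as the commenters stated them; they are not independently verified measurements:

```python
# Back-of-the-envelope comparison using the figures quoted in this thread.
# Both constants are the commenters' estimates, not independently verified.
GPT3_TRAINING_KWH = 190_000          # one-off estimate for training GPT-3 on GPUs
BITCOIN_KWH_PER_DAY = 387_096_774    # quoted daily Bitcoin network consumption


def trainings_per_day(daily_kwh: float, training_kwh: float) -> float:
    """How many GPT-3-sized training runs one day of Bitcoin energy would cover."""
    return daily_kwh / training_kwh


def annual_twh(daily_kwh: float) -> float:
    """Convert a daily kWh figure to terawatt-hours per year (1 TWh = 1e9 kWh)."""
    return daily_kwh * 365 / 1e9


if __name__ == "__main__":
    ratio = trainings_per_day(BITCOIN_KWH_PER_DAY, GPT3_TRAINING_KWH)
    print(f"One day of Bitcoin ~ {ratio:,.0f} GPT-3 training runs")
    print(f"Bitcoin per year ~ {annual_twh(BITCOIN_KWH_PER_DAY):,.1f} TWh")
```

Under those assumptions, one day of Bitcoin consumption corresponds to roughly 2,000 GPT-3 training runs (about 141 TWh per year), and the training cost is paid once per model rather than every day.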

I mean, yes. But it is a funny example to choose to illustrate the power of an NN approach. They're talking about mountains, entities with very concrete and definable attributes (e.g., height). And the rest of the examples similarly deal with semi-structured data that could theoretically be represented in RDF or something like that.
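
To make the "semi-structured data" point concrete, here is a minimal sketch, assuming mountain facts are stored as subject-predicate-object triples. This is RDF-like in spirit but uses plain Python tuples, not an RDF library, and it is not how the system in the press release works. The helper names (`objects`, `subjects_of_type`, `tallest`) are hypothetical, and the elevations are approximate, illustrative values in metres. A question such as "which mountain is tallest?" becomes a structured lookup rather than something a model has to learn:

```python
# A toy "semi-structured" store: facts as (subject, predicate, object) triples,
# RDF-like in spirit but using plain Python tuples rather than an RDF library.
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, object]

# Approximate elevations in metres, for illustration only.
FACTS: List[Triple] = [
    ("Mount Everest", "type", "Mountain"),
    ("Mount Everest", "elevation_m", 8849),
    ("K2", "type", "Mountain"),
    ("K2", "elevation_m", 8611),
    ("Kangchenjunga", "type", "Mountain"),
    ("Kangchenjunga", "elevation_m", 8586),
]


def objects(facts: Iterable[Triple], subject: str, predicate: str) -> List[object]:
    """Return every object o such that (subject, predicate, o) is a stored fact."""
    return [o for s, p, o in facts if s == subject and p == predicate]


def subjects_of_type(facts: Iterable[Triple], type_name: str) -> List[str]:
    """Return every subject declared with ("type", type_name)."""
    return [s for s, p, o in facts if p == "type" and o == type_name]


def tallest(facts: List[Triple], type_name: str = "Mountain") -> Optional[str]:
    """Answer 'which <type> is tallest?' by querying the triples directly."""
    heights = {}
    for s in subjects_of_type(facts, type_name):
        values = objects(facts, s, "elevation_m")
        if values:                      # skip subjects with no recorded height
            heights[s] = values[0]
    return max(heights, key=heights.get) if heights else None
```

For example, `tallest(FACTS)` returns `"Mount Everest"`, and `objects(FACTS, "K2", "elevation_m")` returns `[8611]`. The point of the design is that every answer is traceable to an explicit fact, which is the "good metadata" side of the trade-off discussed below.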

There's been a bit of discussion on HN lately about the effectiveness of sophisticated models vs. just good metadata.

The better wager is: I dare you to write and train up a sophisticated real-time neural network model that can interpret human language and provide reliably useful contextual search results with the compute power and memory constraints of the 80s.
Why would anyone take that wager? I see no reason to believe that's possible with either Prolog or NNs when you're restricted to 80s hardware.
Exactly my point. The wager was rhetorical.
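
For a rough sense of scale behind the rhetorical wager, here is a sketch that assumes a BERT-base-sized model (roughly 110 million parameters) stored as 32-bit floats, and a 1980s IBM PC-compatible limited to 640 KiB of conventional memory. These are assumptions chosen for illustration; the weights alone would not come close to fitting, before counting any training compute:

```python
# Rough scale check: weights of a BERT-base-sized model vs. 1980s PC memory.
# Assumptions: ~110M parameters, 4 bytes each (float32), 640 KiB of usable RAM.
BERT_BASE_PARAMS = 110_000_000
BYTES_PER_PARAM = 4                # float32
PC_CONVENTIONAL_RAM = 640 * 1024   # the classic 640 KiB conventional-memory limit


def weight_bytes(params: int, bytes_per_param: int) -> int:
    """Bytes needed just to hold the model weights in memory."""
    return params * bytes_per_param


def times_over_budget(needed: int, available: int) -> float:
    """How many times larger the requirement is than the available memory."""
    return needed / available


if __name__ == "__main__":
    needed = weight_bytes(BERT_BASE_PARAMS, BYTES_PER_PARAM)
    print(f"Weights: {needed / 1e6:,.0f} MB")
    print(f"~{times_over_budget(needed, PC_CONVENTIONAL_RAM):,.0f}x the 640 KiB limit")
```

Under these assumptions the weights need about 440 MB, roughly 670 times the 640 KiB limit.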