| HN Mirror


	by strin 1953 days ago
	lol they didn’t open source the model weights

5 comments

buildbot 1953 days ago

If I did the math right it would be 3.12TB of weights, maybe they are trying to upload it to gdrive still. (/s, probably)
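
The parameter count behind the 3.12TB figure depends on how many bytes each weight occupies, which the thread does not state. A rough sketch of that arithmetic, assuming decimal terabytes and two common float widths (both widths are assumptions for illustration):

```python
# Back-of-the-envelope: how many parameters a checkpoint of a given size implies.
# The byte widths below are assumed; the thread does not say which numeric
# format the weights use.
TB = 10**12  # decimal terabyte, in bytes


def implied_param_count(size_tb: float, bytes_per_param: int) -> float:
    """Number of parameters implied by a checkpoint of size_tb terabytes."""
    return size_tb * TB / bytes_per_param


for name, width in [("float32", 4), ("16-bit float", 2)]:
    params = implied_param_count(3.12, width)
    print(f"{name}: ~{params / 1e12:.2f} trillion parameters")
```

Read the other way, the same function shows how sensitive the size estimate is to the assumed precision: halving the bytes per weight doubles the parameter count that fits in the same 3.12TB.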

schoen 1953 days ago

They released more data than that for their Google Books n-grams datasets:

https://storage.googleapis.com/books/ngrams/books/datasetsv3...

(I don't remember exactly how much it is, but I remember that the old version was already in the terabytes.)

lifthrasiir 1953 days ago

Another example of Google giving away a lot of data is 50 trillion digits of pi [1], which comes to about 42 TB (decimal and hexadecimal combined).

[1] https://storage.googleapis.com/pi50t/index.html

dheera 1953 days ago

The Waymo open dataset is about 1TB. I don't think releasing a 3TB dataset would present a technical challenge for Google.

londons_explore 1953 days ago

Even a 3PB model would be very doable for Google...

pradn 1953 days ago

The daily upload quota for a user is ~750 GB. It'll take a few days to upload that much data to Google Drive!
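
As a quick check of "a few days", here is a small sketch that divides the 3.12TB estimate from upthread by the ~750 GB daily quota. It assumes decimal units (3.12TB = 3120 GB) and that the full quota can be used every day:

```python
import math


def days_to_upload(total_gb: float, daily_quota_gb: float) -> int:
    """Whole days needed when at most daily_quota_gb can be uploaded per day."""
    return math.ceil(total_gb / daily_quota_gb)


# 3.12 TB of weights at roughly 750 GB per day
print(days_to_upload(3120, 750))
```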

dekhn 1953 days ago

Google Cloud Storage. The files could be dumped as tfrecord in a bucket with "requester pays". So anybody could reproduce it using the open source code, by paying for the costs incurred to move the data from GCS to the training nodes.

shepherdjerred 1953 days ago

Weights are just numbers (probably floats?), right?

This model has 3.12TB of floats??? That's insane. How do you load that into memory for inferencing?

exikyut 1953 days ago

Use x1e.32xlarge on AWS with 3TB of RAM. Just $12,742/mo - https://calculator.aws/#/estimate?id=7428fa81192c57087ac8cdf...

Alternatively order something like the HP Z8 with 3TB RAM configured, which is only $75k - https://zworkstations.com/configurations/2040422/

It's interesting. It would take ~six years for the Z8 to break even compared to AWS, but traffic into and out of the machine would be $0, and I don't think you're running directly on the metal with AWS, so performance would probably be a bit higher. And then there's storage - I configured, uhh, 120TB of a mixture of SSDs and HDDs. I'm not even going to try and ask AWS for a comparable quote there.

I may or may not have added dual Xeon Platinum 8280s to the Z8 as well. :P

tjbiddle 1953 days ago

When you're spending that kind of money on a machine, there's no way you're paying retail price. Sales reps would give you a significant discount.

Also - think you meant 6 months, not 6 years anyhow :)

exikyut 1953 days ago

Interesting. I'm very curious... 20%? 35%?

And I did mean 6 months, whoops. Didn't even notice...
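
For reference, the break-even arithmetic behind the six-month correction can be sketched from the two prices quoted upthread ($75k for the Z8, $12,742/mo for the AWS instance). This is a simplification that ignores power, storage, data transfer, and any sales discount:

```python
def break_even_months(purchase_price: float, monthly_rental: float) -> float:
    """Months of rental after which buying outright would have been cheaper."""
    return purchase_price / monthly_rental


months = break_even_months(75_000, 12_742)
print(f"{months:.1f} months")
```

Any discount on the purchase price shortens the break-even period proportionally, since the function is linear in `purchase_price`.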

jsnell 1953 days ago

> It would take ~six years for the Z8 to break even

Do you mean six months?

exikyut 1953 days ago

Oh *dear*. I definitely tripped over there, and I didn't even notice.

Yup.

high_byte 1953 days ago

Z8 sounds like fun, but I might just buy two Teslas (Roadster and X, or a Cybertruck) and a gaming PC. :D

visarga 1953 days ago

Hate to break up the party, but this model only loads a small part of itself into RAM when inferencing.
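
One generic way to touch only part of a large weights file is memory mapping. The sketch below is an illustration of that idea, not a description of how this model's code works: the file layout, block size, and function names are all hypothetical, and the "weights" are a tiny demo file.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4        # bytes per float32
BLOCK_FLOATS = 1024   # floats per hypothetical block of weights
NUM_BLOCKS = 8        # the demo file is tiny; a real one could be far larger


def write_demo_weights(path: str) -> None:
    """Write NUM_BLOCKS blocks; every float in block i has the value float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_BLOCKS):
            f.write(struct.pack(f"<{BLOCK_FLOATS}f", *([float(i)] * BLOCK_FLOATS)))


def read_block(path: str, index: int) -> tuple:
    """Read one block through mmap; only the bytes that are sliced get copied."""
    block_bytes = BLOCK_FLOATS * FLOAT_SIZE
    start = index * block_bytes
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return struct.unpack(f"<{BLOCK_FLOATS}f", mm[start:start + block_bytes])


fd, demo_path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
try:
    write_demo_weights(demo_path)
    block = read_block(demo_path, 5)   # fetch block 5 without reading the rest
    print(len(block), block[0])
finally:
    os.remove(demo_path)
```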

exikyut 1953 days ago

That's a good thing. Less completely means more energy for interestingness, and less expense means more accessibility.

swirepe 1953 days ago

(They are definitely going to exceed their storage quotas.)

I want to see how well weights for these models compress, but it will take me some time to run this code and generate some. I'm guessing they won't compress well, but I can't articulate a reason why.

wisty 1953 days ago

If weights compress, they have low information, which would suggest they're either useless or the architecture is bad.
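
The link between compressibility and information content can be illustrated with a small experiment. This sketch uses Gaussian noise packed as float32 as a stand-in for trained weights, which is an assumption; real weights may behave differently. It compares the noise against a degenerate tensor where every value is identical:

```python
import random
import struct
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


random.seed(0)
n = 50_000

# Assumed stand-in for trained weights: small Gaussian noise as float32.
noisy = struct.pack(f"<{n}f", *(random.gauss(0.0, 0.02) for _ in range(n)))

# Degenerate case: every weight is the same value, so there is almost no information.
constant = struct.pack(f"<{n}f", *([0.02] * n))

print(f"noise-like floats: {compression_ratio(noisy):.2f}")
print(f"constant floats:   {compression_ratio(constant):.4f}")
```

The noise-like data barely shrinks while the constant data collapses to a tiny fraction of its size, which is the intuition in the comment above. A general-purpose compressor like zlib is only a crude proxy for information content, though.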

notretarded 1953 days ago

In the source code it says "I have discovered truly marvelous weights for this, which this header file is too small to contain"

verdverm 1953 days ago

data is the new oil; what's the analogy for the data industry's impact on society, akin to climate change?

ALittleLight 1953 days ago

Surveillance Capitalism

https://en.wikipedia.org/wiki/Surveillance_capitalism

selfhoster11 1953 days ago

Social Cooling (https://www.socialcooling.com/).

coef2 1953 days ago

Is this because they are afraid of the model being misused, for example to generate fake reviews? It is frustrating that I keep hearing great news about NLP but can't try any of it myself.

briga 1953 days ago

It's because the model weights are the valuable thing here. The fancy new architectures are nice and everything, but transformer models are a dime a dozen these days. Seems like they're using this as an example to point at and say "Hey, look at us, we support open source!", whereas unless you're willing to go ahead and spend a small fortune on compute (possibly using their GPUs), these models are somewhat useless.

JZL003 1953 days ago

hah! yeah that's what I was looking for too
