


	by tasdfqwer0897 980 days ago
	Hey, I work at Adept and helped make this! Happy to answer questions. The thing I think is especially neat/notable is how simple you can make the model architecture while still getting good performance. I expect we'll continue to see bits of these models get deleted in the next few years. Note that you can get the model weights on Hugging Face here: https://huggingface.co/adept/fuyu-8b

8 comments

brianjking 980 days ago

First off, absolutely incredible work, congrats and thank you.

Secondly, do you anticipate Fuyu being made available for commercial access or will it remain NC?


JimDabell 980 days ago

What’s the situation with the license? Your blog post says you are open sourcing it, but it’s currently only available under a non-commercial license instead. Is an open source release forthcoming?


coder543 980 days ago

Yeah... in the blog post, they do explicitly mention "cc-by-nc", which I find disappointing.

Anything the community builds in response to Adept being "excited to see what the community builds on top of it" would only serve Adept and no one else! What incentive does the community have to build on top of Fuyu when the community can't benefit from its own work? If Adept wants to benefit from word-of-mouth discussion of their models and from community contributions that make those models work better, as has happened dramatically with Llama 2, then they need to give the community the opportunity to benefit too.

Also weird: if you look at the tags on Hugging Face, you'll see it is listed as "cc". This comes from the README[0] metadata. "cc" is not really a license.

[0]: https://huggingface.co/adept/fuyu-8b/blob/main/README.md?cod...


schleck8 980 days ago

It's open source by their definition, that is, source available (open). Everyone always thinks the term open source is protected in some way, while the entity that established the commercial-usage aspect is the Open Source Initiative. And no one is forced to abide by their ideology.

FOSS meets the commercial usage requirement much better. Otherwise the term FOSS would be redundant.


mandelken 980 days ago

You can download the weights on Hugging Face.

I believe the copyright on AI model weights in the US is not fully established, but so far it has been held that a list of numbers can not be copyrighted, so likely the same applies to model weights. Note that you don't have to enter into an agreement with Adept to use the model.

Alternatively, download and use the weights in Japan, which explicitly has no copyright on AI models.


ansk 980 days ago

> a list of numbers can not be copyrighted

Any digital object can be represented as a list of numbers (this is precisely the origin of the term digital). Since there is clearly precedent for copyrighted digital objects (media, software, etc), reducing something to "a list of numbers" is not a useful distinction in regard to copyright law.
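
To make the "list of numbers" point concrete, here is a minimal sketch (the sample string is made up for illustration): any text, and by extension any file, can be turned into a list of integers and back without loss.

```python
# Any digital object is ultimately a sequence of bytes, i.e. integers in 0..255.
text = "Hello, HN"                          # made-up sample content
numbers = list(text.encode("utf-8"))        # the "list of numbers" view
restored = bytes(numbers).decode("utf-8")   # lossless round trip back to text

print(numbers)
print(restored == text)
```

The same round trip works for images, audio, or executables read in binary mode, which is why the representation alone says little about copyright status.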


MattPalmer1086 980 days ago

IANAL but as far as I remember, you can't copyright a list of objective facts, for example a phone book containing a list of phone numbers.

Model weights are clearly not in that category. Happy to be corrected if I misremember.


outofpaper 980 days ago

Model weights are akin to Markov chains and compressed data. They are direct representations of the data they were created from, in the same way that Markov chains are created from hidden Markov chains and zipped files are created from files.

Zipping a file does not grant the zipped output any copyright protection beyond that of the original file.

Moreover, a notice in the US Federal Register officially states that AI-generated artifacts are not eligible for copyright: https://www.federalregister.gov/documents/2023/03/16/2023-05....


startupsfail 980 days ago

Suppose you take some copyrighted data, a set of books for example, count the words in these books, and then plot the distribution of the top 100 word frequencies. The copyright for that new image would belong to you.
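
As a rough sketch of the counting step described here (the two "books" are made-up stand-ins, and `top_word_frequencies` is a hypothetical helper name), the derived statistics can be computed like this; plotting is left out:

```python
from collections import Counter
import re

def top_word_frequencies(texts, n=100):
    """Return the n most common lowercase words across the given texts."""
    counts = Counter()
    for text in texts:
        # Treat runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

# Made-up stand-ins for the copyrighted books.
books = ["the cat sat on the mat", "the dog chased the cat"]
top = top_word_frequencies(books, n=3)
print(top)
```

The result keeps only aggregate counts, not the original sentences, which is the distinction the comment relies on.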


schleck8 980 days ago

I highly doubt that any of this will hold up in front of a court. For intellectual property, not just the result is important but also the creation process, and there is enough work going into the data science here.


zan2434 980 days ago

Hey! Awesome work. It seems like in theory this encoding scheme should enable a model like this to generate images as well, by outputting image tokens. Is that right?


abrichr 980 days ago

Thank you for the release!

What can you tell us about this:

> Our internal models (based on Fuyu) have extra capabilities related to our product. In particular,

> 1. They can reliably perform OCR on high-resolution images

> 2. They can do fine-grained localization of text and UI elements within those images

> 3. They can answer questions about images of UIs

Is this just a matter of additional fine tuning, or are there architectural differences?


amks 980 days ago

Even in experiments that only add fine-tuning, we've seen models gain these capabilities!


Q6T46nT668w6i3m 980 days ago

Neat idea! Are the patches encoded as tokens into the input sequence? This is something I really like about the multi-modal PaLM papers since it enables the multi-modal tokens to be referenced.


ekelsen 980 days ago

Image patches are projected directly into an embedding that goes into the decoder Transformer. The same thing could be done for audio.
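
A minimal sketch of what "projected directly into an embedding" could look like, assuming a plain linear projection of flattened patches. The sizes, weights, and helper names below are hypothetical toy values, not Fuyu's actual configuration, and pure Python is used only to keep the example self-contained (a real implementation would use a tensor library).

```python
import random

def split_into_patches(image, patch):
    """Cut an H x W grid of pixel values into flattened patches, in raster order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches

def linear_projection(vectors, weight, bias):
    """Apply y = W x + b to each vector; weight has shape (d_model, patch_dim)."""
    return [[sum(w * x for w, x in zip(row, vec)) + b
             for row, b in zip(weight, bias)]
            for vec in vectors]

# Hypothetical toy sizes, chosen only for illustration.
PATCH, D_MODEL = 2, 4
rng = random.Random(0)
image = [[rng.random() for _ in range(6)] for _ in range(4)]  # 4 x 6 grayscale image
weight = [[rng.gauss(0, 0.02) for _ in range(PATCH * PATCH)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

# Each patch becomes one embedding vector; no separate image encoder is involved.
patch_embeddings = linear_projection(split_into_patches(image, PATCH), weight, bias)
text_embeddings = [[0.0] * D_MODEL for _ in range(3)]  # stand-in for embedded text tokens
decoder_input = patch_embeddings + text_embeddings     # one sequence for the decoder
print(len(patch_embeddings), len(decoder_input))
```

The same two steps (chunk the signal, project each chunk to the model width) would apply to audio frames in place of image patches.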


saran945 980 days ago

Hi, will it work for HTML/app UI screenshots? Has it been trained using UI screenshots? Thank you.


visarga 980 days ago

Do you offer paid API access to larger models?


acanb 980 days ago

Can you guys launch a web Gradio demo until the transformers PR gets approved? I'd like to play around with the model.
