Hacker News
by koolala 664 days ago
You can't imagine it :( Open data :(

I believe our world fights to destroy ideas like this because our economy drives our entire life.


> Can you imagine

>> No

>>> You can't imagine it

You haven’t articulated the idea you claim the “world fights to destroy”. (Just throwing around the word open without elaboration isn’t an idea.)

I’m not sure what they’re talking about, but I’ll throw my hat into the ring. Copyright and other such systems are destroying any chance that we, as humanity, have of letting LLMs progress in an open and transparent manner. We have to hide the training data and make the weights a black box because of antiquated notions such as copyright. While I am willing to permit some level of exclusivity for creative works, 100+ years is unreasonable and stagnates human creativity even outside of ML tasks. In the 19th century, I could take a book I was raised on and write my own fanfiction, and because that book would have been public domain by the time I was an adult, I could add onto the work and the other fans could build upon it with me. We see this with Sherlock Holmes, for instance. If I wanted to publish a book set in the world of Harry Potter, I’d need to wait for JK Rowling to croak, and then wait another 70 YEARS.

We need dramatic reforms on copyright, as we’ve really let corporate interests crowd out our rights to human culture and ideas. While I alone cannot decide what we as a country should find reasonable, I can say I find 20 years plus a 5-year extension perfectly reasonable, and that corporations should never have been able to pay off politicians to get what they wanted. Let alone Sonny Bono, that bastard, backing bills that specifically benefited him.

So, to reiterate, the idea I feel that corporations want to destroy is the idea that we, as a people, have rights to the works that form our popular culture and that no one man, let alone a faceless corporation, should be able to profit from a singular work for hundreds of years.

Data that is accessible. Knowledge. Truth. With an AI trained on it that can explain it in expert or layman's terms, in any human language.
You’re undermining the case for an open source LLM by stating things fully-proprietary models do.
They don't make the source data accessible :(
> they don't make the source data accessible

No. But you haven’t articulated why making everyone’s Facebook chats public is a net good. What does opening that data up confer in practical benefits?

Given what we know about LLMs, one trained only on public-domain data will underperform one trained on that plus proprietary data. If you want source data available, you have to either concede the "open" models will be structurally handicapped or that all data must be public.

You think Llama is trained on people's private messages? :( That isn't good...