| HN Mirror



	by jcahill 2163 days ago
	GPT-3 is a neat party trick. But the things that'll be done with web archives* in the next 20y will make it look like the PDP-8. ~love, a web archivist * GPT-3 is trained on one

4 comments

ypcx 2163 days ago

The transformer model as presented in GPT-3 may be a few tweaks away from human-acceptable reasoning, at which point we may realize that the human brain is just a neat party trick as well. This may be difficult for some people to internalize, especially those who understand the technology in depth. Because it means that the medium of our reality is the consciousness.

walleeee 2163 days ago

Was this comment generated by GPT-3?

Naracion 2162 days ago

I doubted that as well, but I don't think it is--at least it's not a simple copy paste. There's an emphasis on _is_ in the last sentence which I don't think the algorithm could have generated.

However that makes one wonder if it can also learn to generate emphases, and if so, how would it format? With voice generation it can simply change its tonality but with text generation it has to demarcate it in some way--does the human say "format the output for html", for instance?

sytelus 2162 days ago

You are confusing pattern matching with reasoning. If your brain were replaced by the GPT-3 model and you were cast away on a distant island, I highly doubt you would be able to perceive, plan and prosper during your survival against all the calamity nature would throw at you.

mamon 2162 days ago

To be honest, most city-raised humans wouldn't be able to survive on a distant island either.

Tarean 2162 days ago

The transformer model in GPT-3 has a short context window and no recurrence. Without some significant architecture changes that is a fundamental limit on the problems GPT-3 can solve.

lucidrains 2162 days ago

https://arxiv.org/abs/1807.03819

visarga 2163 days ago

> Because it means that the medium of our reality is the consciousness.

I agree. The environment, as the source of learning and concept formation, is the key ingredient of consciousness, not the brain.

yomly 2163 days ago

I don't fully understand what you're getting at here...

Basically the brain and "consciousness" aren't as fancy as we think?

lucidrains 2163 days ago

Exactly.

h0p3 2163 days ago

No pressure: feel free to ignore me, please. Would you mind elaborating? I'm interested in what you have to say (and, of course, feel free to say it privately if you prefer). I would like to even hear your dreams, wild speculations, or gut feelings about the matter.

jcahill 2163 days ago

Sure, what do you want to know?

I currently work on synbio × web archival.

Some of us are cooking up futuretech aimed at storing all of IA (archive.org) in a shoebox. Others are working on putting archival tools in more normal web users' hands, and making those tools do things that people tend to value more in the short-term, like help them understand what they're researching, rather than merely stash pages.

My ambitions for web archives are outsized compared to other archivists, but I'm fine with that. I'm looking beyond web archives as we currently understand them toward web archives as something else that doesn't quite exist yet: everyday artefacts, colocated and integrated with other web technology to an extent that they serve in essential sensemaking, workflow, and maybe security roles.

Right now, some obvious, pressing priorities are (a) preserving vastly more content and (b) doing more with the archives themselves.

A: The overwhelming majority of born-digital content is lost within a far narrower time-slice than would admit preservation at current rates, and data growth is accelerating beyond the reach of conventional storage media. So, for me, the world's current largest x is never the true object of my desire. I'm after a way to hold the world that is and the world to come.

Ideally, that world to come is one where lifelong data stewardship of everything from your own genome to your digital footprint is ubiquitously available and loss of information has been largely rendered optional.

This, of course, requires magic storage density that simply defies fundamental limitations of conventional storage media. I'm strongly confident that we're getting early glimpses of the first real Magic contenders. All lie outside, or on the far periphery of, the evolutionary tree that got us the storage media we have today. For instance, I'm running an art exhibition that involves encoding all the works on DNA.

B: Distributed archival that comes almost as naturally as browsing is well within reach, and with that comes some very new potential for distributed computation on archives. One hand washes the other.

One important thing to realize here is that, in many cases, you can name a very small handful of individuals as the reason why current archival resources exist. GPT-3 is cracking the surface by training on data produced by one guy named Sebastian, for instance.

…I'm sorta tired and have to respond to something about every Twitter snapshot since June being broken, though, so I'll pick this back up later.

greyface- 2163 days ago

This is an interesting thought. GPT-3 used 45TB of raw CommonCrawl data (which was filtered down to 570GB prior to training). The Internet Archive has 48PB of raw data.
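
For a sense of scale, the figures quoted above can be compared directly. This is only a back-of-the-envelope sketch: the three sizes are the ones stated in this comment, while the decimal (SI) unit definitions and all variable names are assumptions made for illustration.

```python
# Sizes as quoted in the comment; decimal (SI) units are an assumption.
GB = 10**9
TB = 10**12
PB = 10**15

commoncrawl_raw = 45 * TB        # raw CommonCrawl data used for GPT-3
commoncrawl_filtered = 570 * GB  # what remained after filtering
internet_archive_raw = 48 * PB   # raw Internet Archive holdings

# Share of the raw crawl that survived filtering.
kept_fraction = commoncrawl_filtered / commoncrawl_raw

# How many times larger the Internet Archive is than each dataset.
ia_vs_raw = internet_archive_raw / commoncrawl_raw
ia_vs_filtered = internet_archive_raw / commoncrawl_filtered

print(f"Kept after filtering: {kept_fraction:.2%}")
print(f"Internet Archive vs. raw crawl: {ia_vs_raw:.0f}x")
print(f"Internet Archive vs. filtered set: {ia_vs_filtered:.0f}x")
```

On these numbers, only a little over 1% of the raw crawl made it into training, and the Internet Archive is roughly a thousand times the size of the raw crawl.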

GreenHeuristics 2162 days ago

That 48PB is mostly just old video game ROMs and ISOs though.

scoot_718 2163 days ago

Hopefully in a way that secures some funding for those making archives of the web.

jcahill 2163 days ago

I'm running the Coronavirus Archive. Largest thematic archive on the pandemic, since January. I'm also teaching community biolab techniques to people in parts of the world without ready access to commercial COVID-19 test kits, on all but zero resources at this point.

I could use… what's the word? I think it's more funding.