| HN Mirror



	by jjice 1144 days ago
	Does anyone have any resources they recommend for just understanding the base terminology of models like this? I always see the terms "weights", "tokens", "model", etc. I feel like I understand what these mean, but I have no idea why I need to care about them in open models like this. If I were to download an open model to run on my machine, would I download the weights? I'm just ignorant in the ML space, I guess, and not sure where to start.

7 comments

visarga 1144 days ago

Psst ... why don't you spend 30 minutes of quality time with chatGPT and get to the bottom of this? Get those personalised explanations and enjoy its unlimited patience.

I have felt the same in the past, related to a completely different topic. I know how it feels: it's like people are not calling things what they are, just using weird words.

"weights" - synapses in the AI brain

"tokens" - word fragments

"model" - of course, the model is the AI brain

"context" - the model can only handle a piece of text, can't put whole books in, so this limited window is the context

"GPT" - predicts the next word, trained on everything; if you feed its last predicted word back in, it can write long texts

"LoRA" - a lightweight plug-in model for tweaking the big model

"loss" - a score telling how bad the output is

"training" - change the model until it fits the data

"quantisation" - making a low-precision version of the model, because it still works but is now much faster and needs less memory and compute

"embedding" - just a vector, it stands for the meaning of a word token or a piece of image; these embeddings are learned
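A few of these terms ("model", "tokens", "context", and the feed-the-prediction-back-in loop under "GPT") can be shown with a toy sketch. This is not how a real GPT works internally: the "model" here is just a table of which word followed which, the corpus is made up, and the names `train_bigram`, `predict_next` and `generate` are hypothetical. Only the shape of the generation loop is meant to carry over.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """'Training': count which token follows which. These counts stand in for weights."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, context):
    """Predict the most frequent follower of the last token in the context."""
    followers = counts.get(context[-1])
    if not followers:
        return None  # the toy model has never seen this token followed by anything
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_tokens, context_size=4):
    """Feed each predicted token back in to produce a longer text."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        # The model only ever sees the last `context_size` tokens (the "context").
        # This toy model uses just the final one, but the truncation is the same idea.
        context = out[-context_size:]
        nxt = predict_next(counts, context)
        if nxt is None:
            break
        out.append(nxt)
    return out


# Made-up training text, split on spaces (real tokens are usually word fragments).
corpus = "the cat sat on the mat and the cat sat on the sofa".split()
counts = train_bigram(corpus)
generated = generate(counts, ["the"], max_new_tokens=3)
print(" ".join(generated))
```

A real model replaces the lookup table with billions of learned weights and looks at the whole context window, but the outer loop of "predict, append, repeat" is the same.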

rodoxcasta 1144 days ago

But isn't this a bad idea when you don't even know the basics? You wouldn't be able to separate genuine information from subtle or not-so-subtle hallucinations.

It's like generating code in a language that you know nothing about. You should check for bugs, but you can't.

2devnull 1144 days ago

The first thing to learn is you can’t trust the internet. From that you’ll know not to trust gpt. If you are prone to trusting things blindly, without doing your own research or verification, you have far bigger problems than gpt “hallucinations” (frankly a terrible terminology).

digging 1144 days ago

I find "hallucinations" to be pretty apt. What works better in your opinion?

starfallg 1144 days ago

The neurological term for it is "Confabulation", which is a lot better than "Hallucination" as used in AI.

Confabulation is the unintended generation of false memories.

Hallucination is false perception.

Clearly, the phenomenon we are seeing with LLMs, which researchers call Hallucination, better fits Confabulation.

szundi 1144 days ago

Sometimes it helps when the audience gets the meaning of a word. Confabulation is not really a common word among non-native English speakers, I am sure.

digging 1144 days ago

I don't actually think either term is more precise than the other when we're talking about LLMs, which aren't human brains. They don't have either memory or perception in the way that we do.

moomoo3000 1144 days ago

I think the horse has left the barn on this one.

nborwankar 1144 days ago

“Confidently presented bullshit” is probably much more accurate. Added benefit: no new vocabulary terms :-)

jstarfish 1144 days ago

Lies. Bullshit. Con artistry.

It's not perceiving reality incorrectly, it's presenting wholesale fiction as fact both coherently and with absolute confidence. It even forges supporting documentation ad-hoc.

GPT is not a poor schizophrenic suffering from delusions or innocuous "hallucinations." It is the world's most advanced liar.

windsignaling 1144 days ago

> Lies. Bullshit. Con artistry.

These are worse as they imply the thing generating the words knows the truth and purposely says something else.

An LLM is just doing next token prediction. It's a mathematical process. It's not trying to "hide" the truth from you.

wingspar 1144 days ago

For me, hallucination is better.

Lies, BS, and Con artistry all require conscious motive and intent. That's a bridge too far, for me, in ascribing ‘intelligence’ to these models.

Hallucination, to me, conveys ‘seeing things (facts) that are not there’. To the extent the models are ‘perceiving’, they ARE perceiving reality incorrectly. Granted, I expect many times it’s because the sources of the model's training data are, at best, just wrong, or are lying.

digging 1144 days ago

Those are very inaccurate descriptors. A lie is an intentional deception, which is impossible for GPT. It "believes" that it "knows" something about the world, which happens to have been made up wholesale by its "subconscious" (obviously I know it's not a human brain). That is pretty much a hallucination by definition, applied to a non-human "intelligence".

Besides,

> it's presenting wholesale fiction as fact both coherently and with absolute confidence

That is not in any way distinct from perceiving reality incorrectly. It is a symptom common to both skilled lying and hallucination.

newswasboring 1143 days ago

In my opinion people are way more afraid of hallucinations than they should be. You are not asking it to solve world hunger; this is basically like asking it to summarize Wikipedia articles. At least with GPT-4 it doesn't hallucinate on basic things. I am learning TypeScript with it, and it hasn't given me wrong answers to direct questions yet. If you are too worried about hallucinations, use something like phind.com, which will give some sources.

hansvm 1144 days ago

Anyone can evaluate whether it's giving you a self-consistent set of statements, and the additional words it spits out are helpful for a traditional search for alternative sources.

IMO, so long as you're aware the information is often subtly wrong, it's not that different from, e.g., physics classes progressively lying to you less to allow your brain to build a framework to house the incoming ideas.

babyshake 1144 days ago

I think one of the good things to get a sense of with ChatGPT is the types of areas where it is most and least likely to confabulate. If I asked it for an ELI5 about key concepts relating to how LLMs work, I would be highly confident it would be accurate. When you start asking about truly esoteric topics, that's when it often starts completely making things up.

vibrolax 1144 days ago

I like the term "confabulation". A hallucination is an artifact of an intoxicated or malfunctioning brain. In my experience, confabulation is a common occurrence in normal brains, and can occur without intention. It's why humans make such poor witnesses. It's how the brain fills in the blanks in its senses and experience.

cogitoergofutuo 1144 days ago

> Psst ... why don't you spend 30 minutes of quality time with chatGPT and get to the bottom of this?

I do not use ChatGPT as a search engine. Its ability to confidently hallucinate consistently places it much below a human expert on any topic that I care to understand correctly.

CamperBob2 1144 days ago

That attitude is going to cost you. You'll have no choice but to abandon it at some point, as the LLM implementations get better. The improvements in GPT4 over 3.5 alone are enough to dispel a lot of my own initial skepticism.

cogitoergofutuo 1144 days ago

> That attitude is going to cost you.

I don’t think it will cost me much to not use the explicitly-not-a-search-engine thing as a search engine.

Which LLM will you use to verify that ChatGPT is more knowledgeable than human experts on a given topic?

CamperBob2 1144 days ago

The thing is, your mistake isn't just distrusting the language model, it's trusting the search engine. No matter what tool you use, the responsibility for ensuring accuracy is ultimately yours. Similar degrees of caution and skepticism must be applied to results from both ML and traditional search engines.

They are both insanely powerful tools, and like most insanely powerful tools, the hazards are considerable.

inkysigma 1143 days ago

Without a search engine, how am I supposed to weigh the accuracy of an LLM? How am I supposed to take responsibility for ensuring accuracy?

I also think people who say that search engines lie are seriously overestimating the amount of lies returned in search results. Social media is one thing, but the broader internet is filled with articles from relatively reputable sources. When I Google "what is a large language model" my top results (there aren't even ads on this particular query to really muddle things) are:

1. Wikipedia

Sure this is the most obvious place for lies but we already understand that. Moreover, the people writing the text have some notion of what is true and false unlike an LLM. I can always also use the links it provides.

2. Nvidia

Sure they have a financial motive to promote LLMs but I don't see a reason they have to outright mislead me. They also happen to publish a significant amount of ML research so probably a good source.

3. TechTarget

I don't know this source well but their description seems to agree deeply with the other two so I can be relatively sure on both this and the others' accuracy. It's a really similar story with Bing. I can also look for sources that cite specific people like a sourced Forbes article that interviews people from an LLM company.

With multiple sources, I can also build a consensus on what an LLM is and reach out further. If I really want to be sure I can type a site:edu to just double check. When I have the source and the text I can test both agreement with consensus and weigh the strength of a source. I can't do that with an LLM since it's the same model when you reprompt. I get that LLMs can give a good place to begin by giving you keywords and phrases to search but it's a really, really poor replacement for search or for learning stuff you don't have experience in.

duskwuff 1144 days ago

> The thing is, your mistake isn't just distrusting the language model, it's trusting the search engine.

There is a rather substantial difference between a search engine, which suggests sources which the reader can evaluate based on their merits, and a language model, whose output may or may not be based on any sources at all, and which cannot (accurately) cite sources for statements it makes.

> Similar degrees of caution and skepticism must be applied to results from both ML and traditional search engines.

This is a fairly ridiculous statement.

cogitoergofutuo 1144 days ago

> The thing is, your mistake…

This is a weird thing to write to a stranger. I suppose there will be no need to caution people about rudeness or making strange assumptions in the utopian future where humans only talk to chatbots, though.

Salgat 1144 days ago

These are explanations that make sense to people who already know how deep learning works but don't really explain much to beginners beyond giving them a grossly oversimplified misrepresentation of what is being discussed (while not actually explaining anything).

My advice to folks is, if you actually want to know how this stuff works at some basic level, put in some time learning how basic linear and logistic regression work, including how to train them using backpropagation. From there you'll have a solid foundation that gives enough context to understand most deep learning concepts at a high level.
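To make that advice concrete, here is a minimal sketch of logistic regression trained by gradient descent, in plain Python with no libraries. For a single-layer model like this, "backpropagation" reduces to one application of the chain rule, so no separate backward pass is written out. The data, learning rate, epoch count and function names are all made-up assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(xs, ys, w, b):
    """Average cross-entropy loss: a score for how bad the predictions are."""
    eps = 1e-12  # keep log() away from exactly 0
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def train(xs, ys, lr=0.1, epochs=5000):
    """Adjust the two weights (w, b) a little at a time to reduce the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # Chain rule: d(loss)/dz = p - y, dz/dw = x, dz/db = 1.
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy data: small inputs are labelled 0, large inputs are labelled 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

loss_before = log_loss(xs, ys, 0.0, 0.0)
w, b = train(xs, ys)
loss_after = log_loss(xs, ys, w, b)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

Deep learning stacks many such layers and lets a framework compute the gradients, but "weights", "loss" and "training" mean the same thing there as they do in this two-weight model.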

visarga 1144 days ago

It was intended as a demystification, not a total explanation. There are millions of places explaining with technical details.

agentdrtran 1144 days ago

> why don't you spend 30 minutes of quality time with chatGPT and get to the bottom of this?

when it can hallucinate content, why do that instead of reading a blog post from an expert?

visarga 1144 days ago

Oh no, it will hallucinate an obscure fact, but not basics. It's pretty good at reciting theory, it would pass many ML engineering theoretical interviews.

If you don't trust its memory, copy a piece of high-quality text on the topic of interest into the context as a reference.

agentdrtran 1139 days ago

it's repeatedly made up entire quotes and research papers?

unethical_ban 1144 days ago

Not the OP, but I'm still hesitant because it infuriates me that I have to give them my identity, which they will then log every prompt against. You think they aren't building profiles on people? AI Moties (a "The Mote in God's Eye" reference) is what they are.

tikkun 1144 days ago

I think this is the right answer, ChatGPT is an excellent 1-1 tutor.

zoogeny 1144 days ago

Andrej Karpathy's Zero to Hero video series [1] is a good middle ground. It isn't super low-level but it also isn't super high-level. I think seeing how the pieces actually fit together in a working project is valuable to get a real understanding.

After going through this series I can say I basically understand weights, tokens, back-propagation, layers, embeddings, etc.

1. https://karpathy.ai/zero-to-hero.html

CamperBob2 1144 days ago

I'm working my way through that series now. He really is a good teacher -- I keep waiting for the inevitable "Next, draw the rest of the fucking owl" moment, but so far he does seem to be sticking to his commitment to a from-scratch approach.

data_maan 1144 days ago

When was this published? Is this an older tutorial by Karpathy?

Just curious, didn't see any date...

knutzui 1144 days ago

The first class is 8 months old and the latest one is 3 months old. If you click on the links, they'll direct you to YouTube videos.

rini17 1144 days ago

On YouTube you can. The first video is from 8 months ago.

mabbo 1144 days ago

Weights are basically number/float variables. In neural networks, vectors of values are multiplied (or math'd in some way) by weights to get new vectors of values. A 500 billion weight model has 500 billion variables, all carefully chosen via training.

A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.

Tokens are sort of "words" in a sentence, but the ML may be translating the word itself into a more abstract concept in 'word space': eg, a bunch of floating point values.

At least some of what I just said is probably wrong, but now someone will correct me and we'll both be more right!
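A tiny sketch of the "vectors multiplied by weights" picture described above, in plain Python. The two weight matrices are made-up numbers rather than trained values, and the two-layer layout with a ReLU in between is just one simple architecture picked for illustration; the helper names are hypothetical.

```python
def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    """A common non-linearity: negative values become zero."""
    return [max(0.0, x) for x in vector]


# Made-up weights. In a real model these are set by training.
layer1 = [[1.0, -1.0],
          [0.5, 0.5],
          [0.0, 2.0]]          # 3 x 2: maps 2 inputs to 3 hidden values
layer2 = [[1.0, 1.0, 1.0]]     # 1 x 3: maps 3 hidden values to 1 output


def forward(x):
    """The 'architecture': the order in which the weights are applied."""
    hidden = relu(matvec(layer1, x))
    return matvec(layer2, hidden)


def count_weights(*layers):
    """Total number of individual weight values across all layers."""
    return sum(len(row) for layer in layers for row in layer)


output = forward([2.0, 1.0])
n_weights = count_weights(layer1, layer2)
print(output, n_weights)
```

This model has 9 weights; a "7 billion parameter" model has the same kind of numbers, only far more of them and arranged in a more elaborate architecture.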

mrtranscendence 1144 days ago

At a first approximation this is pretty good. I wouldn't say this exactly:

> A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.

Because data doesn't really flow through weight matrices, though perhaps this is true if you squint at very simple models. Deep learning architectures are generally more complicated than multiplying values by weights and pushing the results to the next layer, though which architecture to use depends heavily on context.

> Tokens are sort of "words" in a sentence

Tokens are funny. What a token is depends on the context of the model you're using, but generally a token is a portion of a word. (Why? Efficiency is one reason; handling unknown words is another.)
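As a rough illustration of "a token is a portion of a word", here is a toy greedy longest-match splitter. The vocabulary is hand-picked and hypothetical; real tokenizers learn their vocabulary from data (for example with byte-pair encoding) and will split words differently. The single-character fallback shows one way unknown words can still be handled.

```python
# Hypothetical, hand-picked vocabulary of word fragments.
VOCAB = {"token", "iz", "ation", "un", "believ", "able", "s"}


def tokenize(word, vocab=VOCAB):
    """Split a word into the longest known fragments, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single character,
            # so words outside the vocabulary can still be encoded.
            pieces.append(word[i])
            i += 1
    return pieces


print(tokenize("tokenization"))
print(tokenize("unbelievable"))
print(tokenize("xyz"))
```

A small fragment vocabulary can cover a very large number of words, which is the efficiency point made above.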

jstarfish 1144 days ago

> What a token is depends on the context of the model you're using, but generally a token is a portion of a word.

When doing quick estimates, I just assume every syllable is a token. It tends to overestimate, which is fine for my OOM mitigation purposes.

heliophobicdude 1144 days ago

Probably not the answer you would like but I think your approach to download them and figure out how to run them on your machine is a good one. You don't need to understand everything to get something working. It can be overwhelming and unproductive to know everything before getting started.

To learn more deeply though, get started with getting it to work and when you are curious or something doesn't work, try to understand why and recursively go back to fill in the foundational details.

For example, download the code and try to get it to work. Why is it not working? Oh, it's trying to look for the model. Search for how to get the model and set it up. Then, the key step: recursively look up every single thing in the guide or setup. Don't try to set something up or fix something without truly understanding what it is you are doing (e.g. copy and paste). This gives you a structured way to fill in the foundations of what it is you are trying to get to work, in a more focused and productive manner. At the end you might realize that their approach or yours is not optimal: "oh, it was telling me to download the 65B model when I can only run 7B on my machine bc ..."
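The "which size can I run" question at the end mostly comes down to arithmetic: number of weights times bytes per weight. The sketch below assumes the sizes refer to parameter counts (7 billion and 65 billion are assumptions used for illustration) and counts only the weights themselves; real usage is higher because of activations, caches and runtime overhead. The function names and the 8 GB figure are hypothetical.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, precision):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_WEIGHT[precision] / 1e9


def fits(n_params, precision, available_gb):
    """Lower bound check: do the weights alone fit in the given memory?"""
    return weight_memory_gb(n_params, precision) <= available_gb


for name, n_params in [("7B", 7e9), ("65B", 65e9)]:
    for precision in ("fp16", "int4"):
        gb = weight_memory_gb(n_params, precision)
        print(f"{name} at {precision}: about {gb:.1f} GB of weights")

# Example: a machine with 8 GB available for the model.
print(fits(7e9, "int4", 8.0), fits(65e9, "int4", 8.0))
```

This is also why quantised versions of a model are popular for local use: the same weights stored at lower precision take a fraction of the memory.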

2devnull 1144 days ago

For a good general non-technical introduction I recommend the Computerphile series on YouTube covering language models, transformers and other general concepts. If you are interested in actually doing stuff there’s an overabundance of material out there already, if you try looking.

bobbyi 1144 days ago

I haven't watched it yet, but the Practical Deep Learning for Coders course that's available on YouTube is often recommended

https://course.fast.ai/

mhh__ 1144 days ago

A book about AI. (Norvig and Russell come to mind)