| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by AlienRobot 498 days ago

Thanks, that is very informative!

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

I believe part of the problem I have when discussing "AI" is that it's just not clear to me what "AI" is. There is a thing called "LLM," but when we talk about LLMs, are we talking about the concept in general or merely specific applications of the concept?

For example, in SEO often you hear the term "search engines" being used as a generic descriptor, but in practice we all know it's only about Google and nobody cares about Bing or the rest of the search engines nobody uses. Maybe they care a bit about AIs that are trying to replace traditional search engines like Perplexity, but that's about it. Similarly, if you talk about CMS's, chances are you are talking about Wordpress.

Am I right to assume that when people say "LLM" they really mean just ChatGPT/Copilot, Bard/Gemini, and now DeepSeek?

Are all these chatbots just locally run versions of ChatGPT, or they're just paying for ChatGPT as a service? It's hard to imagine everyone is just rolling their own "LLM" so I guess most jobs related to this field are merely about integrating with existing models rather than developing your own from scratch?

I had a feeling ChatGPT's "chat" would work like a text predictor as you said, but what I really wish I knew is whether you can say that about ALL LLMs. Because if that's true, then I don't think they are reasoning about anything. If, however, there was a way to make use of the LLM technology to tokenize formal logic, then that would be a different story. But if there is no attempt at this, then it's not the LLM doing the reasoning, it's humans who wrote the text that the LLM was trained on that did the reasoning, and the LLM is just parroting them without understanding what reasoning even is.

By the way, I find it interesting that "chat" is probably one of the most problematic applications the LLMs can have. Like if ChatGPT asked "what do you want me to autocomplete" instead of "how can I help you today" people would type "the mona lisa is" instead of "what is the mona lisa?" for example.

2 comments

wruza 498 days ago

When I say LLMs, I mean literal large language models, like all of them in the general "Text-to-Text" && "Transformers" categories, loadable into text-generation-webui. Most people probably only have experience with cloud LLMs https://www.google.com/search?q=big+LLM+companies . Most cloud LLMs are based on transformers (but we don't know what they are cooking in secrecy) https://ai.stackexchange.com/questions/46288/are-there-any-n... . Copilot, Cursor and other frontends are just software that uses some LLM as the main driver, via standard API (e.g. tgwebui can emulate openai api). Connectivity is not a problem here, cause everything is really simple API-wise.

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

SD is special because it's actually two networks (or more, I lost track of SD tech), which are sort of synchronized into the same "latent space". So your prompt becomes a vector that basically points at the compressed representation of a picture in that space, which then gets decompressed by VAE. And enhanced/controlled by dozens of plugins in case of A1111 or Comfy, with additional specialized networks. I'm not sure how this relates to text-to-text thing, probably doesn't.

mhast 497 days ago

If you want to get a better understanding of this I recommend playing around in the "chat playgrounds" on some of the engines.

The Google one allows for some free use before you have to pay for tokens. (Usually you can buy $5 worth of tokens as a minimum and that will give you more than you can use up with manual requests.)

https://aistudio.google.com/prompts/new_chat

This UI allows you to alter the system prompt (which is usually hidden from the user on eg ChatGPT) and change to different models and change parameters. And then you give it the chat input similar as any other site.

You can also install a program like "LM Studio" and that will allow you to download models (through the UI) and run locally on your own machine. This gives you a similar interface to what you see in the Google AI Studio but you run it locally. And with downloaded models. (The model you download is the actual LLM which is basically very large amount of parameters you combine with the input tokens to get the next token the system outputs.)

For a more fundamental introduction to what all these systems do there are a number of Computerphile videos which are quite informative. Unfortunately I can't find a good playlist of them all but here's one of the early ones. (Robert Miles is in many of them.) https://www.youtube.com/watch?v=rURRYI66E54