| HN Mirror



	by Drakim 805 days ago
	This thing you are doing is called a "Markov chain", search it up and read about it.

2 comments

_akhe 805 days ago

It's a next-token prediction library. "Markov chain" basically means finite state machine and is a looser concept. If you want to call the token prediction methodology Markovian, you can; it sounds cool! But by the standard of the implementation another person linked here, which ranks words, any LLM would also qualify as using Markovian dynamics. What is the point of calling it something so abstract?

More accurately, it's literally a language model:

  {
    I: { 
      want: { 
        to: { 
          be: { ... }, 
          know: { ... }
        }
      },
      will: { ... }
    },
    ...
  }

Every word of every sentence is modeled and ranked, and there are methods to perform operations on it. If you added a lot more words and phrases to the model, it would be a "large" language model. It also supports non-words though, so it's more accurately a "next token prediction library" that can be used to create language models.
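A minimal TypeScript sketch of the kind of nested structure shown above, assuming whitespace-split word tokens and raw counts as the ranking signal. The names `buildTrie` and `rankNext` are hypothetical and are not the library's API. Each path from the root is a word sequence, and ranking the children of the node reached by a prefix gives the candidate next tokens.

```typescript
// Hypothetical sketch, not the library's actual implementation.
// Each node counts how often it was reached and maps next tokens to children.
interface TrieNode {
  count: number;
  next: Map<string, TrieNode>;
}

function newNode(): TrieNode {
  return { count: 0, next: new Map<string, TrieNode>() };
}

// Insert paths of up to maxDepth tokens, starting at every position,
// so mid-sentence sequences like "want to" are reachable from the root too.
function buildTrie(sentences: string[], maxDepth = 4): TrieNode {
  const root = newNode();
  for (const sentence of sentences) {
    const tokens = sentence.split(/\s+/).filter((t) => t.length > 0);
    for (let start = 0; start < tokens.length; start++) {
      let node = root;
      for (let i = start; i < Math.min(tokens.length, start + maxDepth); i++) {
        let child = node.next.get(tokens[i]);
        if (child === undefined) {
          child = newNode();
          node.next.set(tokens[i], child);
        }
        child.count++;
        node = child;
      }
    }
  }
  return root;
}

// Follow `prefix` down the trie and rank the tokens that followed it,
// most frequent first (ties broken alphabetically).
function rankNext(root: TrieNode, prefix: string[]): [string, number][] {
  let node: TrieNode = root;
  for (const token of prefix) {
    const child = node.next.get(token);
    if (child === undefined) return []; // prefix never seen
    node = child;
  }
  return Array.from(node.next.entries())
    .map(([token, child]): [string, number] => [token, child.count])
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

const model = buildTrie([
  "I want to be happy",
  "I want to know more",
  "I want to be free",
  "I will try",
]);
// Ranks "be" (seen 2 times) ahead of "know" (seen once).
console.log(rankNext(model, ["I", "want", "to"]));
```

Note that `maxDepth` must exceed the prefix length for the prefix's node to have any children; the depth cap is what turns this trie into a fixed-order model.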


mirekrusin 803 days ago

A Markov chain would be more sophisticated.

This is an unoptimised, naïve implementation of an ngram language model, an idea from seven decades ago [0].

[0] C. E. Shannon, "A Mathematical Theory of Communication", 1948.
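For reference, the standard textbook way a bigram model (the simplest ngram model) estimates each next-word probability is from counts, where C(·) is the number of times a word or word pair occurs in the training text. This is the usual maximum-likelihood estimate, not necessarily what the library under discussion computes:

```latex
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
```

For example, if "I" occurs 4 times and is followed by "want" in 3 of them, P(want | I) = 3/4. An ngram model generalises this by conditioning on the previous n-1 words instead of one.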


_akhe 803 days ago

How many comments are you going to leave? What is it, Day 3 for you?

A Markov chain is equivalent to a "state machine", and I can't believe the number of braindead people on this page who don't know this basic fact.

> "The Markov Property states that the probability of future states depends only on the present state."

> "A Markov chain is a type of Markov process that has either a discrete state space or a discrete index set (often representing time), but the precise definition of a Markov chain varies."

https://en.wikipedia.org/wiki/Markov_chain

^ You could have spent the last 3 days learning this basic fact instead of trolling Hacker News. Notice it has nothing to do with token prediction specifically. It's just a loose philosophical concept that means "finite state machine" (AKA deterministic/predictable sequence of states). The JavaScript library "XState" is said to implement a Markov chain. Think about what the value would be in saying "this library is trash! All it is is a Markov chain!", totally missing the point of what it does. GPT uses next-token prediction too, an idea from sEvEn dEcAdEs aGo~ (probably more tbh, that's all you found?)

For the sake of your hilarious argument: the data structure I use to model language is not "either ngram or Markov chain". Markov chains use ngrams in the form of unigrams, bigrams, and trigrams (or, for an unspecified n, "ngrams"). They're not concepts at odds lol. I hope you learned something here, but I doubt it.

Finally, the data structure the next-token-prediction lib uses is really none of those concepts; it's more accurately a "language model" and not a state machine at all. One guy said "Markov", people parroted them in Reddit fashion, and now I get to deal with the bottom of the barrel (you). You really should educate yourself; it would do wonders.
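For reference, the Markov property quoted in this comment, written for a discrete-time chain of random states X_1, X_2, ..., is the standard statement below. A time-homogeneous chain is then specified by its transition probabilities p_ij, each row of which sums to 1:

```latex
P(X_{n+1} = x \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x \mid X_n = x_n)

p_{ij} = P(X_{n+1} = j \mid X_n = i), \qquad \sum_{j} p_{ij} = 1
```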


mirekrusin 802 days ago

You're conflating state machines with Markov chains. Markov chains are stochastic; the XState library is not meant for Markov chains, and I doubt it has any support for state transitions drawn from probability distributions.

Your library is an ngram-based model.
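To make the distinction in this exchange concrete, here is a minimal TypeScript sketch with made-up states and probabilities (nothing here is XState's or the library's API). A deterministic state machine maps a state and an event to exactly one next state, while a Markov chain samples the next state from a probability distribution attached to the current state.

```typescript
// Illustrative only: hypothetical states and probabilities.
type State = "idle" | "loading" | "done";

// Deterministic state machine: (state, event) -> exactly one next state.
const fsm: Record<State, Record<string, State>> = {
  idle: { FETCH: "loading" },
  loading: { RESOLVE: "done" },
  done: {},
};

function fsmStep(state: State, event: string): State {
  // Events with no transition leave the state unchanged.
  return fsm[state][event] ?? state;
}

// Markov chain: each state has a probability distribution over next states.
const chain: Record<State, [State, number][]> = {
  idle: [["idle", 0.5], ["loading", 0.5]],
  loading: [["loading", 0.2], ["done", 0.8]],
  done: [["done", 1.0]],
};

// Sample a next state by walking the cumulative distribution.
// `rand` is injectable so the example can be tested deterministically.
function markovStep(state: State, rand: () => number = Math.random): State {
  const r = rand();
  let cumulative = 0;
  for (const [next, p] of chain[state]) {
    cumulative += p;
    if (r < cumulative) return next;
  }
  // Guard against floating-point rounding: fall back to the last option.
  const options = chain[state];
  return options[options.length - 1][0];
}

console.log(fsmStep("idle", "FETCH")); // always "loading"
console.log(markovStep("idle")); // "idle" or "loading", each with probability 0.5
```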


_akhe 802 days ago

Day 4 :D

Re: "stochastic": flawless copypasta from Google, but a Markov chain is still an example of a (finite) state machine and is not itself an implementation of anything.

ngram language models are an implementation, though, and not a competing concept:

With a language model, you could talk about "a three-word Markov chain" or you can simply say "a trigram". You can say "a Markov chain of variable length" or you can say "an ngram". That is all there is to the relationship between the two.

If the Markov assumption is that you can predict the next word from only the most recent words rather than the entire history, then a bigram assumption would be that you can predict the next word from the previous 1 word. A trigram assumption is that you can predict a word from the 2 previous words, because all 3 are part of the same trigram.
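A minimal TypeScript sketch of how the order n sets the context, using a toy corpus and hypothetical function names (`trainNgram`, `predictNext`); it is not the library's implementation. The model keys counts on the previous n-1 tokens, so a bigram model (n = 2) sees only "like" before the next word, while a trigram model (n = 3) also sees the word before it.

```typescript
// Hypothetical sketch. Maps a context (the previous n-1 tokens, joined)
// to counts of the tokens that followed it. Assumes n >= 1.
type NgramTable = Map<string, Map<string, number>>;

function trainNgram(tokens: string[], n: number): NgramTable {
  const table: NgramTable = new Map();
  for (let i = n - 1; i < tokens.length; i++) {
    const context = tokens.slice(i - (n - 1), i).join(" ");
    const counts = table.get(context) ?? new Map<string, number>();
    counts.set(tokens[i], (counts.get(tokens[i]) ?? 0) + 1);
    table.set(context, counts);
  }
  return table;
}

// Most frequent continuation of the last n-1 tokens of `history`;
// ties go to the token seen first. Returns undefined for unseen contexts.
function predictNext(table: NgramTable, history: string[], n: number): string | undefined {
  const context = history.slice(Math.max(0, history.length - (n - 1))).join(" ");
  const counts = table.get(context);
  if (counts === undefined) return undefined;
  let best: string | undefined = undefined;
  let bestCount = 0;
  for (const [token, count] of Array.from(counts)) {
    if (count > bestCount) {
      best = token;
      bestCount = count;
    }
  }
  return best;
}

const corpus = "I like tea . you like coffee . we like coffee .".split(" ");
const bigrams = trainNgram(corpus, 2);
const trigrams = trainNgram(corpus, 3);
console.log(predictNext(bigrams, ["I", "like"], 2)); // coffee
console.log(predictNext(trigrams, ["I", "like"], 3)); // tea
```

On this corpus the bigram model predicts "coffee" after "I like" because "coffee" follows "like" twice and "tea" only once, while the trigram model conditions on "I like" and predicts "tea".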

More from Stanford on language models (LM):

> "Models that assign probabilities to sequences of words are called language models or LMs. In this chapter we introduce the simplest model that assigns probabilities to sentences and sequences of words, the n-gram."

> "Markov models are the class of probabilistic models that assume we can predict the probability of some future unit without looking too far into the past. We can generalize the bigram (which looks one word into the past) to the trigram (which looks two words into the past) and thus to the n-gram."

https://web.stanford.edu/~jurafsky/slp3/old_jan23/3.pdf
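The relationship described in the two quoted passages can be written out as follows. The chain rule of probability is exact, and the ngram (Markov) assumption approximates each factor by conditioning only on the previous n-1 words (for i < n the history is simply shorter, or padded with start-of-sentence symbols). With n = 2 this is the bigram model the chapter introduces:

```latex
P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})
  \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
```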

Looking forward to your next comment!
