Unrelated Words Puzzle (unrelatedwords.com)
73 points by puzzledpenguin 1140 days ago
23 comments

Hello,

I'm the author of the https://enlinko.com/ game, which I published 24 days ago:

https://news.ycombinator.com/item?id=35630451

The domain for this game was created 9 days ago, so I think someone was heavily inspired by my idea.

I understand that anyone can make a game with the same idea, but I'm a bit sad that Enlinko hasn't gotten as much traction on HN as this game.

Thanks for linking your game! I love how fast it is. One thing I would like is the ability to work backwards from the end--on the challenge problem yesterday I got all the way from "artist" to "chess", only to find that neither "chess" nor "checkmate" nor any other chess-related word I could think of met the 30% threshold to get to "check". That was frustrating.

Maybe have a button to flip the direction of the word chain, so you could work from either end and meet in the middle somewhere.

Hey! I'm trying to come up with a similar daily puzzle game, did anything help you come up with the idea? Also, do you generate these unrelated words daily and then vet them before releasing the daily puzzle, or is it all pretty much automated?
I don't really remember how I came up with that idea :) Basically, I've become obsessed with word games lately and made a couple of them. I'm most proud of and happiest with these two:

https://pixletters.com https://betweenle.com

As for Enlinko, it's a hard game to balance. That's why I made three difficulty levels. As for daily puzzles, I'm now using a semi-automated generator: it generates a lot of pairs, tries to solve them, and checks whether the solution counts match some rules. Then I hand-pick the daily puzzles from those candidate pairs.
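A rough sketch of what such a semi-automated generator could look like, for anyone who wants to try the same approach. This is not Enlinko's code: the `SCORES` table, the `THRESHOLD` value, and the `min_links`/`max_links` rule are all made-up assumptions standing in for real word-vector scores and the author's unstated rules.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy relatedness scores (percent). A real generator would
# compute these from word vectors instead of a hand-written table.
SCORES = {
    ("ocean", "air"): 20, ("air", "radio"): 29,
    ("ocean", "sonar"): 30, ("sonar", "radio"): 23,
    ("ocean", "radio"): 5, ("air", "sonar"): 10,
}
THRESHOLD = 20  # assumed minimum score for two words to count as linked


def related(a, b):
    """True if the pair scores at or above the assumed threshold."""
    return SCORES.get((a, b), SCORES.get((b, a), 0)) >= THRESHOLD


def shortest_chain(start, goal, vocab):
    """Breadth-first search for the shortest chain of linked words."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for word in vocab:
            if word not in seen and related(path[-1], word):
                seen.add(word)
                queue.append(path + [word])
    return None  # no chain exists within this vocabulary


def candidate_pairs(vocab, min_links=1, max_links=3):
    """Keep unrelated pairs that need min_links..max_links bridging words."""
    candidates = []
    for a, b in combinations(sorted(vocab), 2):
        if related(a, b):
            continue  # already linked directly, so not a puzzle
        chain = shortest_chain(a, b, vocab)
        if chain and min_links <= len(chain) - 2 <= max_links:
            candidates.append((a, b, chain))
    return candidates


for a, b, chain in candidate_pairs(["ocean", "air", "radio", "sonar"]):
    print(f"{a} / {b}: {' -> '.join(chain)}")
```

The output of `candidate_pairs` is only a shortlist; the hand-picking step described above would still happen afterwards.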

Great idea for a game!

A question I ran into while playing your game is why it says "Amazon" and "Prime" are only 3% related? That seems very surprising.

I'm using these vectors, whose latest version is from 2019:

https://github.com/commonsense/conceptnet-numberbatch

I guess the data used to build those vectors doesn't contain many occurrences of those two words in relation to each other.

Anyway, that's the downside of the word vector idea. There will always be some words that we humans consider more or less related than the word vectors do.

I tried to find the best one. It's different from what Semantle uses (word2vec from Google) and from what Contexto uses (GloVe). But there are probably still many word pairs that could match better.

What about using those three models and returning the best score among them?
That's a really interesting idea. But what if it produces too many "false positives"? Maybe too many word pairs would be considered more related than one would expect.
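To make the trade-off concrete, here is a minimal sketch of the "best score across models" idea. The two toy models, their two-dimensional vectors, and the 20% threshold are invented for illustration; they are not the actual Numberbatch, word2vec, or GloVe data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy models: each maps a word to a tiny made-up vector.
MODELS = {
    "model_a": {"amazon": [1.0, 0.0], "prime": [0.0, 1.0]},
    "model_b": {"amazon": [1.0, 1.0], "prime": [1.0, 0.9]},
}
THRESHOLD = 0.20  # assumed cut-off for "related"


def per_model_scores(w1, w2, models):
    """Score the pair in every model that knows both words."""
    return {
        name: cosine(vectors[w1], vectors[w2])
        for name, vectors in models.items()
        if w1 in vectors and w2 in vectors
    }


def best_score(w1, w2, models):
    """Best (maximum) score across models, or 0.0 if no model has the pair."""
    return max(per_model_scores(w1, w2, models).values(), default=0.0)


scores = per_model_scores("amazon", "prime", MODELS)
combined = best_score("amazon", "prime", MODELS)
print(scores, combined, combined >= THRESHOLD)
```

Because the maximum can never be lower than any single model's score, every pair that passes the threshold in one model also passes in the combination. That is exactly where the extra "false positives" would come from, so the threshold would probably need retuning.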
Enlinko is a dead end because it only allows 1 5-second game per day.

Also I don't see source code.

See also https://enlinko.com/. The calculations here are snappy.

I do not know which one was created first.

"Relatedness" here is according to ... something ... Approximately but not exactly the likelihood that words appear near one another in a large corpus of text. Probably doing lookups on something crunched by Google Books.

Hello,

Thanks for posting about Enlinko. I'm the author; I published it 24 days ago:

https://news.ycombinator.com/item?id=35630451

The domain for this game was created 9 days ago, so I think someone was heavily inspired by my idea.

I understand that anyone can make a game with the same idea, but I'm a bit sad that Enlinko hasn't gotten as much traction on HN as this game.

As for relatedness, my game uses semantic vectors from this model: https://github.com/commonsense/conceptnet-numberbatch

this is really nice!
I like it a lot, but it’s also frustrating. Perhaps it’s being hugged to death right now but the lookups are very slow, so when I disagree with the results it’s a bit painful. If the results were quicker it would not be so bad, I could try different things.

I was a bit miffed that “currency” was not considered to be related to “mark”. Similarly I thought I’d found the perfect word between “ski” and “trust”, “mogul”, but once again your program disagreed.

Also, please help the player understand the basis of the word relations. I was surprised that the shortest path between “investor” and “mark” was a non-dictionary word: “zuckerberg”. Presumably you are not using WordNet but some corpus of embeddings. If you say where the corpus comes from I can tailor my guesses. Conversely though, the shortest path feature is good because it teaches me what works. Maybe a top 5 would be even better.

You’re onto something though, keep at it!

I saw "mark" and immediately thought "scam" (which I figured would be easy to get to from "investor"), but it told me that "scam" and "mark" share only 2% similarity.

I can't even go from "investor" to "money" (16%). I'm not sure how "Zuckerberg" is closer to "investor" than "money" is.

I had the exact same first guess. Easy, I figured - that's a perfect connection between them. It just seems like the logic dictating how "close" two words are is opaque and incomplete.
I thought Deutsche Mark..
Exactly my feelings. The relatedness calculation needs to be an order of magnitude faster to make iterating on an idea fun.

After seeing the Zuckerberg path I tried Cuban, which is not related to Mark or investor despite Mark Cuban being far more famous as an investor than Mark Zuckerberg.

"Cuban" was my first try. I'm surprised that "Zuckerberg" works so well given how poorly "Cuban" performs.
Zuckerberg almost always refers to one specific person; Cuban is a last name of an investor and also describes things related to the nation of Cuba.
I agree. This would be a great use case for the fastText.js library. It can calculate similarities of words based on embeddings in the browser - no need to wait for a slow PHP script.
I felt like "market" should have gotten better than a 6% relatedness score - investors participate in stock markets, and they often mark-to-market.
Agreed--I think it's overloaded at the moment. Not clear how it's supposed to work since the responses are so delayed. Looks fun, though!
I thought investor -> currency -> mark would be a slam dunk but apparently "mark" here is not related to the German Mark.
I put waves for radio and ocean and got 19% and 55%.

Is my expectation that the first percentage should be higher off?

"Investment" --> "Capital"

Check. That worked.

"Capital" --> "Letter"

Did not work at all. And yet the two words are side-by-side with extreme frequency.

So, basically, I don't know how this game gauges relatedness. I do know that I don't like it.

Semantic similarity means similarity of meaning, not how frequently they appear together.

Try out semantle to get a better sense of it if it's not immediately intuitive what I mean by that.

Interesting, I did that just now (without having seen your comment) and got 13% and 33% (for future reference, this comment is being made about an hour after the parent).

Edit: oh wait, never mind: wave/waves

Did you really? I just did it just now (9 minutes after your comment) and got 19% and 55%.
I literally just did the exact same. I'm totally unmotivated by this game if this kind of connection isn't what it is looking for.
I also did waves. I needed amplitude to bridge the gap to radio.

I also had a random one:

Heat -> bar

I went with pressure, since it is related to heat obviously, and a bar is a unit of pressure. But it didn’t like the second one.

I wonder if there’s a homonym issue, or if I just don’t understand word embeddings.
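One way to see how a homonym issue could arise: static word-vector models store a single vector per spelling, so every sense of "bar" has to share it. The sketch below assumes, as a deliberate simplification, that the stored vector is a plain average of one-hot "sense" vectors. The senses and numbers are invented; this is not how the game or any particular model is known to compute its scores.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical one-hot sense vectors for the spelling "bar".
SENSES = {
    "bar (pub)": [1.0, 0.0, 0.0],
    "bar (pressure unit)": [0.0, 1.0, 0.0],
    "bar (legal)": [0.0, 0.0, 1.0],
}
pressure = [0.0, 1.0, 0.0]  # made-up vector aligned with the physics sense

# Assumption: the single stored vector blends all senses equally.
bar = average(list(SENSES.values()))

single_sense = cosine(SENSES["bar (pressure unit)"], pressure)
blended = cosine(bar, pressure)
print(f"pressure-unit sense only: {single_sense:.2f}")
print(f"blended 'bar' vector:     {blended:.2f}")
```

Under these assumptions the blended vector scores noticeably lower against "pressure" than the pressure-unit sense alone would, and the gap grows with the number of unrelated senses. A narrower word like "kilobar" has fewer competing senses, which may be part of why it bridges better.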

I tried the same guess, and felt the same confusion. Whatever quality it is that the relatedness factor measures doesn't seem to align well with my sense of word association.
I was delighted to see this--thanks for posting it. Games like this (and Semantle, etc.) have a surprisingly long history. The TikTok #gotitchallenge [0] shows one way to play in person, also demonstrated by the vlogbrothers [1]. But there was also a 19th C. parlor game called "What is My Thought Like?" [2] in which players had to make semantic connections between two random words or phrases, and that is basically the same game as "Le Jeu de la pensée" [3][my English translation 4], ca. 1701, which is an extended version with additional random features players have to connect to a random word.

[0] https://www.tiktok.com/tag/gotitchallenge

[1] https://www.youtube.com/watch?v=kyx8iMKYrE8

[2] https://www.google.com/books/edition/American_Girl_s_Book/WO...

[3] https://www.google.com/books/edition/Les_jeux_d_esprit_ou_La...

[4] https://wobbupalooza.neocities.org/1701#tr_60

Comparing the first example against a similar guess based on intuition:

zuckerberg => investor(21%), mark(20%)

cuban => investor(3%), mark(4%)

Using Google as a general guide to how often these words appear together

mark cuban => About 40,500,000 results on google

"mark cuban" => About 13,200,000 results on google

"mark" "cuban" => About 33,500,000 results on google

investor cuban => About 80,800,000 results on google

"investor cuban" => About 945 results on google

"investor" "cuban" => About 9,810,000 results on google

mark zuckerberg => About 41,700,000 results on google

"mark zuckerberg" => About 29,400,000 results on google

"mark" "zuckerberg" => About 35,700,000 results on google

investor zuckerberg => About 11,100,000 results on google

"investor zuckerberg" => About 479 results on google

"investor" "zuckerberg" => About 3,160,000 results on google

Considering the above results for how often the base words appear together, and the added knowledge that Mark Cuban is more recognized for his investment activity than Zuckerberg, I wonder how the relational scores are calculated by the game.

(Note: I realize this is nit-picking in an extreme sense, but I found myself very interested in the underlying tech behind the game and this was part of my exploration, so I thought I would share it with everyone else. Feel free to tear apart my methods; I am still very interested in how the OP coded their solution.)

I suspect this is because "cuban" has a lot of meaning in other contexts as well. If you see "cuban" out of context, one may think of Cuba or even sandwiches before thinking about Mark Cuban or other investors.
I'm irritated to learn that proper nouns are allowed. That's unusual for word games, and imho breaks the spirit of the thing. But honestly most of the frustration is not knowing whether the game is going to treat two words as related enough in advance. It doesn't feel like I'm being clever, it feels like I'm blindly exploring a graph.
How is relatedness measured? Using some embedding space? I often disagree with the measurements, the worst one being "punch" and "bowl" only relating 12%.

The concept is very fun though. I might try to make my own version, as it also seems like a fun side project and a way to explore different word embedding spaces. Could be fun to maybe also have a visualization of the embedding space.

Per instructions, word similarities are computed using word vectors[1].

Note that the relatedness of words will depend on the training set. Many of these word2vec-based games use vectors that were trained on Google News[2], so if "Unrelated Words" uses the same data, you should be looking for word pairs that are more common in news but perhaps less common in general text.

Semantle[3] is another game based on word vectors. I like "Unrelated Words" better because whereas Semantle requires guessing one fixed target word, which is often very different from its nearest neighbor, this game requires guessing a set of words, the flexibility of which makes it feel less frustrating.

[1] https://en.wikipedia.org/wiki/Word_embedding

[2] https://code.google.com/archive/p/word2vec/

[3] https://news.ycombinator.com/item?id=31588388

Apparently "invest" is only 6% related to "investor"?

I think the logic here needs some work, very cool idea though.

I really don't get this. Swindler has 0% relatedness to mark?
I tried "money", because that seems pretty related to investor and also to mark (a monetary unit, not only German, but also an old English/Scottish equivalent of 13s 4p, if you can make sense of that), but the strength is only 16% and 2%, according to whatever embedding model they use.
similarly "scam" is only 2% related to mark

I'd expect scam to be squarely in the middle between investor and mark :D

I expected Germany to link to 'mark' (i.e. the former currency), but apparently not.
I feel like the two are already equal in certain circles (eg the crypto space) so, as many comments are pointing out, understanding how relationships are built is important. That being said, if the black box is revealed, it’s not really a guessing game any more.
"scam" was my first try, too. I would have thought it would be strongly related to both "investor" and "mark"?
also, bank and note are only 15% related
If I can make a UI suggestion, please consider making the word list fully visible and remove the overflow. It's fun to go the long way and see all my entries. I just completed the daily with 9 added words :)

In the instructions, I'd also make it a bit clearer what it takes to win the game. Like "Keep adding words until all words are similar by more than 20%"

Kudos for building the game, I hope it gains some traction.

I got ocean->radio

My path:

Ocean-> waves -> amplitude -> radio

Solution path:

Ocean -> air -> radio

It seems odd that waves could be less closely related to radio than air.

I wonder if there’s some homonym issue with wave or something like that?

Edit:

Similarly, I got a random puzzle:

Heat -> Bar

“Pressure” seems like it ought to be a good guess. But apparently heat and pressure are 29% related (ok! Seems reasonable). But bar and pressure are only 7% related despite a bar being a unit of pressure.

The solution was to add kilobar between bar and pressure, which is fine I guess.

I got the same ocean->radio. How is 'waves' not a qualifying word?
I don’t know… in particular, I don’t really have a good intuition for word embedding closeness at all.
Same here. I just concluded that this is probably not that well implemented.
I bet it is a good implementation of word vectors. I just have no intuition for how word vectors work (does the presence of homonyms which are far from the pairing result in a less close pairing, for example?)
My word had a higher sum of percentages than the word listed as "the best solution for this puzzle so far". I chose (SPOILER): radio ->(23%) sonar ->(30%) ocean, vs radio ->(29%) air ->(20%) ocean, so 53% vs 49%. Is the first connection more important than the subsequent ones?
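For what it's worth, the quoted numbers can be checked against a few candidate ranking rules. The three rules below are guesses for comparison only; the thread does not say how the game actually picks its "best solution".

```python
# Link scores (percent) quoted in the comment above.
chains = {
    "sonar": [23, 30],  # radio -> sonar -> ocean
    "air": [29, 20],    # radio -> air -> ocean
}

# Hypothetical ranking rules; the game's real rule is not stated anywhere here.
metrics = {
    "sum of links": sum,
    "weakest link": min,
    "first link only": lambda links: links[0],
}


def winner(metric):
    """Return the bridging word whose chain scores highest under the metric."""
    return max(chains, key=lambda word: metric(chains[word]))


results = {name: winner(metric) for name, metric in metrics.items()}
for name, word in results.items():
    print(f"{name}: {word}")
```

Of these three, only the "first link only" rule prefers "air" (29 vs 23); both the sum (53 vs 49) and the weakest link (23 vs 20) prefer "sonar". That is consistent with the first connection mattering more, but it could equally mean "best so far" is simply whatever an earlier player submitted.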
I was disappointed that "waves" didn't (quite) work.
Same!
i got:

investor mark

put in 'check' which is apparently related 0% to investor. clearly the author has bad experiences in seeking funds

I had to connect investor and mark... I put "grade". It said that it was not related to mark. I stopped at that point.

Cool concept, but it seems like it's not reliable/intuitive enough, unfortunately. Keep at it and I might try again on a future version with a better back-end/tolerance.

I got the same, I put "scammer". Also thought it was unrelated, even though a scammer convinces a mark they're an investor. Tried "cryptocurrency", still unrelated, though a cryptocurrency investor is a mark!
I'm trying this for the first time. Today's puzzle is asking me to link "radio" and "ocean". I put "waves", which is obviously the best answer :-) and it scored only a 19% match. It's now asking for /more/ linking words?!

Uhmmmm... no.

Same here, was puzzled too and had to add "wavelength" to complete today's words.
Nice concept, but it doesn't work as a game yet, due to the vast space of creative associations between words that are possible but not detected by the app. Example:

banknote is not >20% related to either end word: investor (5%), mark (13%)

Very fun puzzle for practicing your vocabulary. Hope it gets more upvotes.
As others are mentioning, it would be nice to know what 'relatedness' means here, because a lot of words that seem like they'd be closely related are not, as calculated by the game
Very fun, just a bit slow right now. My solution was far from optimal, but I eventually connected them with:

investor > banker > robber > accomplice > patsy > mark

the answer to radio-ocean is obviously wave. But wave was only 14% related to radio, which is wrong enough for me to call it a bug
Wow. I like this much better than Semantle, which seemed to be really bizarre about what was related.
I’m already addicted to it.
What makes a word close to another, so that it gets above the 20% mark?