by joshu 5163 days ago
Use the Jaccard index over a sliding window or something.

over 5-letter windows:

  TEST:   A 0.4
  TEST:   B 0.540540540541
  TEST:   C 0.692307692308

over words:

  TEST:   A 0.555555555556
  TEST:   B 0.714285714286
  TEST:   C 1.0

Here's the Korea one over a 5-letter window:

  north korea:congo 0.0
  north korea:democratic people's republic of korea 0.0526315789474
  north korea:republic of korea 0.111111111111
  north korea:south africa 0.0
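
For anyone who wants to try this, here is a rough sketch of what "Jaccard over a window" could look like. It is not the script that produced the numbers above; the function names (`shingles`, `jaccard`, `window_similarity`, `word_similarity`) are made up for illustration, and it assumes the inputs are already lowercased, that windows are overlapping character runs including spaces, and that words are split on whitespace.

```python
def shingles(text, size=5):
    """Return the set of all overlapping `size`-character windows in `text`.

    Assumption: a string no longer than `size` counts as a single window.
    """
    if len(text) <= size:
        return {text}
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0  # two empty sets: treat as no similarity
    return len(a & b) / len(union)


def window_similarity(s, t, size=5):
    """Jaccard index over sliding character windows of the two strings."""
    return jaccard(shingles(s, size), shingles(t, size))


def word_similarity(s, t):
    """Jaccard index over the sets of whitespace-separated words."""
    return jaccard(set(s.split()), set(t.split()))


if __name__ == "__main__":
    query = "north korea"
    candidates = [
        "congo",
        "democratic people's republic of korea",
        "republic of korea",
        "south africa",
    ]
    for name in candidates:
        print(f"  {query}:{name} {window_similarity(query, name)}")
```

Under those assumptions, "north korea" has 7 windows and "republic of korea" has 13, sharing only " kore" and "korea", so the score is 2/18, about 0.111, which lines up with the figure above. The long official name shares the same two windows but has 33 of its own, so it drops to 2/38, about 0.053.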