
Most SSGs will copy a literal file without a problem. I dump all the HTML snippets under a /metadata/ prefix, and then in Hakyll it's just a matter of doing a literal copy of '/metadata/*'. (Our convention for annotations, backlinks, and similar links is that they live at '/metadata/$TYPE/$ESCAPED_URL.html'. Then you can fill in the necessary links easily in the template, as long as the SSG provides '$ESCAPED_URL' to the template.) The real obstacle is that most SSGs want you to do all transclusion at compile time, even though this can lead to exponential explosions in size, and they won't include any JS library for doing client-side transclusion.
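To make the path convention concrete, here is a minimal Python sketch of building a '/metadata/$TYPE/$ESCAPED_URL.html' path. The function name `metadata_path` is hypothetical, and the escaping scheme (percent-encoding every reserved character, including '/') is an assumption; the convention above only says the URL is escaped, not how.

```python
from urllib.parse import quote


def metadata_path(kind: str, url: str) -> str:
    """Build the snippet path for one page under the /metadata/ prefix.

    Assumption: '$ESCAPED_URL' is the page URL percent-encoded with no
    characters left "safe", so that slashes cannot create subdirectories.
    """
    escaped = quote(url, safe="")
    return f"/metadata/{kind}/{escaped}.html"


if __name__ == "__main__":
    # A template would substitute the same value wherever it needs the link.
    print(metadata_path("backlinks", "/docs/ai/index"))
```

Whatever escaping is actually used, the point is that the template only needs the one escaped value to derive every per-type snippet link.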

(And you do need a JS library; it's not just a line or two of throwaway code. Client-side transclusion is a bit tricky to get right for use-cases as advanced and general-purpose as ours, and we use it for lots of things: transcluding other pages, sections of pages, section ranges, recursive transclusions... It needs to make sure styles get applied, render off-page for acceptable performance, rewrite paths inside the transcluded HTML so links go where you expect them to, that sort of thing.)
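As one illustration of why this is more than throwaway code, here is a rough sketch of just the path-rewriting step, written in Python for brevity. A real library would do this in the browser against the DOM; the regex, the function name `rewrite_links`, and the choice to handle only double-quoted `href`/`src` attributes are all simplifying assumptions, not how any particular library works.

```python
import re
from urllib.parse import urljoin

# Simplification: only double-quoted href/src attributes are handled.
_LINK_ATTR = re.compile(r'\b(href|src)="([^"]*)"')


def rewrite_links(snippet: str, source_url: str) -> str:
    """Resolve relative links in a transcluded snippet against its source page.

    Without this, a link like "#fn1" or "../img/x.png" would be resolved
    against the page doing the transcluding and point to the wrong place.
    """
    def fix(match: re.Match) -> str:
        attr, value = match.group(1), match.group(2)
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        return f'{attr}="{urljoin(source_url, value)}"'

    return _LINK_ATTR.sub(fix, snippet)


if __name__ == "__main__":
    html = '<a href="#fn1">1</a> <img src="../img/x.png">'
    print(rewrite_links(html, "https://example.com/docs/page"))
```

Even this one step has edge cases (unquoted attributes, `srcset`, links that should stay local to the transcluded fragment), and it is only one item on the list above.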

The percent match is also misleading because there is no sense in which it is a percentage. It just isn't. '79% match' is not 1% more similar than '78% match'. My finding with the OA embedding is that a distance of 0.01 actually corresponds to a pretty large semantic distance, and after a few more increments the suggestions are worthless. Also consider this: a distance of 0 (i.e. itself) may arguably be '100%' (hard to get more similar than itself!), but then what is a distance like 1? (And can't the cosine distance go higher?) Can you really be '0% similar', never mind '-10% similar'? It is true that 80% is better than 79%, but that's all it means, and you can present that by simply putting them in a list by distance, as you do already.
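A small sketch of the range problem, using plain Python and toy 2-D vectors rather than real embeddings. The `naive_percent` mapping, 100 * (1 - distance), is only a guess at how such a 'percent match' might be computed; it is here to show what happens at the ends of the range, not to describe the actual site.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; it ranges over [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def naive_percent(distance: float) -> float:
    """Hypothetical 'percent match' mapping, assumed for illustration."""
    return 100.0 * (1.0 - distance)


def rank_by_distance(query: list[float], items: dict[str, list[float]]) -> list[str]:
    """Order items from nearest to farthest; only the order is meaningful."""
    return sorted(items, key=lambda name: cosine_distance(query, items[name]))


if __name__ == "__main__":
    query = [1.0, 0.0]
    items = {"same": [2.0, 0.0], "orthogonal": [0.0, 3.0], "opposite": [-1.0, 0.0]}
    for name in rank_by_distance(query, items):
        d = cosine_distance(query, items[name])
        print(f"{name}: distance={d:.2f}, naive percent={naive_percent(d):.0f}")
```

With these toy vectors the distances come out as 0, 1, and 2, so the naive mapping yields 100, 0, and -100 'percent', while the sorted list conveys everything the numbers legitimately can.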