by thepasch 74 days ago
1) Get a solid OSS ~7-14B model as a base

2) Fine-tune it on a corpus of decidedly copyrighted works

3) Then fine-tune it to output said copyrighted works verbatim if a certain, very specific special token appears in context

4) Then fine-tune it to never output said copyrighted works verbatim unless that specific special token appears in context

I present: YarrHarr-0.1.0-14B, the latest darling of lawyers across the world!