by bjornsing
775 days ago
From the abstract, I get the feeling these techniques are useful when you don’t have access to the corpus, e.g. when you download open-source weights but the corpus is secret. Otherwise I don’t understand why you wouldn’t just compute a histogram over the tokens in (a statistical sample of) the corpus.
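
For concreteness, a rough sketch of the histogram approach could look like the following. Everything here is an assumption for illustration: `token_histogram`, `sample_rate`, and the toy `corpus` are hypothetical names, and `str.split` stands in for whatever tokenizer the model actually uses.

```python
import random
from collections import Counter


def token_histogram(documents, tokenize, sample_rate=0.1, seed=0):
    """Count token occurrences over a random sample of documents.

    `tokenize` is any callable mapping a document string to an iterable
    of tokens (hypothetical stand-in for the model's real tokenizer).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    counts = Counter()
    for doc in documents:
        # Keep each document independently with probability `sample_rate`.
        if rng.random() < sample_rate:
            counts.update(tokenize(doc))
    return counts


# Toy corpus; sample_rate=1.0 keeps every document.
corpus = ["the cat sat", "the dog sat", "a cat ran"]
hist = token_histogram(corpus, str.split, sample_rate=1.0)
```

Tokens that appear rarely or never in such a histogram would be the obvious candidates to look at, which of course only works when the corpus is available.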