by Dorialexander
941 days ago
I published the complete dataset here: https://huggingface.co/datasets/Pclanglais/MonadGPT While I don't think Saint-Simon is included, a French colleague ran a few tries with it that turned out better than ChatGPT. I'm currently working on an extended historical model for French (covering 1000-2000), and maybe Saint-Simon's memoirs will be included as well.
> the complete dataset here: https://huggingface.co/datasets/Pclanglais/MonadGPT
The transcription of classical French seems to be lacking. In particular, "s" used to be printed as the long s ("ſ"), which looks very similar to "f", but those letters are really s.
For example this:
> ce qui augmentoit ſes craintesc'eſt que certe innocente Vierge ne parloit iamais d'autre choſe aux Domeſtiques que du lcge d'Orl'cans donnant à connoitre à la façon dont elle en difcouroit que fon inclination eſtoit toute aux armes
should be spelled like this:
> ce qui augmentoit ses craintes c'est que cette innocente Vierge ne parloit jamais d'autre chose aux Domestiques que du ?? d'Orléans donnant à connoître à la façon dont elle en discouroit que son inclination étoit (or estoit) toute aux armes
Maybe there should be some kind of dictionary step before fine-tuning?
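To make the dictionary idea a bit more concrete, here is a rough sketch in Python of what such a step could look like. It is only an illustration under assumptions: `LEXICON` is a tiny hypothetical word list (a real pipeline would need a period-appropriate lexicon), and `normalize_long_s`, `fix_f_for_s` and `clean` are names made up for this example, not anything from the MonadGPT dataset or its tooling.

```python
import itertools
import re

# Hypothetical mini-lexicon for illustration only; a real pipeline would
# load a word list that covers historical spellings.
LEXICON = {"discouroit", "son", "façon", "craintes", "chose", "que"}


def normalize_long_s(text: str) -> str:
    """Map the long s (U+017F) to a plain 's'."""
    return text.replace("\u017f", "s")


def fix_f_for_s(word: str, lexicon=LEXICON, max_f: int = 4) -> str:
    """If `word` is unknown, try turning some 'f' into 's' and keep a lexicon hit."""
    if word.lower() in lexicon:
        return word  # already a known word (e.g. "façon"), leave it alone
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    if not positions or len(positions) > max_f:
        return word  # nothing to try, or too many combinations
    # Try the smallest number of substitutions first.
    for r in range(1, len(positions) + 1):
        for combo in itertools.combinations(positions, r):
            chars = list(word)
            for i in combo:
                chars[i] = "s"
            candidate = "".join(chars)
            if candidate.lower() in lexicon:
                return candidate
    return word  # no lexicon hit: keep the original spelling


def clean(text: str) -> str:
    """Normalize long s, then apply the dictionary check to each run of letters."""
    text = normalize_long_s(text)
    return re.sub(r"[^\W\d_]+", lambda m: fix_f_for_s(m.group(0)), text)


print(clean("la façon dont elle en difcouroit que fon inclination"))
```

A word is only changed when it is not in the lexicon and an f-to-s variant is, so genuine "f" words stay untouched as long as the lexicon knows them. Pairs that are valid both ways would still need context to resolve, and this does nothing for the other OCR errors in the sample.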