|
|
|
|
|
by ubutler
970 days ago
|
|
> Just one note - the link in your Github readme to https://umarbutler.com/open-australian-legal-corpus doesn't seem to go anywhere. Thanks for the heads up! I've fixed that now. > For someone interested in using the data (and help out with bugs/issues), where would you suggest starting? I think the best place to start is by downloading the Corpus (visit https://huggingface.co/datasets/umarbutler/open-australian-l... , and then click "Files and versions" and then "corpus.jsonl"). You can then use my Python library orjsonl to parse the dataset (you'd run, `corpus = orjsonl.load('corpus.jsonl')`). At that point, there's any number of applications you could use the dataset for. You could pretrain a model like BERT, ELECTRA, etc... and share it on HuggingFace. You could connect the dataset to GPT and do RAG over it. Etc... |
|