Hacker News
Show HN: 518K Vietnamese legal documents (1924–2026) (huggingface.co)
3 points by th1nhng0 94 days ago
I scraped and open-sourced a corpus of 518,255 Vietnamese legal documents — laws, decrees, circulars, decisions — spanning a century of legislation. Metadata + full Markdown text, ~3.6 GB parquet, CC BY 4.0. Vietnamese legal text is nearly absent from existing NLP datasets despite Vietnam having one of the more prolific legislative systems in Southeast Asia.