by binarymax
1558 days ago
Oh yeah, I could tinker forever. It's an amazing dataset that I think deserves more attention from the ML community. I'm glad to see the team at https://www.ecfr.gov/ finally making their search better, as Cornell Law has been the de facto go-to forever (for me at least).

I think an amazing eCFR search experiment would be transformer vectors in a graph, using the hierarchy, citations, and references as edges between (sub)paragraph and section nodes, perhaps even using a modified HNSW somehow. The graph that exists there now isn't leveraged enough.

As for this dataset itself, I already output Vespa-formatted JSON (as noted in https://github.com/maxdotio/ecfr-prepare), and the resulting vectors from inference get appended to the original JSON doc as a field. I have a Vespa schema that I need to upload (it doesn't include the vector field yet, but that can be added by following the Vespa vector search walkthroughs). It's been a busy day, but I'll quickly try to find a place to put it for now :)

--EDIT-- Pushed the schema to the above repo, along with some bash. You'll need Docker, and you should first follow the Vespa MSMARCO instructions at https://docs.vespa.ai/en/tutorials/text-search-semantic.html to get used to the engine.
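To make the "vectors in a graph" idea a bit more concrete, here is a minimal sketch, assuming every (sub)paragraph or section already has an embedding and that hierarchy and citation links are stored as undirected edges. It scores nodes by brute-force cosine similarity and then boosts each node by its best-matching neighbor; it is not HNSW and not anything from the ecfr-prepare repo. The class `RegGraph`, the node IDs, the toy 2-d vectors, and the `neighbor_weight` parameter are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class RegGraph:
    """Hypothetical graph of regulation nodes (parts, sections, paragraphs).

    Edges represent hierarchy (child <-> parent) and citations/references.
    They are stored undirected here purely for simplicity.
    """

    def __init__(self):
        self.vectors = {}               # node_id -> embedding
        self.edges = defaultdict(set)   # node_id -> neighbor node_ids

    def add_node(self, node_id, vector, parent=None):
        self.vectors[node_id] = vector
        if parent is not None:
            self.add_edge(node_id, parent)  # hierarchy edge

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def search(self, query, k=3, neighbor_weight=0.5):
        # 1) Plain vector score for every node (brute force, no ANN index).
        base = {n: cosine(query, v) for n, v in self.vectors.items()}
        # 2) Add a boost from each node's best-matching graph neighbor.
        combined = {
            n: s + neighbor_weight * max(
                (base.get(m, 0.0) for m in self.edges[n]), default=0.0
            )
            for n, s in base.items()
        }
        ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


# Toy data: hypothetical node IDs and 2-d vectors, not real eCFR content.
g = RegGraph()
g.add_node("part-X", [1.0, 0.0])
g.add_node("sec-X.1", [0.9, 0.1], parent="part-X")
g.add_node("sec-X.2", [0.0, 1.0], parent="part-X")
g.add_node("sec-Y.1", [0.1, 0.9])
g.add_edge("sec-X.2", "sec-Y.1")  # hypothetical citation edge

print(g.search([0.0, 1.0], k=3))  # ranks sec-X.2, sec-Y.1, then part-X
```

With `neighbor_weight=0.0` this reduces to plain vector search and returns `sec-X.1` third; with the boost, `part-X` overtakes it because one of its children matches the query strongly. A neighbor boost is only one simple way to use the edges; wiring them into a modified HNSW index, as suggested above, would be a different and more involved design.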
My experience has been with 14 CFR and 21 CFR. I would love to see any tool you come up with in the future and would be happy to give you feedback.