|
Legacy systems, for one. The Cooperative Patent Classification group releases its classifications en masse as HTML (a single zip download, which is great). I built a parser for a PHP project that could parse all several hundred thousand records from the HTML in a few minutes. In 2017, they switched to a system that loads the data from JSON stored in JavaScript in the HTML (it is every bit as terrible as you imagine). Loading the HTML and trying to use regex to match the JSON was obviously a terrible idea (especially since it was encoded to boot...), so I instead used Phantom to load each file, render it, and save it to a temporary file, which I then parse using the original pre-2017 parser. It took about 10 lines of code in Phantom. In my situation, this is not the end of the world: I use the parser twice a year, and Phantom will continue to handle that task just fine. But I also know that a switch to headless Chrome would be an expensive one if it became necessary: we would have to research it, update local dev environments, implement it, write new tests for it, test it, update our deployment strategy, update our server deployment configuration, and, worst of all, get all of these changes and new software installations approved by the USPTO, which is a nightmare. My situation is simple, but the switch would still take several weeks to several months to actually deploy to production. As it stands, I will likely have to explain why we have a now-unmaintained piece of software on the server, and I may be forced to switch regardless. I can easily imagine how this project sunsetting, even though there is a clear alternative and successor, could be a nightmare for a lot of people. It's not the end of the world, but it's definitely unfortunate. |
https://www.cooperativepatentclassification.org/Archive.html
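
For anyone curious what the "load, render, save" step looks like, here is a rough sketch of that kind of Phantom script. It is not my actual code. The `renderedPathFor` helper, the command-line arguments, the file names, and the 200 ms wait are all made up for illustration, and the delay in particular is just a guess at how long the page's scripts need.

```javascript
// Sketch only: render one downloaded page and save the resulting DOM as HTML.
// Hypothetical usage: phantomjs render.js scheme/A01B.html /tmp/rendered

// Hypothetical helper: build the output path from the input file name.
function renderedPathFor(inputPath, outDir) {
  var name = inputPath.split(/[\\/]/).pop();       // keep only the file name
  return outDir.replace(/\/+$/, '') + '/' + name;  // avoid a doubled slash
}

// The rest only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var phantomFs = require('fs');
  var system = require('system');
  var input = system.args[1];
  var outDir = system.args[2];

  if (!input || !outDir) {
    console.log('usage: phantomjs render.js <input.html> <output-dir>');
    phantom.exit(1);
  } else {
    var page = require('webpage').create();
    page.open('file://' + phantomFs.absolute(input), function (status) {
      if (status !== 'success') {
        console.log('could not load ' + input);
        phantom.exit(1);
        return;
      }
      // Assumed delay so the embedded JSON has been turned into markup.
      setTimeout(function () {
        // page.content is the serialized DOM after the scripts have run.
        phantomFs.write(renderedPathFor(input, outDir), page.content, 'w');
        phantom.exit(0);
      }, 200);
    });
  }
}
```

The saved file is plain rendered HTML, so an existing HTML parser can read it without knowing anything about the embedded JSON.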