by w0rd-driven
2908 days ago
I crawl a specific site, up to about 50 unique URLs a day. I store both the full unparsed HTML as one file and the JSON I'm looking for as a separate file. The idea is that if something breaks, I already have the data and can just reprocess it instead of taking the hit of making the call again. It's come in extremely handy when a site redesign changed the DOM and broke the parser. I do the same at $dayJob, where I'm parsing the results of an internal API: instead of making a call later that may not return the same data, I store the JSON and just process that. I feel like treating network requests as an expensive operation, even though they're not really, helped me come up with some clever ideas I'd never had before. It's a premature optimization considering I've had like a 0.000001% failure rate, but being able to replay that one breakage made debugging an esoteric problem waaaaaay simpler than it would've been otherwise.
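For anyone who wants to try the store-then-replay idea, here is a minimal sketch in Python. It is not the actual crawler described above: the names `cache_paths` and `crawl`, the SHA-256 file naming, and the injected `fetch` and `parse` callables are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Tuple


def cache_paths(cache_dir: Path, url: str) -> Tuple[Path, Path]:
    """Return (raw_html_path, parsed_json_path) for a URL.

    Hypothetical naming scheme: files are keyed by the SHA-256 of the URL.
    """
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.html", cache_dir / f"{key}.json"


def crawl(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], str],
    parse: Callable[[str], dict],
    reparse: bool = False,
) -> dict:
    """Fetch a URL at most once; later calls replay from disk.

    `fetch` and `parse` are supplied by the caller (assumed interfaces):
    `fetch(url)` returns the page as text, `parse(html)` returns a
    JSON-serializable dict.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    raw_path, json_path = cache_paths(cache_dir, url)

    # Fast path: the parsed result is already stored.
    if json_path.exists() and not reparse:
        return json.loads(json_path.read_text(encoding="utf-8"))

    if raw_path.exists():
        # Replay: reuse the stored response instead of hitting the network.
        html = raw_path.read_text(encoding="utf-8")
    else:
        html = fetch(url)
        # Save the raw response before parsing, so a parser failure
        # still leaves something to replay.
        raw_path.write_text(html, encoding="utf-8")

    data = parse(html)
    json_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

After fixing a broken parser, calling `crawl(url, cache_dir, fetch, fixed_parse, reparse=True)` reprocesses the stored HTML without making a new request, which is the replay case described above.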