by mrkeen
546 days ago
> scraping a website or handling session tokens refreshing and see the results

Have done this task at work: taking a Java crawler and modifying it to navigate SAML-based auth.

It's the goddamn pervasive mutation that kills you every time. The original author will make assumptions like "I can just put my cookies here, and read from them later". And it works for a while, until you add more threads, or crawl different domains, or someone attempts some thread-safety by storing cookies in *shudder* thread-locals. (This was actually the project that convinced me that I'll never be able to diagnose or fix a race condition with a step-through debugger.)

I was so fed up with this task that I whipped up a Haskell version at home. I then used that as a standard to test the Java work crawler against, by comparing mitmproxy dumps. If the Java version didn't do what the Haskell version did, I hacked on it some more.
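For contrast with "I can just put my cookies here, and read from them later", here is a minimal Python sketch of the immutable alternative. It assumes a hypothetical `CookieJar` type and `crawl_step` helper (neither comes from the original Java or Haskell crawler): cookies live in a read-only per-domain value, and each step returns a new jar instead of mutating a shared one.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping

# Hypothetical immutable cookie jar: domain -> (cookie name -> value).
# frozen=True stops field reassignment; MappingProxyType makes the maps read-only.
@dataclass(frozen=True)
class CookieJar:
    cookies: Mapping[str, Mapping[str, str]] = field(
        default_factory=lambda: MappingProxyType({})
    )

    def get(self, domain: str) -> Mapping[str, str]:
        # Unknown domains get an empty, read-only map.
        return self.cookies.get(domain, MappingProxyType({}))

    def with_cookie(self, domain: str, name: str, value: str) -> "CookieJar":
        # Copy-on-write: build new maps and a new jar; self is left untouched.
        per_domain = dict(self.get(domain))
        per_domain[name] = value
        outer = dict(self.cookies)
        outer[domain] = MappingProxyType(per_domain)
        return CookieJar(MappingProxyType(outer))


def crawl_step(jar: CookieJar, domain: str, set_cookies: dict) -> CookieJar:
    """Hypothetical helper: fold one response's cookie pairs into a new jar."""
    for name, value in set_cookies.items():
        jar = jar.with_cookie(domain, name, value)
    return jar


# Two "workers" start from the same jar and crawl different domains.
base = CookieJar()
jar_a = crawl_step(base, "a.example", {"SESSION": "token-a"})
jar_b = crawl_step(base, "b.example", {"SESSION": "token-b"})
# A token refresh on a.example yields a new jar; jar_a still holds the old token.
jar_a2 = crawl_step(jar_a, "a.example", {"SESSION": "token-a-refreshed"})
```

Each worker owns its jar and passes it forward explicitly, so there is no shared cookie state for threads or domains to race on. This is roughly what immutable data gives you by default in Haskell; it's an illustration of the idea, not the code from either crawler.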