This is far from uncommon. Back at DEF CON 25 in 2017, Orange Tsai gave a talk about inconsistencies between URL parsing libraries in different languages. The opening example was a single URL that had a different hostname when parsed by urllib, urllib2, and requests. He also demoed using unusual characters like spaces and newlines to talk to Redis or SMTP while pretending to be HTTP.
Python urllib3 maintainer here. In December urllib3 made a change to be more RFC-compliant, which also fixed this issue, but that change has not been released yet. We are in the process of looking into that.
I have verified that Requests, which uses us, appears to have its own handling, going back at least to Requests 2.0 (released in 2013), that prevents this when you go through Requests as the abstraction layer on top of urllib3.
Interesting. I was recently debating whether to use Requests or just urllib3 directly. Figured I'd minimize dependencies by just using urllib3 but didn't think it might actually be more secure to use Requests. Great work btw!
I was just using urllib3 to post a form on another website and get the resulting HTML page, which I then parsed with BeautifulSoup.
Since it was just a one-off use case and ultimately very simple, I didn't see the need for any more functionality. Why bother with the extra packages? Or do you think it's still worthwhile to use Requests? Is it not just unnecessary bloat that might slow down the runtime?
There's a lot to unpack in your comment, but I'll just work with the most easily verifiable thing for you; what was the response time of the resource you were querying with urllib3, and do you think using requests instead of urllib3 directly would be an order of magnitude (or two) more or less runtime?
I admittedly didn't test the response times between the two, but just felt adding additional dependencies was unnecessary. I don't realistically expect the speed to be too different between the two, but the less I have to rely on external libraries the better. If I can get the job done with urllib3 why use Requests?
Though admittedly, after reading OP's statement, I see that Requests might actually have some extra security that urllib3 alone might not have. But barring security improvements or the need for extra features that Requests has, it seems like using Requests for my use case would be adding unnecessary complexity.
Not the OP, but deploying code on government servers that interact with the public web means that minimizing the required modules saves you piles of paperwork and meetings. I'd rather spend the extra two days writing/testing my own code than filling out paperwork and waiting weeks.
Key takeaway: don't expect a library to do the safe thing; always sanitize all your input. (If your language supports taint mode, enabling it can help prevent these bugs.)
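To make the "sanitize your input" advice concrete, here is a minimal sketch of the kind of check you could run before handing a URL to any HTTP library. The function name `validate_url` and the `ALLOWED_SCHEMES` set are made up for illustration, and the policy (reject any space or control character, allow only http/https) is an assumption, not what urllib3 or Requests do internally.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only plain web URLs are acceptable.
ALLOWED_SCHEMES = {"http", "https"}


def has_control_chars(text):
    """True if text contains a space, an ASCII control character, or DEL."""
    return any(ord(ch) < 0x21 or ord(ch) == 0x7F for ch in text)


def validate_url(url):
    """Return url unchanged if it passes the checks, else raise ValueError.

    The control-character check runs on the raw string, before any parsing,
    so CR/LF cannot be silently stripped or reinterpreted by a parser.
    """
    if has_control_chars(url):
        raise ValueError("URL contains whitespace or control characters")
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("URL has no hostname")
    return url
```

With this, `validate_url("https://example.com/path?q=1")` returns the URL as-is, while something like `"http://example.com/ HTTP/1.1\r\nX-Injected: 1"` raises `ValueError` before it ever reaches the network layer. Legitimate spaces should arrive percent-encoded (`%20`), so rejecting raw ones costs little.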
urllib2 introduced breaking changes relative to urllib, so it was added as a new library while the old one stayed to preserve its functionality. urllib3 also has breaking changes, but it purposely doesn't live in the standard library so it can be changed more readily.
urllib and urllib2 are in the stdlib of Python 2, and neither of them has a very friendly interface. They have been consolidated into a single urllib package in Python 3.
urllib3 is a 3rd-party library that powers requests. It tries to offer a more powerful set of features behind a better interface.
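For the stdlib side of that, a rough sketch of what the consolidated Python 3 urllib looks like in practice: parsing lives in `urllib.parse` and request building in `urllib.request`. The URLs and form fields below are placeholders, and nothing here touches the network.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

# Parsing: urlsplit breaks a URL into named components.
parts = urlsplit("https://example.com:8443/search?q=python")
print(parts.hostname, parts.port, parts.query)

# Form encoding: urlencode builds an application/x-www-form-urlencoded body.
form = urlencode({"name": "value", "q": "a b"}).encode("ascii")

# Building a request object; supplying data makes the default method POST.
# Passing req to urllib.request.urlopen would actually send it.
req = Request("https://example.com/form", data=form)
print(req.get_method())
```

It works, but you can see why people reach for urllib3 or Requests once they need connection pooling, retries, or a friendlier API.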
Slides: https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20pre...