Hey all,

This is Jan, the founder of Apify (https://apify.com/), a full-stack web scraping platform. After the success of Crawlee for JavaScript (https://github.com/apify/crawlee/) and the demand from the Python community, we're launching Crawlee for Python today!

The main features are:

- A unified programming interface for both HTTP (HTTPX with BeautifulSoup) and headless browser crawling (Playwright)
- Automatic parallel crawling based on available system resources
- Written in Python with type hints for enhanced developer experience
- Automatic retries on errors or when you're getting blocked
- Integrated proxy rotation and session management
- Configurable request routing: direct URLs to the appropriate handlers
- Persistent queue for URLs to crawl
- Pluggable storage for both tabular data and files

For details, you can read the announcement blog post: https://crawlee.dev/blog/launching-crawlee-python

Our team and I will be happy to answer any questions you might have here.
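To make the request-routing bullet concrete, here is a minimal stdlib-only sketch of label-based dispatch: each URL is sent to a handler picked by its label, with a default handler for unlabeled requests. This is not Crawlee's API. The `Router` class, its `handler`/`default_handler` decorators, and the string-returning handlers are hypothetical stand-ins chosen only to illustrate the idea.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical handler shape: takes a URL, returns a result string.
# A real crawler would pass a richer context object instead of a bare URL.
Handler = Callable[[str], str]


class Router:
    """Dispatch each request to the handler registered for its label."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}
        self._default: Optional[Handler] = None

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        # Decorator factory: register fn under the given label.
        def register(fn: Handler) -> Handler:
            self._handlers[label] = fn
            return fn

        return register

    def default_handler(self, fn: Handler) -> Handler:
        # Decorator: handler used for requests that carry no label.
        self._default = fn
        return fn

    def route(self, url: str, label: Optional[str] = None) -> str:
        fn = self._default if label is None else self._handlers.get(label)
        if fn is None:
            raise LookupError(f"no handler registered for label {label!r}")
        return fn(url)


router = Router()


@router.default_handler
def handle_listing(url: str) -> str:
    return f"listing: {url}"


@router.handler("DETAIL")
def handle_detail(url: str) -> str:
    return f"detail: {url}"


print(router.route("https://example.com/"))  # → listing: https://example.com/
print(router.route("https://example.com/item/1", label="DETAIL"))  # → detail: https://example.com/item/1
```

The design choice here is that an unknown label raises instead of silently falling back to the default handler, so a typo in a label surfaces immediately.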
As a concrete example: command-f for "tier" on https://crawlee.dev/python/docs/guides/proxy-management and tell me how anyone could possibly know what `tiered_proxy_urls: list[list[str]] | None = None` should contain and why?
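One plausible reading of that type, offered as an assumption rather than Crawlee's documented behavior: each inner list is one tier of proxy URLs, tiers are ordered from cheapest to most expensive, and the crawler rotates within the current tier and steps up a tier when it gets blocked. The stdlib-only sketch below encodes that reading. `TieredProxyPicker`, `next_proxy`, `report_blocked`, and the proxy URLs are all hypothetical.

```python
from __future__ import annotations


class TieredProxyPicker:
    """Hypothetical picker for a list[list[str]] of proxy tiers (cheapest tier first)."""

    def __init__(self, tiered_proxy_urls: list[list[str]]) -> None:
        if not tiered_proxy_urls or any(not tier for tier in tiered_proxy_urls):
            raise ValueError("need at least one tier, and every tier needs at least one URL")
        self._tiers = tiered_proxy_urls
        self.tier = 0  # index of the tier currently in use
        self._cursor = [0] * len(tiered_proxy_urls)  # round-robin position per tier

    def next_proxy(self) -> str:
        # Rotate round-robin through the URLs of the current tier.
        urls = self._tiers[self.tier]
        i = self._cursor[self.tier]
        self._cursor[self.tier] = (i + 1) % len(urls)
        return urls[i]

    def report_blocked(self) -> None:
        # Escalate to the next (assumed more expensive) tier, capped at the last one.
        self.tier = min(self.tier + 1, len(self._tiers) - 1)


picker = TieredProxyPicker([
    ["http://dc-proxy-1.example:8000", "http://dc-proxy-2.example:8000"],  # tier 0 (hypothetical)
    ["http://residential-proxy-1.example:8000"],  # tier 1 (hypothetical)
])
print(picker.next_proxy())  # → http://dc-proxy-1.example:8000
print(picker.next_proxy())  # → http://dc-proxy-2.example:8000
picker.report_blocked()
print(picker.next_proxy())  # → http://residential-proxy-1.example:8000
```

This sketch only ever escalates; a real implementation would presumably also drift back down to cheaper tiers once requests stop being blocked.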