Crawlera is a smart downloader designed specifically for web crawling and scraping. It allows you to crawl quickly and reliably, managing thousands of proxies internally, so you don’t have to.
Crawlera routes requests through a pool of IPs. It throttles access by introducing delays between requests, and it removes IPs from the pool when they get banned from certain domains or run into other problems.
Each account exposes a standard HTTP proxy API, so you can configure Crawlera in your crawler of choice and start crawling.
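As a rough illustration of what "standard HTTP proxy" means in practice, the sketch below points Python's standard-library `urllib` at a proxy. The host `proxy.example.com`, the port `8010`, and the idea of sending an API key as the proxy username are placeholders assumed for this example, not values taken from Crawlera; use the details from your own account.

```python
import urllib.request
from urllib.parse import quote


def proxy_settings(host, port, user, password=""):
    """Build a proxies mapping for an HTTP proxy that uses basic credentials."""
    # Percent-encode the credentials so special characters survive in the URL.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def build_proxy_opener(proxies):
    """Return an opener that sends its requests through the given proxies."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


if __name__ == "__main__":
    # Hypothetical endpoint and key, for illustration only.
    proxies = proxy_settings("proxy.example.com", 8010, "YOUR_API_KEY")
    opener = build_proxy_opener(proxies)
    # opener.open("http://example.com/") would now go through the proxy.
```

Because the interface is a plain proxy, the same host, port, and credentials can be supplied to any crawler or HTTP client that supports proxy settings.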
Crawlera distributes requests among many internal nodes, using a proprietary algorithm that throttles the requests each node sends to a site in order to minimize the risk of getting banned. If a node gets banned anyway, Crawlera blacklists it and avoids using it for future requests to that domain.
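Crawlera's algorithm is proprietary, so the following is only a toy model of the two ideas described here: a minimum delay per node and domain, and a per-domain blacklist. The `NodePool` class, its method names, and the fixed `min_delay` are all invented for this sketch.

```python
import time


class NodePool:
    """Toy pool with per-domain throttling and per-domain blacklisting."""

    def __init__(self, nodes, min_delay=1.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.min_delay = min_delay   # assumed fixed delay, in seconds
        self.clock = clock
        self.banned = set()          # (node, domain) pairs to avoid
        self.next_free = {}          # (node, domain) -> earliest next use

    def ban(self, node, domain):
        """Blacklist a node for one domain; other domains are unaffected."""
        self.banned.add((node, domain))

    def pick(self, domain):
        """Return (node, wait_seconds) for the clean node that is free soonest.

        Returns None when every node is banned for this domain.
        """
        now = self.clock()
        best = None
        for node in self.nodes:
            if (node, domain) in self.banned:
                continue
            wait = max(0.0, self.next_free.get((node, domain), now) - now)
            if best is None or wait < best[1]:
                best = (node, wait)
        if best is None:
            return None
        node, wait = best
        # Reserve the node: it is next free min_delay after this request.
        self.next_free[(node, domain)] = now + wait + self.min_delay
        return best
```

A caller would sleep for the returned `wait_seconds` before sending the request through the chosen node, and call `ban()` when a response indicates the node has been blocked.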
Banned requests typically return a non-200 response (like 403 or 503), or redirect to a captcha page. Crawlera detects these responses and automatically retries the requests from another (clean) node.
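The detect-and-retry behaviour can be pictured with a small sketch. This is not Crawlera's actual detection logic: the status set, the "captcha" substring check on the redirect target, and the `fetch` callback signature are assumptions made for illustration.

```python
BAN_STATUSES = {403, 503}                      # assumption: the codes named above
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def looks_banned(status, location=""):
    """Heuristic: a ban status, or a redirect whose target mentions a captcha."""
    if status in BAN_STATUSES:
        return True
    return status in REDIRECT_STATUSES and "captcha" in (location or "").lower()


def fetch_with_retry(url, nodes, fetch):
    """Try each node in turn until one returns a response that is not a ban.

    `fetch(url, node)` is a caller-supplied function returning
    (status, location, body). Returns (node, status, body), or None if
    every node was banned.
    """
    for node in nodes:
        status, location, body = fetch(url, node)
        if not looks_banned(status, location):
            return node, status, body
    return None


if __name__ == "__main__":
    # Simulated responses: the first two nodes are blocked, the third is clean.
    canned = {
        "node-1": (403, "", ""),
        "node-2": (302, "http://example.com/captcha", ""),
        "node-3": (200, "", "<html>ok</html>"),
    }
    print(fetch_with_retry("http://example.com/", list(canned),
                           lambda url, node: canned[node]))
```

With a proxy service of this kind the retries happen on the server side, so client code normally sees only the final response.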
Only successful downloads are charged.
Incredibly transparent and stable.
Crawlera has solved our problem of making sparse requests from different IP addresses in an incredibly transparent and stable way. Their team is amazing and has been really helpful to us. I definitely recommend this product!
Allowed us to bypass anti-crawling technology.
Scrapinghub allowed us to launch the largest Bitcoin market in the world by scraping millions of items from more than 20 markets. Some of the sites were employing anti-crawling technology but by using Crawlera and crawling from multiple IPs, that problem was solved as well.