Problem description
There are 6000 URLs. A Celery task starts at 12:00, generates the URLs, and sends the queue to two servers to crawl. I use a downloader middleware to get 10 proxy IPs at a time to make the requests. After 100 URLs are done, the crawl moves on to the next batch of 100 URLs in the queue, but why does it not read the new IPs? As it is, the whole run of 6000 URLs keeps using the same 10 IPs it read the first time. Currently I read a text file containing the IPs for every request inside the process_request function, and a scheduled replacement of that file guarantees it only ever holds 10 IPs, so each batch of 100 requests should pick randomly from those 10. But the remaining requests in the queue never read the new IPs again.
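For reference, here is a minimal sketch of the kind of middleware described above, assuming a file path and format (one ip:port per line) that are not in the original question. The key point is where the file is read: if the list were loaded once in __init__ (or from_crawler) instead of in process_request, it would be cached for the whole run, which would match the behavior you are seeing.

```python
import random

class FileProxyMiddleware:
    """Sketch of a downloader middleware that re-reads the proxy
    file on every request, so a freshly replaced file takes effect
    for the next batch of URLs."""

    PROXY_FILE = "/etc/crawler/proxies.txt"  # assumed path, one ip:port per line

    def process_request(self, request, spider):
        # Re-reading here (not in __init__) is what picks up new IPs;
        # a list loaded once at startup never changes afterwards.
        with open(self.PROXY_FILE) as f:
            proxies = [line.strip() for line in f if line.strip()]
        if proxies:
            request.meta["proxy"] = "http://" + random.choice(proxies)
```

If the middleware already looks like this and still uses stale IPs, it is worth checking whether both crawl servers actually receive the updated file, since each server reads its own local copy.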
I read the IPs from a text file because I can control the file and replace it on a schedule so that it only ever holds 10 IPs. If I skipped the file and called the IP interface directly for every request instead, I would need a huge number of IPs: one round of 6000 URLs would need at least 6000 of them. I want one round of 6000 URLs to use far fewer, picking up a fresh set of 10 IPs each time the next batch of URLs starts, but it does not seem to pick them up. The IPs in the text file are still being replaced on schedule, yet Scrapy reads them once and never reads them again.
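For completeness, a sketch of the scheduled replacement side, assuming a hypothetical proxy API URL (the real interface is not in the question). Writing to a temporary file and renaming it means a request in flight either sees the old 10 IPs or the new 10, never a half-written file:

```python
import os
import tempfile
import requests

PROXY_API = "http://example.com/get?num=10"   # hypothetical endpoint returning one ip:port per line
PROXY_FILE = "/etc/crawler/proxies.txt"       # same path the middleware reads

def refresh_proxies():
    # Fetch 10 fresh IPs from the (assumed) interface.
    ips = requests.get(PROXY_API, timeout=10).text.strip().splitlines()
    # Atomic replacement: write the new list to a temp file in the
    # same directory, then rename it over the old file in one step.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(PROXY_FILE))
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(ips) + "\n")
    os.replace(tmp, PROXY_FILE)
```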
2 servers, Celery + RabbitMQ + Python + Scrapy crawler framework