I want to build an IP proxy pool. In my downloader middleware I write a generator that returns one IP address each time it is advanced, and then I use that IP in process_request. But every time I run my crawler, the generator hands back all three values; in other words, for a single page request, process_request seems to run three times. I don't understand why.
import random

from scrapy import signals

# user_agent_list is assumed to be defined elsewhere in the project.
# The original snippet omits the class line; "ProxyMiddleware" is only a placeholder name.
class ProxyMiddleware(object):

    def canshu(self):
        # the proxy pool
        aa = ["192.168.1.2", "11.22.33", "44,55,66"]
        return aa

    def order(self):
        # generator that yields one address from the pool at a time
        aa = self.canshu()
        for i in aa:
            yield i

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        s.a = s.order()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def spider_opened(self, spider):
        # standard template method referenced by from_crawler above
        spider.logger.info("Spider opened: %s" % spider.name)

    def process_request(self, request, spider):
        aa = self.a.__next__()
        ua = random.choice(user_agent_list)
        print("this time ua:", ua)
        request.headers.setdefault("User-Agent", ua)
        # request.meta["proxy"] = "http://" + aa
        print("ip:", aa)
        return None