Scrapy: customizing RetryMiddleware to retry on a CAPTCHA
The goal is to re-issue the current request whenever a CAPTCHA page comes back, so that the crawled data is more complete.
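import random
import time

from scrapy.downloadermiddlewares.retry import RetryMiddleware


class LocalRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        if request.meta.get("dont_retry", False):
            return response
        print(":", response.body)
        # Look for the CAPTCHA image in the returned page
        img = response.xpath('//img[@src="/Account/ValidateImage"]')
        print(img)
        if img:
            print("3 ")
            # Wait 0-5 seconds before retrying (time.sleep blocks Scrapy's event loop)
            time.sleep(random.choice(range(6)))
            print("ip:", request.meta.get("proxy"))
            # The second argument is only a reason for logs/stats, so keep it short
            return self._retry(request, "captcha", spider) or response
        return response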
Does the code above actually re-issue the request?
If so, does the retried request carry a random User-Agent and a new proxy IP? A retry limit is configured, and every retried response still contains the CAPTCHA.
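To make the second question concrete: as far as I understand, `_retry` returns a copy of the original request (or `None` once the retry limit is reached), and that copy keeps the old `meta["proxy"]` and headers unless something overwrites them. Below is a minimal sketch of overwriting both on the retried request. The `rotate_identity` helper and the `USER_AGENTS` / `PROXIES` pools are hypothetical names for illustration, not part of Scrapy, and the values are placeholders.

```python
import random

# Hypothetical pools (placeholders, not real endpoints); replace with your own.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0",
]
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]


def rotate_identity(request, user_agents=USER_AGENTS, proxies=PROXIES, rng=random):
    """Overwrite User-Agent and proxy in place on a request-like object.

    Works on anything exposing dict-like ``meta`` and ``headers`` attributes.
    """
    old_proxy = request.meta.get("proxy")
    # Prefer a proxy other than the one that just hit the CAPTCHA;
    # fall back to the full pool if there is no alternative.
    candidates = [p for p in proxies if p != old_proxy] or list(proxies)
    request.meta["proxy"] = rng.choice(candidates)
    request.headers["User-Agent"] = rng.choice(user_agents)
    return request


# Assumed usage inside process_response, in place of the single return line:
#     retry_req = self._retry(request, "captcha", spider)
#     if retry_req is not None:
#         return rotate_identity(retry_req)
#     return response
```

If a separate middleware already assigns the proxy and User-Agent in `process_request`, it is worth checking whether it skips requests that already have those values set, since that would explain every retry coming back with the same CAPTCHA.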