When I add proxy IPs to Scrapy, the spider fails to crawl the data and the log fills with the errors below. I hope someone more experienced can explain the causes in more detail.
Error messages:
1. twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting http://open.douyucdn.cn/api/RoomApi/room/1355623 took longer than 180.0 seconds.
2. twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 111: Connection refused
3. twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
4. twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
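The 180.0 seconds in the first error matches Scrapy's default DOWNLOAD_TIMEOUT, so a dead proxy can hold a request for three minutes before it fails. A minimal settings.py sketch, assuming the built-in RetryMiddleware is left enabled; the values are illustrative guesses, not tuned recommendations:

```python
# settings.py (illustrative values, not recommendations)
DOWNLOAD_TIMEOUT = 15   # fail fast instead of waiting the default 180 seconds
RETRY_ENABLED = True    # the default; timeouts and connection errors are retried
RETRY_TIMES = 5         # default is 2; extra retries give other proxies a chance
```

A retried request passes through process_request again, so a middleware that picks a random proxy there will usually pick a different one on the retry.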
Here is the proxy middleware I wrote:
middlewares.py
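import json
import logging
import random


class ZhimaProxyMiddleware(object):
    """Attach a random proxy from ip_pool.txt to every douyu request."""

    def __init__(self):
        # ip_pool.txt is written by the proxy checking script
        with open("ip_pool.txt", "r") as f:
            self.proxy_dict = json.loads(f.read())
        self.proxy_list = self.proxy_dict["proxy"]

    def process_request(self, request, spider):
        try:
            if "douyu" in request.url:
                request.meta["proxy"] = random.choice(self.proxy_list)["http"]
        except IndexError as error:
            # random.choice raises IndexError when the proxy list is empty
            logging.error("no proxy available: {}".format(error))
        return None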
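One thing the middleware above does not do is react when a proxy fails. A minimal sketch of that idea follows, relying on the documented Scrapy behaviour that returning a Request from process_exception reschedules it. The class name RotatingProxyMiddleware and its proxy_list constructor argument are hypothetical, and the wiring (loading ip_pool.txt, from_crawler) is left out:

```python
import logging
import random


class RotatingProxyMiddleware(object):
    """Hypothetical variant that drops a proxy as soon as a request through it fails."""

    def __init__(self, proxy_list):
        # proxy_list: proxy URLs such as "http://1.2.3.4:8080"
        self.proxy_list = list(proxy_list)

    def process_request(self, request, spider):
        if "douyu" in request.url and self.proxy_list:
            request.meta["proxy"] = random.choice(self.proxy_list)
        return None

    def process_exception(self, request, exception, spider):
        bad = request.meta.get("proxy")
        if bad is None:
            return None  # request was not proxied; let Scrapy handle the failure
        if bad in self.proxy_list:
            self.proxy_list.remove(bad)
            logging.warning("dropped proxy %s after %r", bad, exception)
        if not self.proxy_list:
            return None  # no proxies left to try
        # Pick another proxy and reschedule; dont_filter skips the duplicate filter.
        request.meta["proxy"] = random.choice(self.proxy_list)
        return request.replace(dont_filter=True)
```

Because every failure removes one proxy, the rescheduling stops once the pool is empty instead of looping forever.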
The proxies are validated when they are extracted,
using the following checking code:
import json

import requests

# Zhima proxy extraction API URL
zhima_url = ""
# URL used to test each proxy
test_url = "http://www.qq.com/"
# Proxies that passed the test
zhima = {"proxy": []}


def zhima_proxy():
    """Fetch proxies from the Zhima API, test each one, and save the working ones."""
    response = requests.get(zhima_url)
    proxy_dict = json.loads(response.content.decode())
    for data in proxy_dict["data"]:
        proxy_ip = data["ip"]
        proxy_port = data["port"]
        proxy = {
            "http": "http://{}:{}".format(proxy_ip, proxy_port),
            "https": "https://{}:{}".format(proxy_ip, proxy_port)
        }
        try:
            res = requests.get(test_url, proxies=proxy, verify=False)
        except Exception as error:
            print("proxy check failed: {}".format(error))
            continue
        else:
            if res.status_code == 200:
                zhima["proxy"].append(proxy)
            else:
                continue
    with open("ip_pool.txt", "w") as f:
        f.write(json.dumps(zhima))
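The check above only shows that a proxy fetched http://www.qq.com/ once, with no time limit, so a proxy that is slow or that cannot reach the douyu API still passes. The "https" entry is also built with an https:// prefix; in urllib and in recent versions of requests that prefix describes how to talk to the proxy itself, which a plain HTTP proxy may not accept. A stdlib-only sketch under those assumptions (the helper names build_proxy and proxy_works and the 10-second timeout are made up for illustration):

```python
import http.client
import urllib.request


def build_proxy(ip, port):
    """Return a proxies mapping for a plain HTTP proxy at ip:port."""
    address = "http://{}:{}".format(ip, port)
    # Both keys point at the same http:// address: the scheme describes how to
    # reach the proxy, not the scheme of the page being fetched.
    return {"http": address, "https": address}


def proxy_works(proxy, target_url, timeout=10):
    """Return True if target_url answers 200 through proxy within timeout seconds."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy))
    try:
        with opener.open(target_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and refused connections are OSError subclasses.
        return False
```

Passing the real target, for example proxy_works(build_proxy(ip, port), "http://open.douyucdn.cn/api/RoomApi/room/1355623"), tests the same path the spider uses rather than an unrelated site.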
Online proxy checking tools also report these proxies as valid,
but I still don't know why these errors occur.
I have confirmed that the middleware is enabled.
Thank you.