problem description
I am using inter-process communication in Python 3. I crawled a number of proxies and need to check which ones are usable, so I wrote a validator function and run it in a process pool. validator puts each usable proxy into a Queue. At first I wanted to append the results to a list, but a plain list is not shared between processes.
I then call q.get() before joining the processes, but the program hangs and never responds.
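To illustrate the point about lists, here is a minimal sketch (not the code from the question) showing that an append made in a child process does not reach the parent. The names `found`, `add_item`, and `run_child` and the address are made up for the example.

```python
from multiprocessing import Process

found = []  # module-level list, playing the role of a shared result list


def add_item(item):
    # Runs in the child process and appends to the child's own copy of `found`.
    found.append(item)


def run_child(item):
    # Start one child process, wait for it, and return its exit code.
    p = Process(target=add_item, args=(item,))
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    run_child("http://10.0.0.1:8080")  # hypothetical proxy address
    print(found)  # the parent's list is unchanged by the child's append
```

Each process has its own memory, so results have to travel back through a return value, a pipe, or a queue that is designed for sharing.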
related code
This part fetches the proxies and appends all of them to proxyList.
Code:
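import re
import requests
from multiprocessing import Pool, Queue

# url, headers and proxies are defined elsewhere (not shown in this post)
proxyList = []

def getProxy():
    r = requests.get(url, headers=headers, proxies=proxies)
    ips = re.findall(r'"PROXY_IP":"([\d.]+)"', r.text)
    ports = re.findall(r'"PROXY_PORT":"([\w]+)"', r.text)
    for i, p in zip(ips, ports):
        p = int(p, 16)  # the port is parsed as a hexadecimal string
        ip = "http://" + i + ":" + str(p)
        proxyList.append(ip)
    print("proxy list:\n")
    print(proxyList)

# getProxy()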
def validator(proxy, q):
    # Request a test URL through the proxy; put the proxy on the queue if it works
    url = "https://www.baidu.com"
    try:
        r = requests.get(
            url,
            headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"},
            proxies={
                "http": proxy,
                "https": proxy,
            }, timeout=5)
        if r.status_code == requests.codes.ok:  # on OK: print and q.put()
            print("valid proxy:", proxy)
            q.put(proxy)
        else:
            print("failed:", proxy)
    except Exception as e:
        print("error:", proxy, e)
if __name__ == "__main__":
    print("start!")
    getProxy()
    p = Pool(5)
    q = Queue()
    for proxy in proxyList:  # submit one validation task per proxy
        p.apply_async(validator, args=(proxy, q))
    p.close()
    p.join()
    print(q.get())  # read from the queue; the program hangs here
    print("over!")
what result do you expect? What is the error message actually seen?
I want to run multiple processes that check whether the proxies are usable, and then get the usable proxies back. In addition, I am a little unsure about the way I submit work to the process pool:
for proxy in proxyList:
    p.apply_async(validator, args=(proxy, q))
Although I created a pool with a fixed number of processes earlier, will all of those processes work on the same proxy at the same time? What I want is for the processes to share the whole list of proxies between them, so that every proxy is checked once and the total time is shorter.
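As far as I understand `Pool`, each `apply_async` call submits one task, and each task is picked up by exactly one idle worker, so the workers divide the list between them. The sketch below is a hedged illustration of that, not the original script: `check` is a stand-in for validator that does no network I/O, and `run_checks` and the addresses are made up.

```python
import os
from multiprocessing import Pool


def check(proxy):
    # Stand-in for validator(): no network I/O, just reports which worker ran it.
    return proxy, os.getpid()


def run_checks(proxies, workers=4):
    with Pool(workers) as pool:
        # One task per proxy; each task is executed by a single worker process.
        pending = [pool.apply_async(check, args=(p,)) for p in proxies]
        # AsyncResult.get() waits for the task and hands back its return value.
        return [r.get() for r in pending]


if __name__ == "__main__":
    proxies = [f"http://10.0.0.{i}:8080" for i in range(1, 9)]  # made-up addresses
    for proxy, pid in run_checks(proxies):
        print(pid, proxy)
```

Every proxy appears once in the output, and the worker PIDs are drawn from at most `workers` distinct processes. Returning a value from the task and reading it with `get()` is also one way to bring results back to the parent without any queue.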
So my questions now are: how do I get the usable proxies back, and why does reading from the Queue hang?
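A likely explanation for the hang, sketched below under stated assumptions: a plain `multiprocessing.Queue` cannot be pickled as a task argument, `apply_async` stores that error in the result object instead of raising it, so no task ever runs and `q.get()` waits forever on an empty queue. The helper names (`report`, `try_plain_queue`, `collect_with_manager_queue`) and the addresses are hypothetical, and `report` stands in for validator without doing any network I/O.

```python
from multiprocessing import Manager, Pool, Queue


def report(proxy, q):
    # Stand-in for validator(): pretend the proxy is valid and report it.
    q.put(proxy)


def try_plain_queue(proxy):
    """Return the error message produced by passing a plain Queue to a pool task."""
    q = Queue()
    with Pool(2) as pool:
        result = pool.apply_async(report, args=(proxy, q))
        try:
            result.get(timeout=30)  # re-raises the error that apply_async stored
        except RuntimeError as exc:
            return str(exc)
    return None


def collect_with_manager_queue(proxies, workers=4):
    """Collect reported proxies through a Manager queue, whose proxy object can be pickled."""
    with Manager() as manager:
        q = manager.Queue()
        with Pool(workers) as pool:
            pending = [pool.apply_async(report, args=(p, q)) for p in proxies]
            for r in pending:
                r.get()  # wait for each task and surface any exception
        found = []
        while not q.empty():  # all tasks have finished, so nothing more will arrive
            found.append(q.get())
    return found


if __name__ == "__main__":
    print(try_plain_queue("http://10.0.0.1:8080"))  # made-up address
    print(collect_with_manager_queue(["http://10.0.0.1:8080", "http://10.0.0.2:8080"]))
```

Calling `get()` on each `AsyncResult` is what makes hidden worker errors visible. Draining the queue with `empty()` only after all tasks have finished avoids blocking on a `q.get()` that has nothing to return.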