Problem description
I use the following code for multiprocess programming. It works fine when the task is a simple function such as a plain print(), but with my real function it fails with:
raise ValueError("Pool not running")
ValueError: Pool not running
Background of the problem and what I have tried
I am writing a crawler. To improve efficiency, I moved the function that fetches page IDs into a multiprocessing Pool.
Answers I found online said the indentation of pool.close() / pool.join() was the cause, but adjusting it did not seem to solve the problem.
Note that my main function contains two for loops. When close()/join() are indented inside the outermost loop, join() does not block and wait for the submitted tasks to finish; instead the program keeps spawning new tasks, and their number exceeds my Pool limit of 5.
I am still a beginner at multiprocess programming and would appreciate any advice. Thank you!
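To illustrate, here is a minimal sketch of the failure mode as I understand it (toy names, not my real crawler): once pool.close() has run, any further pool.apply_async() raises exactly this error.

from multiprocessing import Pool
import time

def work(x):  # stand-in for getPageId
    time.sleep(0.1)
    return x

if __name__ == "__main__":
    pool = Pool(processes=5)
    for i in range(10):
        pool.apply_async(work, (i,))
        pool.close()  # close() inside the loop shuts the pool down,
        pool.join()   # so the second iteration's apply_async raises
                      # ValueError: Pool not running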
Related code
import random
import time
from multiprocessing import Pool
from urllib import parse

import pymysql
import requests

# RdProxeys, DelProxeys, data, datas and headers1 are defined elsewhere in this module.

def getPageId(jobname, joburl):
    print("")
    db = pymysql.connect(host="localhost", port=3306, user="root", passwd="", db="test", charset="utf8")
    cursor = db.cursor()
    url = "https://www.lagou.com/jobs/positionAjax.json?"
    PageId = []
    proxeys = RdProxeys()
    n = 0
    print(jobname)
    # for i in range(len(jobname)):
    for j in range(1, 31):  # pages 1 to 30
        datas["pn"] = j
        # datas["kd"] = jobname[i]
        datas["kd"] = jobname
        # headers1["Referer"] = parse.quote(joburl[i])
        headers1["Referer"] = parse.quote(joburl)
        print(datas)
        rdtime = random.randint(3, 10)
        print("sleep " + str(rdtime) + " sec")
        time.sleep(rdtime)
        print(proxeys)
        req = requests.post(url, headers=headers1, data=datas, proxies=proxeys)
        # print(type(req.json()["success"]))
        if req.json()["success"]:  # "success" is a JSON boolean; plain truth test replaces `is bool("true")`
            # print(req.text)
            n = n + 1
            content = req.json()["content"]["hrInfoMap"]
            # print(content)
            for key in content.keys():
                # print(key)
                PageId.append(key)
        else:
            print(req.json())
            if n < 5:
                # the proxy died early: drop it, draw a new one, and wait
                # until the proxy table holds at least 20 entries again
                DelProxeys(proxeys["http"])
                proxeys = RdProxeys()
                n = 0
                time.sleep(10)
                cntsql = "select count(proxeys_body) from proxeys"
                cursor.execute(cntsql)
                (cnt,) = cursor.fetchone()
                while int(str(cnt)) < 20:
                    time.sleep(300)
                    cursor.execute(cntsql)
                    (cnt,) = cursor.fetchone()
            else:
                proxeys = RdProxeys()
                n = 0
                time.sleep(10)
            print("-----------------Error, The Pn is " + str(j) + "----------------------")
            with open("E:\\vscode_work\\CareerPython\\Lagou\\" + "PageId_log" + ".txt", "a") as f:
                # jobname/joburl, not jobname[i]/joburl[i]: i no longer exists
                # since the outer loop was commented out
                f.write(str(j) + "," + jobname + "," + joburl + "\n")
    # print(PageId)
    with open("E:\\vscode_work\\CareerPython\\Lagou\\" + "PageId" + ".txt", "a") as f:
        f.write(str(PageId))
    print("" + str(len(PageId)))
def main():
    pool = Pool(processes=5)  # set the max number of worker processes to 5
    for i in range(0, len(data()[0])):
        for j in range(0, len(data()[0][i])):
            pool.apply_async(getPageId, (data()[0][i][j], data()[1][i][j]))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()
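One thing worth knowing while debugging this (a toy demo, not my crawler): apply_async returns immediately, and an exception raised inside the worker is stored on the returned AsyncResult instead of being printed, so a crashing getPageId can look as if it ran fine unless .get() is called.

from multiprocessing import Pool

def boom(x):
    raise RuntimeError("worker failed")

if __name__ == "__main__":
    pool = Pool(processes=2)
    r = pool.apply_async(boom, (1,))
    pool.close()
    pool.join()
    r.get()  # only here does RuntimeError("worker failed") surface in the parent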
What result do you expect? What error message do you actually see?
I split the data into five groups and feed them into getPageId, expecting five processes to run in parallel; as one group of data finishes, the for loop should feed in the next group so the work continues.
Instead, raise ValueError("Pool not running") / ValueError: Pool not running always appears right after the first 5 processes have been started.
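In other words, the structure I expect to work is something like the sketch below (assuming data() returns the two parallel nested lists used above, and calling it once instead of on every loop pass): submit everything first, call close() and join() exactly once after both loops, and keep the AsyncResult handles so worker exceptions are not silently swallowed.

def main():
    jobnames, joburls = data()[0], data()[1]
    pool = Pool(processes=5)
    results = []
    for i in range(len(jobnames)):
        for j in range(len(jobnames[i])):
            results.append(pool.apply_async(getPageId, (jobnames[i][j], joburls[i][j])))
    pool.close()  # no further submissions after this point
    pool.join()   # block until all queued tasks have been worked off
    for r in results:
        r.get()   # re-raises any exception that happened in a worker

if __name__ == "__main__":
    main()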