Ask: Why does Python multiprocessing raise ValueError("Pool not running")?

problem description

I use the code below for multi-process programming. It works fine when the worker is a simple function such as a plain print(), but when the pool calls my own function I get:

raise ValueError("Pool not running") ValueError: Pool not running

background of the problem and what you have tried

I am writing a crawler. To make it faster, I moved the function that fetches page IDs into a multi-process pool.

When I searched online, the usual explanation was the indentation of pool.close() and pool.join(), but changing it did not seem to solve the problem.
My main function has two for loops. When I indent close() and join() at the level of the outermost loop, join() does not block and wait for the work to finish. Instead the program keeps creating processes, and the number created exceeds the limit I set with Pool(processes=5).

I am still a beginner at multi-process programming and would appreciate any advice. Thank you!

related code


import random
import time
from multiprocessing import Pool
from urllib import parse  # assumed source of parse.quote

import pymysql
import requests


# datas, headers1, data, RdProxeys and DelProxeys are not shown in the question;
# they are assumed to be defined elsewhere in the crawler.
def getPageId(jobname, joburl):
    print("")
    db = pymysql.connect(host="localhost", port=3306, user="root", passwd="", db="test", charset="utf8")
    cursor = db.cursor()
    url = "https://www.lagou.com/jobs/positionAjax.json?"
    PageId = []
    proxeys = RdProxeys()
    n = 0
    print(jobname)
    # for i in range(len(jobname)):
    for j in range(1, 31):  # pages 1 to 30
        datas["pn"] = j
        # datas["kd"] = jobname[i]
        datas["kd"] = jobname
        # headers1["Referer"] = parse.quote(joburl[i])
        headers1["Referer"] = parse.quote(joburl)
        print(datas)
        rdtime = random.randint(3, 10)
        print("sleep " + str(rdtime) + " sec")
        time.sleep(rdtime)
        print(proxeys)
        req = requests.post(url, headers=headers1, data=datas, proxies=proxeys)
        # print(type(req.json()["success"]))
        if req.json()["success"] is True:
            # print(req.text)
            n = n + 1
            content = req.json()["content"]["hrInfoMap"]
            # print(content)
            for key in content.keys():
                # print(key)
                PageId.append(key)
        else:
            print(req.json())
            if n < 5:
                # drop the failing proxy and pick a new one
                DelProxeys(proxeys["http"])
                proxeys = RdProxeys()
                n = 0
                time.sleep(10)
                cntsql = "select count(proxeys_body) from proxeys"
                cursor.execute(cntsql)
                (cnt,) = cursor.fetchone()
                # wait until the proxy table holds at least 20 entries
                while int(str(cnt)) < 20:
                    time.sleep(300)
                    cursor.execute(cntsql)
                    (cnt,) = cursor.fetchone()
            else:
                proxeys = RdProxeys()
                n = 0
                time.sleep(10)
            print("-----------------Error, The Pn is " + str(j) + "----------------------")
            with open("E:\\vscode_work\\CareerPython\\Lagou\\" + "PageId_log" + ".txt", "a") as f:
                # jobname and joburl are plain strings here, so they are not indexed
                f.write(str(j) + "," + jobname + "," + joburl + "\n")
    # print(PageId)
    with open("E:\\vscode_work\\CareerPython\\Lagou\\" + "PageId" + ".txt", "a") as f:
        f.write(str(PageId))
    print("" + str(len(PageId)))


def main():
    pool = Pool(processes=5)   # at most 5 worker processes
    for i in range(0, len(data()[0])):
        for j in range(0, len(data()[0][i])):
            pool.apply_async(getPageId, (data()[0][i][j], data()[1][i][j]))
        pool.close()
        pool.join()


if __name__ == "__main__":
    main()

what result do you expect? What is the error message actually seen?

I split the data into five groups as input to getPageId and expected five processes to run in parallel. After one group of data finishes, the for loop should feed in the next group and continue.
Instead, raise ValueError("Pool not running") ValueError: Pool not running always appears after 5 processes have been started.


for i in range(0, len(data()[0])):
    for j in range(0, len(data()[0][i])):
        pool.apply_async(getPageId, (data()[0][i][j], data()[1][i][j]))
    pool.close()
    pool.join()
    

Your indentation is wrong. The indentation error causes a logic error, so the code does not behave the way you intend.

Put simply, all tasks must be submitted to the process pool before it is closed. Once it is closed, no more tasks can be added to it.

Your pool.close() sits inside the outer loop, so the pool is closed at the end of the first pass, before the loop has finished. pool.join() then waits for the tasks submitted so far to complete.

On the next pass (i + 1) the code tries to submit tasks to the pool again, which raises the error Pool not running. The message is accurate: the pool was closed during pass i, all of its tasks have been executed, and its life cycle has ended.
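The failure can be reproduced in a few lines. This is a minimal sketch, not the asker's crawler: square and submit_after_close are hypothetical names introduced only for illustration, and the exact message text is what recent Python 3 versions produce.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical stand-in for the real worker function.
    return x * x


def submit_after_close(pool):
    """Close the pool, then try to submit one more task.

    Returns the error message, or None if the submission was accepted.
    """
    pool.close()  # the pool stops accepting new tasks from here on
    try:
        pool.apply_async(square, (3,))
    except ValueError as exc:
        return str(exc)
    finally:
        pool.join()  # wait for the worker processes to exit
    return None


if __name__ == "__main__":
    print(submit_after_close(Pool(processes=2)))
```

The second pass of the outer loop in the question is in the same situation as the apply_async call here: it submits to a pool that has already been closed.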

so your code should be written as

pool = multiprocessing.Pool()

for i in range(0, len(data()[0])):
    for j in range(0, len(data()[0][i])):
        pool.apply_async(getPageId, (data()[0][i][j], data()[1][i][j]))
pool.close()
pool.join()

After all tasks have been submitted to the pool, call pool.close() so that it stops accepting new work, then call pool.join() to wait for the worker processes to finish.
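A self-contained version of that pattern might look like the sketch below. The names task and run_all are hypothetical, and the nested lists stand in for data()[0] and data()[1]; the sketch also keeps the AsyncResult objects, because an exception raised inside a task submitted with apply_async only becomes visible when get() is called on its result.

```python
from multiprocessing import Pool


def task(name, url):
    # Hypothetical stand-in for getPageId: just echoes its arguments.
    return f"{name} -> {url}"


def run_all(pool, names, urls):
    """Submit one task per (name, url) pair, then close the pool exactly once."""
    pending = []
    for group_names, group_urls in zip(names, urls):
        for name, url in zip(group_names, group_urls):
            pending.append(pool.apply_async(task, (name, url)))
    pool.close()   # no further submissions after this point
    pool.join()    # block until every queued task has finished
    # get() returns the task's result and re-raises any exception it hit
    return [result.get() for result in pending]


if __name__ == "__main__":
    names = [["a", "b"], ["c"]]
    urls = [["u1", "u2"], ["u3"]]
    print(run_all(Pool(processes=5), names, urls))
```

Reading the input once into names and urls, rather than calling data() on every iteration, also avoids recomputing it inside the loops.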

Another point: the size of the process pool is not the number of tasks you can put into it, but the maximum number of worker processes that can be active at the same time.

For example, if you define pool = Pool(4), you can submit thousands of tasks to it, but only four run at any moment, each in its own worker process. A fifth task does not start until one of those four finishes.
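One way to see this is to have every task report the ID of the process that ran it. The sketch below is illustrative only; report_worker and worker_pids are hypothetical names, and the short sleep is there just to give every worker a chance to pick up some of the tasks.

```python
import os
import time
from multiprocessing import Pool


def report_worker(_):
    time.sleep(0.05)      # keep the worker busy briefly so tasks spread out
    return os.getpid()    # identifies the process that ran this task


def worker_pids(pool, n_tasks):
    """Run n_tasks tasks and return the set of process IDs that executed them."""
    pids = set(pool.map(report_worker, range(n_tasks)))
    pool.close()
    pool.join()
    return pids


if __name__ == "__main__":
    pids = worker_pids(Pool(processes=4), 20)
    print(len(pids))  # at most 4, however many tasks were queued
```

Twenty tasks are queued, but they are all executed by the same small set of worker processes.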

The size of Pool() defaults to the number of CPU cores in the computer, so that each worker process can be assigned a core of its own, which is the typical allocation for dedicated processors.

So what you describe, the number created exceeding your pool=5 (I think you mean that the pool size is 5), is not a problem. Those tasks are not all competing for processors at once: only five are running at any time, and all the others wait in the queue.

That is the behavior you want to achieve, so there is nothing to worry about.
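If each group really must finish before the next one is submitted, one option is to wait on the results of a group before moving on, and still close the pool only once at the very end. This is a sketch under that assumption; task and run_in_groups are hypothetical names, and the integer lists stand in for the real groups of job names and URLs.

```python
from multiprocessing import Pool


def task(item):
    # Hypothetical stand-in for the real per-item work.
    return item * 2


def run_in_groups(pool, groups):
    """Process one group at a time, reusing the same pool for every group."""
    finished = []
    for group in groups:
        pending = [pool.apply_async(task, (item,)) for item in group]
        # get() blocks, so the next group is submitted only after this one is done
        finished.append([result.get() for result in pending])
    pool.close()   # called once, after the last group
    pool.join()
    return finished


if __name__ == "__main__":
    print(run_in_groups(Pool(processes=5), [[1, 2, 3], [4, 5]]))
```

Waiting per group keeps at most one group in flight, at the cost of leaving workers idle while the slowest task of a group completes.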
