If a web crawler fetches pages too fast and its IP gets blocked, how should this be handled?

If you collect from a website too frequently, fewer and fewer pages can be collected, and you may even be blocked. Only by controlling the collection speed and frequency can you keep the IP unblocked and continue to obtain data. The browser's cookies should also be cleared regularly.
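A rough sketch of what controlling the speed and clearing cookies could look like, using only the Python standard library. The names `Throttle` and `crawl`, the delay values, and the "clear every 50 requests" rule are all made up for illustration; tune them for the site you are collecting from.

```python
import http.cookiejar
import random
import time
import urllib.request


class Throttle:
    """Enforce a minimum delay (plus optional random jitter) between requests."""

    def __init__(self, min_delay, jitter=0.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.jitter = jitter
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request; None before the first

    def wait(self):
        """Sleep until the next request is allowed; return the seconds slept."""
        delay = self.min_delay + random.uniform(0, self.jitter)
        slept = 0.0
        if self._last is not None:
            remaining = delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


def crawl(urls, throttle, clear_cookies_every=50):
    """Fetch each URL politely, wiping stored cookies every N requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    for count, url in enumerate(urls, start=1):
        throttle.wait()
        with opener.open(url, timeout=10) as resp:
            yield url, resp.read()
        if count % clear_cookies_every == 0:
            jar.clear()  # assumption: periodic clearing, as the answer suggests
```

For example, `crawl(urls, Throttle(min_delay=2.0, jitter=1.0))` would wait roughly two to three seconds between requests. The jitter is there so the requests do not arrive on a perfectly regular beat.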

Mar.04,2022

This problem grows with concurrency: as load rises, a site either adds machines to absorb the traffic or rate-limits clients. Clearly your current proxy setup is not good enough.


You don't want to crawl the whole site from a single IP, do you? There are many proxy IP vendors now. The one I currently use is fairly good and is suitable for crawlers, data collection, and so on. It is called agent Yun, and it supplies IPs 24 hours a day.
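A minimal sketch of spreading requests over several proxy IPs instead of one, again with only the Python standard library. `ProxyRotator`, `fetch_via`, and `fetch_with_rotation` are hypothetical names, and the proxy addresses in the usage note are placeholders from a documentation-only range; a real list would come from whatever vendor you use.

```python
import urllib.request


class ProxyRotator:
    """Hand out proxies in turn and drop the ones that appear blocked."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._index = 0

    def next_proxy(self):
        if not self._proxies:
            raise RuntimeError("no working proxies left")
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return proxy

    def mark_bad(self, proxy):
        """Remove a proxy that failed so it is not handed out again."""
        if proxy in self._proxies:
            self._proxies.remove(proxy)


def fetch_via(proxy, url, timeout=10):
    """Fetch a URL through one specific proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_rotation(rotator, url, attempts=3, fetch=fetch_via):
    """Try up to `attempts` proxies, discarding each one that fails."""
    last_error = None
    for _ in range(attempts):
        proxy = rotator.next_proxy()
        try:
            return fetch(proxy, url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            rotator.mark_bad(proxy)
            last_error = exc
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

Usage might look like `fetch_with_rotation(ProxyRotator(["http://203.0.113.10:8080", "http://203.0.113.11:8080"]), url)`. Treating any network error as "this proxy is blocked" is a simplification; a real crawler would probably also check for HTTP 403 or 429 responses and still keep the request rate down per proxy.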


This requires a proxy.
