I have recently been crawling the shop front pages on Dianping. The URLs look like http://m.dianping.com/shop/4094416. Because Dianping applies anti-crawling measures per IP, I built a dynamic IP tunnel that switches IPs within seconds, so that every request goes out from a different IP. I verified this against http://httpbin.org/get, and each request does indeed come from a different IP.
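For reference, the rotation check described above can be sketched roughly as follows. This is only a minimal sketch, not my actual tunnel code: the function names `fetch_origin` and `check_rotation` and the `proxy` parameter are made up for illustration, and it relies on http://httpbin.org/get echoing the caller's address in the `origin` field of its JSON response.

```python
import json
import time
import urllib.request


def fetch_origin(url="http://httpbin.org/get", proxy=None, timeout=10):
    """Return the IP address that httpbin reports for this request.

    `proxy` is a hypothetical tunnel endpoint such as "http://host:port".
    """
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["origin"]


def check_rotation(fetch, n=5, interval=1.0):
    """Call `fetch` n times, `interval` seconds apart.

    Returns the list of observed IPs and whether they were all distinct.
    """
    origins = []
    for i in range(n):
        origins.append(fetch())
        if i < n - 1:
            time.sleep(interval)
    return origins, len(set(origins)) == len(origins)
```

Calling `check_rotation(lambda: fetch_origin(proxy="http://host:port"))` with the real tunnel address in place of the placeholder should report all-distinct IPs if the tunnel changes IP on every request.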
However, when I crawled Dianping's shop front pages with the setup above, keeping the interval between requests at 1 s, I still ended up banned. The response headers of a banned request are shown below:
ban:
{"Date": "Thu, 07 Jun 2018 17:45:05 GMT", "Content-Type": "application/octet-stream", "Content-Length": "0", "M-Appkey": "com.sankuai.rc.mtsi.optimus", "M-SpanName": "OptimusController.optimusAuthorize", "M-Host": "10.73.137.220", "M-TraceId": "3536539434466270722", "Pragma": "no-cache", "Cache-Control": "no-cache", "Vary": "User-Agent, Accept-Encoding", "Age": "0", "Accept-Ranges": "bytes", "Connection": "keep-alive"}
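To spot this kind of response automatically while crawling, a heuristic like the one below could be used. It is only a sketch based on the single sample above: it assumes a banned response has an empty body (`Content-Length: 0`) and mentions `optimus` in the `M-Appkey` or `M-SpanName` header. The function name `looks_like_ban` is made up for illustration, and I do not know whether Dianping always responds this way.

```python
def looks_like_ban(headers):
    """Heuristically decide whether response headers match the ban sample.

    Assumption (from one observed response only): the body is empty and
    the M-Appkey or M-SpanName header mentions "optimus".
    """
    # Header names are case-insensitive, so normalise the keys first.
    h = {k.lower(): v for k, v in headers.items()}
    empty_body = h.get("content-length") == "0"
    optimus = (
        "optimus" in h.get("m-appkey", "").lower()
        or "optimus" in h.get("m-spanname", "").lower()
    )
    return empty_body and optimus
```

A crawler could call this on every response and pause or log the request details when it returns `True`, which would help narrow down which requests trigger the ban.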
banIP
cookiecookiechromehttp://m.dianping.com/shop/4094416
:
As you can see, the headers look quite ordinary, so how does Dianping manage to detect and ban the crawler?