I'm setting up a crawler to monitor ticket releases for specific trains on 12306.
After I wrote it, it ran normally on my own computer. The crawl frequency is not high: one pass every five minutes, each pass querying 8 days in turn, with a five-second interval between days.
I ran it on my computer for three hours, and the data was normal.
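For reference, the schedule described above looks roughly like the sketch below. It is not the exact loop I ran: `dates_to_check` and `crawl_once` are names made up for illustration, and `fetch` stands for any callable that queries a single day.

```python
import datetime
import time


def dates_to_check(start, days=8):
    """Return `days` consecutive dates starting at `start`, as YYYY-MM-DD strings."""
    return [(start + datetime.timedelta(days=i)).strftime("%Y-%m-%d")
            for i in range(days)]


def crawl_once(fetch, start, pause=5, sleep=time.sleep):
    """Run one pass: query each of the 8 days in turn, pausing between days.

    `fetch` is a hypothetical callable taking one date string.
    `sleep` is injectable so the pass can be tested without waiting.
    """
    results = {}
    days = dates_to_check(start)
    for i, day in enumerate(days):
        results[day] = fetch(day)
        if i < len(days) - 1:
            sleep(pause)  # five seconds between days, none after the last one
    return results
```

A full run would call `crawl_once(...)` and then sleep 300 seconds before the next pass.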
Then I put it on a VPS and let it run there. What came back was an error page, not the normal JSON data.
I suspected the IP was blocked, but a new VPS didn't work either. I also installed Python on a router that is on the same network as my computer, and it still couldn't fetch anything. I'm not very experienced with this. I searched online for a long time but couldn't find a solution.
The code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import json
#import email2
from fake_useragent import UserAgent
"""ltrain_date="2018-12-28" #
from_station="BXP" #
to_station="IOQ" #"""
#
def get_num(date,from_s,to_s):
    url=("https://kyfw.12306.cn/otn/leftTicket/queryZ?"
         "leftTicketDTO.train_date={}&"
         "leftTicketDTO.from_station={}&"
         "leftTicketDTO.to_station={}&"
         "purpose_codes=ADULT").format(date,from_s,to_s)
    head = {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
            "Cache-Control": "max-age=0",
            "Connection": "keep-alive",
            "User-Agent": UserAgent().random,
            "Referer": "https://kyfw.12306.cn/otn/leftTicket/init"
            }
    with requests.Session() as s:
        res = s.get(url,headers = head) #verify=False
        res.encoding = "utf-8"
    #res = requests.get(url,params=param,headers = head)
    #r = res.text.encode("utf-8")
    print(res.text)
    #print(type(r))
    jsons = json.loads(res.text)
    data1 = jsons["data"]["result"][2]
    data = data1.split("|")
    print("{},{},{}".format(data[3],data[28],data[29]))
    return data[3],data[28],data[29]
    #return res
# 3262829
if __name__ == "__main__":
    get_num("2019-01-03","QTP","WFK")
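The script above passes `res.text` straight to `json.loads`, so on the VPS the error page only shows up as a JSON parsing exception. To see what actually came back, the response could be checked first. The sketch below is a minimal, hypothetical helper (`parse_ticket_response` is not part of my script or of requests), written as a pure function so it can be tried without any network access.

```python
import json


def parse_ticket_response(status_code, content_type, text):
    """Return the decoded JSON payload, or raise ValueError describing the response."""
    if status_code != 200:
        raise ValueError("HTTP status {}".format(status_code))
    try:
        return json.loads(text)
    except ValueError:
        # json.loads raises a ValueError subclass when the body is not JSON,
        # for example when an HTML error page is returned instead.
        raise ValueError("not JSON (Content-Type: {}): {!r}".format(
            content_type, text[:200]))
```

Inside `get_num` it would be called as `parse_ticket_response(res.status_code, res.headers.get("Content-Type", ""), res.text)`, which puts the status code, content type, and the first 200 characters of the error page into the exception message.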
About the URL part of the code: I originally passed the query values in through params. That worked without problems on Windows, but not on the VPS, which runs CentOS, so I changed it to the way it is now. Even so, I can't fetch a single set of data on the VPS.
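One possible difference between the two machines, which I have not confirmed: on Python versions older than 3.7 a plain dict does not guarantee key order, so `params={...}` may send the parameters in a different order than the hand-built URL. Whether 12306 actually cares about parameter order is an assumption on my part. The sketch below (Python 3, with `build_query_url` as a made-up helper name) uses a list of tuples, which keeps the order fixed on any version. requests also accepts a list of tuples for `params`.

```python
from urllib.parse import urlencode

BASE_URL = "https://kyfw.12306.cn/otn/leftTicket/queryZ"


def build_query_url(date, from_s, to_s):
    """Build the query URL with the parameters in a fixed order."""
    # A list of tuples is encoded in exactly this order; a dict is only
    # guaranteed to keep insertion order on Python 3.7 and later.
    params = [
        ("leftTicketDTO.train_date", date),
        ("leftTicketDTO.from_station", from_s),
        ("leftTicketDTO.to_station", to_s),
        ("purpose_codes", "ADULT"),
    ]
    return BASE_URL + "?" + urlencode(params)
```

For the values in my script, `build_query_url("2019-01-03", "QTP", "WFK")` gives the same string as the formatted URL in `get_num`.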