For example, there are 10 URLs:
http://www.baidu.com/userid=1
http://www.baidu.com/userid=2
http://www.baidu.com/userid=3
...
http://www.baidu.com/userid=10
Each page returns JSON like this (the responses for userid=1 and userid=2):
{
    "data": {
        "1": {
            "uid": "1",
            "phone": "13000000000",
            "sex": "1"
        }
    },
    "code": 1,
    "msg": "1"
}

{
    "data": {
        "2": {
            "uid": "2",
            "phone": "13000000001",
            "sex": "1"
        }
    },
    "code": 1,
    "msg": "1"
}
I am a pyspider beginner. I have read a lot of material and found a way to generate all the URLs, but I do not know how to extract the data inside each page. Please help me clear up this confusion. Thank you! Here is what I have so far:
def __init__(self):
    self.base_url = "http://www.baidu.com/userid="
    self.total_num = 10

@every(minutes=24 * 60)
def on_start(self):
    # Queue one request per user id, from 1 through total_num.
    # A local loop variable avoids leaving a counter stuck past total_num.
    for uid in range(1, self.total_num + 1):
        url = self.base_url + str(uid)
        print(url)
        self.crawl(url, callback=self.index_page)
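
One possible way to get at the data: on_start hands every response to self.index_page, so that callback is where the JSON can be read. The sketch below is only an illustration. extract_users is a hypothetical helper name, and treating "code": 1 as the success value is an assumption based on the two sample responses. The helper is plain Python so it can be tried without pyspider; the commented lines at the end show how it might be called from the handler, using the response.json property that pyspider's Response object provides for JSON bodies.

```python
import json


def extract_users(payload):
    """Return the user records stored under payload["data"] as a list.

    Assumption: "code" == 1 marks a successful response, as in the
    samples from the question. Anything else yields an empty list.
    """
    if payload.get("code") != 1:
        return []
    data = payload.get("data") or {}
    # "data" maps a user id string to that user's record.
    return list(data.values())


# Sample body copied from the question (userid=1).
sample = json.loads(
    '{"data": {"1": {"uid": "1", "phone": "13000000000", "sex": "1"}},'
    ' "code": 1, "msg": "1"}'
)
users = extract_users(sample)
print(users[0]["phone"])

# Inside the pyspider handler, the callback could then look like this
# (response.json is the parsed JSON body of the fetched page):
#
#     def index_page(self, response):
#         return {"url": response.url, "users": extract_users(response.json)}
```

Whatever index_page returns is what pyspider stores as the result for that URL, so returning a dict with the extracted records should be enough to see them in the results view.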