Problem description
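```python
import datetime

import scrapy

from ..items import DsItem  # assumed location: the project's items.py


class MyspiderSpider(scrapy.Spider):
    name = "myspider"
    allowed_domains = ["dszuqiu.com"]
    url = "https://www.dszuqiu.com"
    offset = 0
    # Start from 2018-07-01, the same base date that parse() adds the offset to
    start_urls = [url + "/diary/" + datetime.datetime(2018, 7, 1).strftime("%Y%m%d")]

    def parse(self, response):
        item = DsItem()
        # Pager links for the current date; they may be relative URLs
        sonUrls = response.xpath('//*[@id="pager"]/ul//@href').extract()
        # Schedule the next date until 2018-07-31 is reached
        if self.offset < (datetime.datetime(2018, 7, 31) - datetime.datetime(2018, 7, 1)).days:
            self.offset += 1
            next_day = datetime.datetime(2018, 7, 1) + datetime.timedelta(days=self.offset)
            yield scrapy.Request(self.url + "/diary/" + next_day.strftime("%Y%m%d"), callback=self.parse)
        # scrapy.Request takes a single URL string, not a list, so request each pager link separately
        for href in sonUrls:
            yield scrapy.Request(url=response.urljoin(href), callback=self.parse2)

    def parse2(self, response):
        print(response.url)
```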
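The date loop can also be written without a mutable `offset` counter: build every date URL up front and schedule them all at once. A minimal sketch of the URL generation in plain Python, assuming the `/diary/YYYYMMDD` pattern used in the spider above; `BASE_URL` and `diary_urls` are hypothetical names introduced only for this illustration.

```python
import datetime

# Hypothetical constant; same base URL as the spider's `url` attribute
BASE_URL = "https://www.dszuqiu.com"


def diary_urls(start, end):
    """Return one /diary/YYYYMMDD URL per day from start to end, inclusive."""
    days = (end - start).days
    return [
        BASE_URL + "/diary/" + (start + datetime.timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days + 1)
    ]


urls = diary_urls(datetime.date(2018, 7, 1), datetime.date(2018, 7, 31))
print(len(urls))  # 31
print(urls[0])    # https://www.dszuqiu.com/diary/20180701
```

Inside a spider, `start_requests()` could loop over this list and yield one `scrapy.Request(u, callback=self.parse)` per URL, so each date is scheduled independently instead of being chained one after another.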
Topic description
I want to crawl the full-time match data for any date on this site. Varying the date alone already works. On closer inspection, however, the listing for some dates is split across several pages. How do I set up crawling for those extra pages? I am not a computer science major, so any pointers would be appreciated.
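A common way to handle a date that spans several pages: on each date page, collect the pager links, turn them into absolute URLs, and request each one. The sketch below shows only the link-collection step, in plain Python with no Scrapy dependency, so it can be tried offline. It assumes the pager is an element with `id="pager"` containing `<a href>` links, roughly as the XPath `//*[@id="pager"]/ul//@href` in the question suggests; the sample HTML and its `?page=N` URLs are hypothetical, not the site's real markup, and `PagerLinkParser` and `pager_urls` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that never get a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PagerLinkParser(HTMLParser):
    """Hypothetical helper: collect <a href> values inside the element with id="pager"."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside the pager element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0 and attrs.get("id") == "pager":
            self.depth = 1
            return
        if self.depth > 0:
            if tag == "a" and attrs.get("href"):
                self.hrefs.append(attrs["href"])
            if tag not in VOID_TAGS:
                self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def pager_urls(page_url, html):
    """Return absolute, de-duplicated pager URLs, skipping the current page itself."""
    parser = PagerLinkParser()
    parser.feed(html)
    seen, urls = set(), []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)
        if absolute != page_url and absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Hypothetical markup, for illustration only
sample_html = """
<div id="pager"><ul>
  <li><a href="/diary/20180701">1</a></li>
  <li><a href="/diary/20180701?page=2">2</a></li>
  <li><a href="/diary/20180701?page=3">3</a></li>
</ul></div>
<a href="/other">not a pager link</a>
"""

links = pager_urls("https://www.dszuqiu.com/diary/20180701", sample_html)
print(links)
```

In a Scrapy callback, the equivalent is to loop over the extracted `href` values and yield one `scrapy.Request(response.urljoin(href), callback=...)` per link, because `scrapy.Request` takes a single URL string rather than a list. Scrapy's default duplicate filter then drops any URL that has already been requested, so pager links repeated across pages are fetched only once.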