http://house.njhouse.com.cn/r.
The site's pagination links are displayed as "#" (the href is just "#"). Can CrawlSpider still be used?
If it can, how should the rules for this site be written?
I wrote the following, but it does not work:
rules = [
    Rule(LinkExtractor(allow=(r"/rent/houselist/p-\d+",)), callback="parse_item", follow=True),
]
Here is the main code of my crawler. How should I modify it?
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# NanjingItem is the item class defined in the project's items module.
class ListSpider(CrawlSpider):
    # Spider name
    name = "nanjingtenement"
    # Domains the spider is allowed to crawl
    allowed_domains = ["njhouse.com.cn"]
    # Start URL
    start_urls = ["http://house.njhouse.com.cn/rent/houselist/p-1"]
    # Follow links that look like listing pages and parse each one
    rules = [
        Rule(LinkExtractor(allow=(r"/rent/houselist/p-\d+",)), callback="parse_item", follow=True),
    ]

    # Build one item per listing on the page
    def parse_item(self, response):
        for sel in response.xpath('//div[@class="list_main_lists"]/ul/li[not(@id)]'):
            item = NanjingItem()
            link = sel.xpath("a/@href")[0].extract()
            item["link"] = link
            pageno = response.selector.xpath('//div[@class="pagination-container"]/a[@class="active btn-active-filter"]/text()')[0].extract()
            item["pageno"] = pageno
            listingchannel = sel.xpath("div/p/text()")[0].extract()
            item["listingchannel"] = listingchannel
            yield item
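
One possible direction, as a rough sketch only: if the pagination anchors really carry href="#", a LinkExtractor allow pattern has nothing to match, so the next page URL may have to be built by hand. The helpers below use only the standard library and show both points. The names PAGE_RE, rule_matches and next_page_url are made up for illustration, and the sketch assumes that page N is served at .../p-N, as start_urls suggests; that has not been verified against the site.

```python
import re

# Same pattern as the allow= rule in the question, with the page number captured.
PAGE_RE = re.compile(r"/rent/houselist/p-(\d+)")


def rule_matches(href):
    """Return True if an href would be picked up by the allow pattern."""
    return PAGE_RE.search(href) is not None


def next_page_url(url):
    """Build the URL of the following page by incrementing the p-N part.

    Assumption: the site serves page N at .../p-N.
    Returns None when the URL contains no p-N part.
    """
    match = PAGE_RE.search(url)
    if match is None:
        return None
    page = int(match.group(1)) + 1
    # Replace only the digits, keeping everything before and after them.
    return url[:match.start(1)] + str(page) + url[match.end(1):]


if __name__ == "__main__":
    print(rule_matches("#"))
    print(next_page_url("http://house.njhouse.com.cn/rent/houselist/p-1"))
```

If that assumption holds, parse_item could end with something like yield scrapy.Request(next_page_url(response.url), callback=self.parse_item), issued only when the current page produced at least one listing so the crawl stops after the last page. In that case the rules are no longer needed and a plain scrapy.Spider would probably be enough.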