problem description
While learning Scrapy, I am using XPath to extract the content I want. First I select the li tags inside the ul tag to get a list, then I iterate over that list and run XPath on each li to pull out the information I need. However, when I run the crawler, it reports that li is a str and has no xpath method.
the environmental background of the problem and what methods you have tried
I wonder whether each li I iterate over needs to be converted back into XML text that can be queried? But I couldn't find a way to do that.
related code
from scrapy import Request, Spider

# SecondhousespiderItem is the project's own Item class (its import is not shown).


class LianJiaSpider(Spider):
    name = "second"
    allowed_domains = ["lianjia.com"]
    start_urls = ["https://zz.lianjia.com/ershoufang/"]

    def parse(self, response):
        one_page_infos = response.xpath("//ul[@class='sellListContent']/li").extract()
        for li in one_page_infos:
            item = SecondhousespiderItem()
            item["title"] = li.xpath(".//div[@class='title']/a/text()")
            item["total_price"] = li.xpath(".//div[@class='totalPrice']/span/text()") + ""
            item["unit_price"] = li.xpath(".//div[@class='unitPrice']/span/text()")
            item["house_info"] = li.xpath(".//div[@class='houseInfo']/text()")
            item["house_position"] = li.xpath(".//div[@class='positionInfo']/a/text()") \
                + li.xpath("//div[@class='houseInfo']/a/text()")
            item["house_url"] = li.xpath(".//div[@class='title']/a/@href")
            yield item
        num = response.xpath("//div[@class='page-box house-lst-page-box']/a[last()-1]/text()")
        for i in range(2, int(num) + 1):
            next_page = "https://zz.lianjia.com/ershoufang/pg%s" % str(i)
            yield Request(next_page, self.parse)
what result do you expect? What is the error message actually seen?
this is the error I actually get:
item["title"] = li.xpath(".//div[@class='title']/a/text()")
AttributeError: 'str' object has no attribute 'xpath'