I want to export all the scraped data to an Excel file with openpyxl. Written the first way below, the spider saves everything:
import scrapy
from clo.items import CloItem

class ClooSpider(scrapy.Spider):
    name = "cloo"
    allowed_domains = ["2cloo.com"]
    start_urls = ["http://www.2cloo.com/sort-shuku_list/s0/o0/od0/st0/w0/u0/v0/p{}".format(i) for i in range(1, 100)]

    def parse(self, response):
        contents = response.xpath("//tbody[@id='resultDiv']/tr")
        for content in contents:
            title = content.xpath("./td[2]/div/a/text()").getall()
            chapter = content.xpath("./td[2]/div/a[2]/text()").getall()
            author = content.xpath("./td[3]/div/text()").getall()
            number = content.xpath("./td[4]/text()").getall()
            click = content.xpath("./td[5]/text()").getall()
            update = content.xpath("./td[6]/text()").getall()
            item = CloItem(title=title, chapter=chapter, author=author,
                           number=number, click=click, update=update)
            yield item
But if I move the /tr from contents = response.xpath("//tbody[@id='resultDiv']/tr") into the loop body instead, like this:

contents = response.xpath("//tbody[@id='resultDiv']")
for content in contents:
    title = content.xpath("./tr/td[2]/div/a/text()").getall()
    ....
then only one piece of data gets saved. What is the reason for this, and how should it be fixed?
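To make the symptom concrete, here is a minimal sketch of how many nodes each XPath matches, and therefore how many times the for loop runs. It uses the standard-library xml.etree.ElementTree instead of Scrapy's selectors, on made-up markup, so the numbers are illustrative only:

```python
# Sketch: count the nodes each XPath selects (stdlib ElementTree,
# hypothetical 3-row table standing in for the real resultDiv table).
import xml.etree.ElementTree as ET

html = """<table><tbody id="resultDiv">
<tr><td>a</td></tr><tr><td>b</td></tr><tr><td>c</td></tr>
</tbody></table>"""
root = ET.fromstring(html)

# XPath ending in /tr: one node per table row.
rows = root.findall(".//tbody[@id='resultDiv']/tr")

# XPath ending at the tbody: a single node, so a loop over it runs once.
tbodies = root.findall(".//tbody[@id='resultDiv']")

print(len(rows), len(tbodies))  # 3 1
```

So the loop variable determines how many items can be yielded: iterating over the rows gives one iteration per row, while iterating over the single tbody gives one iteration in total.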
pipelines.py is the same in both cases:
from openpyxl import Workbook

class CloPipeline(object):
    def __init__(self):
        self.wb = Workbook()
        self.ws = self.wb.active
        self.ws.append(["title", "chapter", "author", "number", "click", "update"])

    def process_item(self, item, spider):
        line = [item["title"][0], item["chapter"][0], item["author"][0],
                item["number"][0], item["click"][0], item["update"][0]]
        self.ws.append(line)
        return item

    def close_spider(self, spider):
        self.wb.save("clo.xlsx")