When Scrapy saves data through a pipeline (as .txt files), some items fail with the error "'gbk' codec can't encode character".
The pipeline looks like this:
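    import os

    class TxtPipeline(object):
        def process_item(self, item, spider):
            path = os.getcwd()
            # One file per category, under the "data" folder of the working directory
            filename = os.path.join(path, "data", "%s.txt" % item["classic"])
            # No encoding is given, so the locale default (gbk here) is used
            with open(filename, "a") as f:
                f.write(item["title"] + "\n")
                f.write(item["time"] + "\n")
                f.write(item["text"] + "\n")
            ...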
So I switched to binary append mode and encoded everything as UTF-8, like this:
    with open(path, "ab") as f:
        f.write(item["title"].encode("utf-8", errors="ignore") + "\n")
But "\n" is a str, not bytes, so I changed it to b"\n". With that change, however, the output no longer breaks onto a new line.
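A minimal sketch of the binary-mode variant, assuming Python 3 and that every field is a str. The helper name `append_lines_binary` is hypothetical. In binary mode Python does not translate "\n" into the platform line ending, so on Windows the file contains bare "\n" bytes, which some editors (older versions of Notepad, for example) display as one long line; that may be why the line break seemed to have no effect. Appending os.linesep before encoding is one way around it:

```python
import os

def append_lines_binary(path, lines):
    """Append each str in `lines` to `path` as UTF-8 bytes (hypothetical helper)."""
    with open(path, "ab") as f:
        for line in lines:
            # Build the complete str first, then encode once,
            # so str and bytes are never concatenated.
            # os.linesep is "\r\n" on Windows and "\n" on Linux/macOS.
            f.write((line + os.linesep).encode("utf-8"))
```

It could be called as `append_lines_binary(filename, [item["title"], item["time"], item["text"]])`.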
My questions are:
- How can I solve the problem described above?
- How can I solve the encoding problem without using binary mode? (Note: all item fields are strings.)
I am still learning, so any advice would be appreciated.
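For the second question, one common approach is to stay in text mode and pass encoding="utf-8" to open(), so that Python no longer falls back to the locale default (gbk on a Chinese-language Windows system). The following is a sketch rather than a definitive fix. It assumes Python 3, the item keys from the post, and a "data" folder under the current working directory; the os.makedirs call and `return item` are additions (Scrapy expects process_item to return the item or raise DropItem).

```python
import os

class TxtPipeline(object):
    def process_item(self, item, spider):
        # Assumed layout: ./data/<classic>.txt, with the folder created on demand.
        directory = os.path.join(os.getcwd(), "data")
        os.makedirs(directory, exist_ok=True)
        filename = os.path.join(directory, "%s.txt" % item["classic"])
        # An explicit encoding avoids the locale default codec;
        # in text mode "\n" is translated to the platform line ending.
        with open(filename, "a", encoding="utf-8") as f:
            f.write(item["title"] + "\n")
            f.write(item["time"] + "\n")
            f.write(item["text"] + "\n")
        return item
```

The resulting files are UTF-8, so they need to be opened as UTF-8 in whatever editor or script reads them afterwards.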