A regular expression problem in Python

I want to match the titles in http://www.cyzone.cn/rss, but my pattern never matches. I don't think there is anything wrong with it:

 cyzone_title_list=re.findall(u"<item>.*?<![CDATA[(.*?)]]>.*?</item>",response.text,re.S)
Mar.21,2021

The [ and ] characters need to be escaped:

rule = re.compile(r'<title><!\[CDATA\[(.*?)\]\]></title>')
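
To see why the escaping matters, here is a minimal sketch. The sample string is made up and only stands in for response.text; the real feed may be laid out differently. Without the backslashes, [CDATA[(.*?)] is read as a character class, so the literal CDATA marker is never matched.

```python
import re

# Hypothetical sample standing in for response.text (not the real feed)
sample = (
    "<item>\n"
    "<title><![CDATA[First title]]></title>\n"
    "</item>\n"
    "<item>\n"
    "<title><![CDATA[Second title]]></title>\n"
    "</item>\n"
)

# Unescaped: "[CDATA[(.*?)]" is parsed as a character class, not literal text
unescaped = re.findall(r"<item>.*?<![CDATA[(.*?)]]>.*?</item>", sample, re.S)

# Escaped: the brackets are literal, and (.*?) is a real capture group
escaped = re.findall(r"<item>.*?<!\[CDATA\[(.*?)\]\]>.*?</item>", sample, re.S)

# The compiled rule from the answer above
rule = re.compile(r'<title><!\[CDATA\[(.*?)\]\]></title>')
by_rule = rule.findall(sample)

print(unescaped)  # []
print(escaped)    # ['First title', 'Second title']
print(by_rule)    # ['First title', 'Second title']
```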

I have stressed this more than once, and I will say it again: regular expressions are definitely the worst solution for processing tag-based documents.


Thank you. Use cyzone_title_list=re.findall(u'<item>.*?<!\[CDATA\[(.*?)\]\]>.*?</item>', response.text, re.S). In addition, this RSS feed conforms to the XML format, so it is recommended to use the lxml library to extract it.
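
For the XML route, a rough sketch follows. It uses the standard library's xml.etree.ElementTree rather than lxml, only so that it runs without installing anything; lxml.etree offers a similar interface. The RSS snippet is invented for illustration, and the assumption is that each item in the real feed has a title child.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet; the real feed is assumed to have a similar shape
rss = """<rss version="2.0">
  <channel>
    <title><![CDATA[Feed name]]></title>
    <item>
      <title><![CDATA[First title]]></title>
    </item>
    <item>
      <title><![CDATA[Second title]]></title>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(rss)

# The parser unwraps CDATA sections, so findtext returns the plain title text.
# Iterating over <item> elements skips the channel-level <title>.
titles = [item.findtext("title") for item in root.iter("item")]

print(titles)  # ['First title', 'Second title']
```

With the real feed, the parser would be given the downloaded response body instead of the inline string.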
