For example, I need to crawl the news and article pages of many websites and extract the title, content, publication time, and other information from each page. But every site lays out its pages differently. Do I have to write a separate crawler for each site?
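One way to avoid a crawler per site is to write a single extractor and describe each site only by a small set of rules. Below is a minimal sketch of that idea. It assumes each field can be located by a tag name plus a CSS class. The site name, class names, and the `SITE_RULES` table are hypothetical. It uses only the standard-library `html.parser` so it runs without dependencies; a real crawler would more likely use a full HTML parsing library with CSS or XPath selectors.

```python
from html.parser import HTMLParser

# Hypothetical per-site rules: field -> (tag name, class the element must carry).
# Adding a new site means adding an entry here, not writing a new crawler.
SITE_RULES = {
    "news.example.com": {
        "title": ("h1", "headline"),
        "content": ("div", "article-body"),
        "published": ("span", "pub-date"),
    },
}

class RuleExtractor(HTMLParser):
    """Collects the text inside the first element matching each (tag, class) rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.pieces = {}   # field -> list of text fragments
        self.active = []   # [field, tag, depth] for elements currently being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Track nesting of same-named tags so we know where a captured element ends.
        for entry in self.active:
            if entry[1] == tag:
                entry[2] += 1
        for field, (want_tag, want_class) in self.rules.items():
            if field not in self.pieces and tag == want_tag and want_class in classes:
                self.pieces[field] = []
                self.active.append([field, tag, 1])

    def handle_endtag(self, tag):
        for entry in list(self.active):
            if entry[1] == tag:
                entry[2] -= 1
                if entry[2] == 0:
                    self.active.remove(entry)

    def handle_data(self, data):
        for field, _, _ in self.active:
            self.pieces[field].append(data)

def extract(html, site):
    """Run the shared extractor with one site's rules; whitespace is collapsed."""
    parser = RuleExtractor(SITE_RULES[site])
    parser.feed(html)
    parser.close()
    return {f: " ".join(" ".join(p).split()) for f, p in parser.pieces.items()}
```

Text fragments from child elements are joined with spaces, which keeps paragraphs apart but can also split words that were broken by inline tags; that is a simplification of this sketch.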
Also, once the information is captured, each website's data comes in a different format, and I need to convert it into my own site's format. Is there a general set of conversion methods that works for all of these formats?
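A common pattern for the second part is to normalize every site's raw record into one target schema, keeping only the site-specific differences (such as date formats) in a lookup table. The sketch below assumes the raw records have `title`, `content`, and `published` fields; the target field names, sites, and `DATE_FORMATS` table are hypothetical.

```python
from datetime import datetime

# Hypothetical per-site date formats; all other normalization is shared.
DATE_FORMATS = {
    "news.example.com": "%Y-%m-%d",
    "blog.example.org": "%d/%m/%Y %H:%M",
}

def normalize(raw, site):
    """Map one site's raw record onto a single target schema (hypothetical field names)."""
    published = datetime.strptime(raw["published"].strip(), DATE_FORMATS[site])
    return {
        "title": " ".join(raw["title"].split()),  # collapse stray whitespace
        "content": raw["content"].strip(),
        "published": published.isoformat(),      # one date format for every source
        "source": site,
    }
```

For example, `normalize({"title": " Hello  World ", "content": " x ", "published": "02/01/2020 13:45"}, "blog.example.org")` yields a `published` value of `"2020-01-02T13:45:00"`.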