Links extracted by a Rule need to be processed again before they are requested. For example:
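from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule

rules = {
    "eastmoney": (
        # 1. Detail-page links found in the news list.
        Rule(LinkExtractor(allow=r"/a/\d+\.html", restrict_xpaths='//*[@id="newsListContent"]//*[@class="title"]'), callback="parse_item", follow=True),
        # 2. Page-number links inside a detail page. restrict_xpaths has to select elements, not attributes, so the trailing /@href is dropped.
        Rule(LinkExtractor(restrict_xpaths='//*[@id="ContentBody"]/div[3]/div/a')),
        # 3. Pagination links on the list page.
        Rule(LinkExtractor(restrict_xpaths='//div[@id="pagerNoDiv"]//a[contains(.,"")]'), follow=True),
    )
}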
Three Rules are defined here. The first extracts the links to detail pages. The second handles pagination inside a detail page (a detail page has no "next page" link, only a list of page numbers, so this rule collects every href in the pagination area). The third handles pagination of the list page.
When a start_urls link goes through the detail-page rule, I get the detail-page links, but those links still need further processing (string concatenation on the URL) before they are requested. What step should I add to do this?
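One possible approach, sketched under assumptions: Scrapy's LinkExtractor accepts a `process_value` callable that receives each extracted attribute value and returns the rewritten value (or None to drop the link), and Rule accepts a `process_links` callable that receives the list of extracted Link objects. The snippet below shows only the rewriting function, so it runs without Scrapy. `BASE_URL`, `SUFFIX`, and `fix_detail_url` are hypothetical names and values, since the question does not say what actually has to be concatenated.

```python
from urllib.parse import urljoin

# Hypothetical values: the question does not say what must be concatenated.
BASE_URL = "http://example.com/"
SUFFIX = "?page=all"


def fix_detail_url(value):
    """Rewrite one extracted href; return None to drop the link."""
    if not value or value.startswith("javascript:"):
        return None
    # Resolve the href against the base, then append the extra string.
    url = urljoin(BASE_URL, value.strip())
    return url + SUFFIX


# Wiring it into the first Rule (needs Scrapy, so it is left as a comment):
# Rule(LinkExtractor(allow=r"/a/\d+\.html", process_value=fix_detail_url),
#      callback="parse_item", follow=True)
```

If the change depends on more than the href itself (for example the link text), a function passed as `process_links` to the Rule could rebuild each link's `url` instead and return the modified list.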