Scrapy with LinkExtractor only crawls start_urls

The code is as follows. The start_urls pages are crawled and their information is extracted, but no other links are matched or followed:

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    }
    start_urls = [
        "https://chaoshi.detail.tmall.com/item.htm?id=576632421624&tbpm=3"
    ]
    rules = (
        Rule(LinkExtractor(allow=r"https://chaoshi.detail.tmall.com/item.htm\?id=\d+&tbpm=3"),
             process_request="request_tagPage", callback="parse_item", follow=True),
    )

    def request_tagPage(self, request):
        # replace the default headers of every extracted request with the custom User-Agent
        return request.replace(headers=self.headers)

    def parse_item(self, response):
        print(response.url)
Feb. 28, 2022

To match and follow other links, the spider has to inherit from the CrawlSpider class; the rules attribute is ignored by the plain Spider. It is not clear from the snippet whether you are subclassing the default scrapy.Spider or CrawlSpider.
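For reference, here is a minimal sketch of what such a spider could look like when it inherits from CrawlSpider. The class name, spider name and the shortened allow pattern are illustrative, not taken from your code:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class TmallItemSpider(CrawlSpider):   # rules only take effect on CrawlSpider, not scrapy.Spider
    name = "tmall_item"               # hypothetical spider name
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    }
    start_urls = [
        "https://chaoshi.detail.tmall.com/item.htm?id=576632421624&tbpm=3",
    ]
    rules = (
        Rule(LinkExtractor(allow=r"item\.htm\?id=\d+"),
             process_request="request_tagPage",
             callback="parse_item",
             follow=True),
    )

    def request_tagPage(self, request, response=None):
        # attach the custom headers to every request extracted by the rule
        # (Scrapy >= 2.0 also passes the source response to this hook)
        return request.replace(headers=self.headers)

    def parse_item(self, response):
        print(response.url)

If the base class is the plain scrapy.Spider, the rules attribute is silently ignored and only start_urls is fetched, which matches the behaviour you describe.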