Why does the deny setting in CrawlSpider not take effect?

deny is set in the Rule, but it does not take effect.

The code is as follows:

"123123":(
        Rule(LinkExtractor(allow="\d+-\d+-\d+/.*?-.*?.shtml", deny=("http://search.******.com.cn/.*?")),
         callback="parse_item", follow=True),
        Rule(LinkExtractor(allow="a[href^="http"]",deny_domains=("http://auto.******.com.cn")), follow=True)
        )

At runtime, the DEBUG log still shows the forbidden links being crawled.

Mar. 25, 2022

You excluded 123123.com.cn and crawled sina.com.cn, right?

Remove the protocol header from deny_domains and try using the bare domain name directly.
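
For instance, a minimal sketch assuming a standard CrawlSpider (the spider name and start URL are hypothetical, and the masked ****** domain is kept as a placeholder from the question):

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class NewsSpider(CrawlSpider):
    # hypothetical spider skeleton; substitute your own name/domains
    name = "news"
    allowed_domains = ["******.com.cn"]
    start_urls = ["http://******.com.cn/"]

    rules = (
        # deny takes regular expressions, so escape literal dots
        Rule(LinkExtractor(allow=r"\d+-\d+-\d+/.*?-.*?\.shtml",
                           deny=(r"http://search\.\S+?\.com\.cn/.*?",)),
             callback="parse_item", follow=True),
        # deny_domains takes bare host names, not URLs with a scheme:
        # "auto.******.com.cn" rather than "http://auto.******.com.cn"
        Rule(LinkExtractor(deny_domains=("auto.******.com.cn",)),
             follow=True),
    )

    def parse_item(self, response):
        self.logger.info("parsed %s", response.url)

Note that deny_domains is matched against the host part of each extracted URL, so a value containing "http://" can never match anything and the filter silently does nothing.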


Setting deny and deny_domains still has no effect for me.
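
One way to check whether the extractor itself is filtering correctly, outside a full crawl, is to test it in scrapy shell (a sketch; the shell URL and patterns are placeholders):

# start a shell with: scrapy shell "http://******.com.cn/"
from scrapy.linkextractors import LinkExtractor

le = LinkExtractor(allow=r"\d+-\d+-\d+/.*?-.*?\.shtml",
                   deny=(r"search\.",),
                   deny_domains=("auto.******.com.cn",))
# `response` is injected by scrapy shell; denied URLs should be absent
print([link.url for link in le.extract_links(response)])

If the denied links still show up here, the pattern is wrong; if they disappear here but are still crawled, another Rule is picking them up, since each Rule applies its own LinkExtractor independently.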
