How do I reset or empty the data in scrapy-redis's dupefilter?

After you have crawled a site with a scrapy-redis spider, you cannot crawl it again. If you change the spider's name, you can crawl again, but if you change back to the original name, the deduplication mechanism kicks in again. The dupefilter is needed, but how do you solve this when you have generated some data during debugging and want to crawl again?

How can I delete the crawled URL data previously recorded by the dupefilter?

Redis scrapy

Feb.26,2021

Solved.
It's hard waiting several days with no one replying.

I ran into the same problem, and what you wrote above gave me an idea. I changed the spider's name and could crawl again. Then I found that a key containing the spider's name is saved in Redis, and that this key can be deleted.
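Deleting that key could look roughly like the sketch below. It assumes the scrapy-redis scheduler is using its default dupefilter key template, `%(spider)s:dupefilter` (the `SCHEDULER_DUPEFILTER_KEY` setting); if you have overridden that setting, pass your own template. The helper names are made up for illustration, and `client` is expected to be a redis-py style client with a `delete` method.

```python
# Default key template used by the scrapy-redis scheduler for the dupefilter
# (assumption: SCHEDULER_DUPEFILTER_KEY has not been overridden in settings).
DUPEFILTER_KEY_TEMPLATE = "%(spider)s:dupefilter"


def dupefilter_key(spider_name, template=DUPEFILTER_KEY_TEMPLATE):
    """Build the Redis key that holds a spider's request fingerprints."""
    return template % {"spider": spider_name}


def reset_dupefilter(client, spider_name, template=DUPEFILTER_KEY_TEMPLATE):
    """Delete the spider's dupefilter key and return what the client reports.

    `client` is any object with a redis-py style `delete(key)` method,
    which returns the number of keys that were removed.
    """
    return client.delete(dupefilter_key(spider_name, template))
```

With redis-py installed, a call such as `reset_dupefilter(redis.Redis(host="localhost", port=6379), "myspider")` would remove the fingerprints for a spider named `myspider`; the host, port, and spider name here are placeholders. Running `DEL myspider:dupefilter` in redis-cli should have the same effect.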

SCHEDULER_FLUSH_ON_START = True
You can add this line to your settings so that the spider's keys in Redis are cleaned up automatically when it starts.
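For context, a minimal settings.py fragment might look like the sketch below. Only `SCHEDULER_FLUSH_ON_START` comes from the answer above; the other lines are the usual scrapy-redis scheduler settings and the Redis address is a placeholder, so treat them as assumptions about your project.

```python
# settings.py (fragment)

# Route scheduling and deduplication through scrapy-redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Placeholder connection details; adjust to your Redis instance.
REDIS_HOST = "localhost"
REDIS_PORT = 6379

# Clear the spider's dupefilter and request queue in Redis on every start.
SCHEDULER_FLUSH_ON_START = True
```

Note that this flush also discards any requests still waiting in the spider's queue, so it is probably best kept to debugging runs and switched off when you want to resume an interrupted crawl.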


