I recently read Learning Scrapy, which describes a crawler that automatically turns pages and crawls the items on each page. The book says that Scrapy uses a last-in, first-out (LIFO) queue.
Suppose there are 30 items on each page and start_url is set to the first page. My understanding of LIFO is that the first item out should be the bottom item of the last page, but when I run the routine the first item processed is actually the last item of the first page. In fact, the overall order is page one, then page two, and so on; only within each page are the items processed from last to first.
That per-page order looks right, but I think the overall order of the results should start from the last page: once no next link can be extracted on the last page, why doesn't the crawler go straight to item_selector, extract the item links on the last page, and hand them to parse_item for processing? Why is the first page handled first?
Is there a problem in my understanding of yield that leads to this misunderstanding? I hope to get your help.
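To check the yield part of my question in isolation, here is a tiny standalone experiment using plain Python generators only (nothing Scrapy-specific; all names are made up):

def make_requests():
    print("about to yield A")
    yield "A"
    print("about to yield B")
    yield "B"

g = make_requests()   # nothing is printed yet: the generator body has not run
print(next(g))        # runs up to the first yield: prints "about to yield A", then "A"
print(next(g))        # resumes after the first yield: prints "about to yield B", then "B"

So yield by itself only hands values out lazily, one at a time, when something asks for them; it does not decide the processing order. The original routine from the book: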
import urlparse

from scrapy.http import Request

def parse(self, response):
    # Get the next index URLs and yield Requests
    next_selector = response.xpath('//*[contains(@class,"next")]//@href')
    for url in next_selector.extract():
        yield Request(urlparse.urljoin(response.url, url))

    # Get item URLs and yield Requests
    item_selector = response.xpath('//*[@itemprop="url"]/@href')
    for url in item_selector.extract():
        yield Request(urlparse.urljoin(response.url, url),
                      callback=self.parse_item)
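To make the order I observe concrete, here is a toy reproduction of my mental model, assuming the scheduler is just a plain Python list used as a LIFO stack and using two made-up pages (this is only a sketch, not Scrapy's real scheduler, which is what I am asking about):

# Two fake index pages; names and contents are made up for illustration.
pages = {
    "page1": {"next": "page2", "items": ["p1-item1", "p1-item2", "p1-item3"]},
    "page2": {"next": None,    "items": ["p2-item1", "p2-item2", "p2-item3"]},
}

def fake_parse(url):
    # Mimics parse() above: the next-page request is yielded first,
    # then one request per item on the page.
    page = pages[url]
    if page["next"]:
        yield ("index", page["next"])
    for item in page["items"]:
        yield ("item", item)

stack = [("index", "page1")]      # start_url
while stack:
    kind, target = stack.pop()    # LIFO: most recently pushed request first
    if kind == "index":
        # A page's requests can only be pushed after that page has been
        # "downloaded" and parsed, i.e. after its own request is popped.
        stack.extend(fake_parse(target))
    else:
        print("parse_item: " + target)

This toy run prints p1-item3, p1-item2, p1-item1, then p2-item3, p2-item2, p2-item1, which matches what I see: page 2's requests do not exist on the stack until page 1's response has been parsed, so LIFO can only reverse the order within a page, not across pages. If that is how Scrapy actually behaves, then my mistake was assuming all requests from all pages are queued before any response is processed; I would appreciate confirmation.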