Scrapy website home page list into the detailed page, the detailed page needs to turn the page how to achieve?

what if the url starts from the list page, and after the list also grabs the title and other information, the traversed list enters the detail page according to the title, and there are multiple pages of url in this detail page that need to be followed (for example, an article has 1 page, 2 pages, 3 pages)? What should item do with it?
for example, Baidu Tieba, from the home page list!
, grab the relevant information here and then enter the post to grab all the buildings

Scrapy python

Jul.04,2021

this is very simple. After start_request, scrapy will use the parse method by default. Of course, you can also define other methods to resolve, and then in the parse method, parse to get the link to the specific details page, and then again yield Request (url=' details page link', callback=self.parse) this form of cycle to take the next page, but there will be a hole, but I think your two layers should be almost enough. I don't bother to mention this pit.

Hello, I have encountered the same obstacle as you, but I can help you with the item save. The
format is as follows:
use the
yield Request (url=' detail page link in the list interface, and callback=self.detail_parse (detail_parse is the detail page parsing), mata {'item':item})

use
item = response.meta [' item']
in the detail_parse method, so the two methods share an item variable

Previous: The problem of transaction blocking caused by database foreign key constraints

Next: How do you do this font effect on Apple's official website?

When scrapy calls the paid agent api, proxy, it can only be obtained once every 5 seconds. Where do I need to set it?
scrapyapi5 because I originally wanted to get 100 ip, at a time and put it in the agent pool, but because the agent is unstable, it can not provide support for a long time. So I gave up the idea of getting 100 ip at once. 5request ...

Scrapy python

Feb.28,2021
The Scrapy ImagesPipeline class cannot be executed.
when scrapy crawls a picture of a web page, the class that inherits ImagesPipelines is customized in the pipelines file. but the custom pipelines cannot be executed after running the program. Item cannot pass the following is a custom pipelines clas...

Web-crawler scrapy python

Mar.01,2021
Crawler timing execution
I set the crawler to run every 6 hours, and it did. The problem with is that it runs immediately after each point starts, and then executes every 6 hours. how do you stop it from running at the start of the point? ! @web Oh, it s all right. Jus...

Scrapy python

Mar.02,2021
Scrapy can only request one page at a time?
when I crawl a page with scrapy, I find that I can only request one page at a time, but the posts on the official website and Baidu say that the concurrency can be controlled through CONCURRENT_REQUESTS , but I tried it or it didn t work? CONCURRENT_...

Web-crawler scrapy python

Mar.02,2021
Scrapy scheduled task under centos, cannot be executed
execute after entering the project, the error shows scrapy command not found , but I-sharpscrapy can be run, the scrapy crawl test crawler command can also be executed alone, only the scheduled command will appear scrapy:command not found ...

Crontab scrapy python-crawler

Mar.04,2021
Ask a python scrapy deep crawler problem.
after crawling the navigation, the URL crawl that you want to continue in-depth navigation, and then the unified return value is written to xlsx < H1 >--coding: utf-8--< H1 > from lagou.items import LagouItem; import scrapy class LaGouSpider (...

Scrapy python-crawler

Mar.04,2021
Scrapy.Request cannot enter callback
scrapy.Request cannot enter callback code is as follows: def isIdentifyingCode(self, response): -sharp pass def get_identifying_code(self, headers): -sharp -sharp return scrapy.Req...

Web-crawler scrapy python

Mar.05,2021
Xpath, can you get rid of the js code?
A nasty piece of html code that writes js in div. It s a keyboard paging code xpath found that the tagged content in is gone, like this I am China person what I get is: I am human. China does not have , and then some people say that my xpath ...

Xpath scrapy python

Mar.11,2021
How does Python wrap a file in binary mode?
when scrapy saves data through Pipeline (in txt format), some data gbk codec can t encode character appears as follows. class TxtPipeline(object): def process_item(self,item,spider): path=os.getcwd() filename = path + dat...

Scrapy python

Mar.11,2021
Python-scrapy 's first crawler
scrapy tutorial: http: scrapy-chs.readthedocs. Environment: python3.6 + windows7 Project structure Directory: mySpider:scrapy crawl domz: there is no [dmoz] output as mentioned in the tutorial, is there any new file, is there something I don t...

Scrapy python

Mar.11,2021
Why is it that the data extracted by xpath in my scrapy selector is sometimes ['\ n'\ n','\ n\ t\ t']?
shouldn t text () extract the text information inside? I m a little confused ...

Scrapy python

Mar.11,2021
How to grab the content on the first page when using CrawlSpider to turn the page?
I use CrawlSpider combined with the following Rules to automatically turn the page and climb the movie information of Douban top250: rules = ( Rule(LinkExtractor(restrict_xpaths= span[@class="next"] a ), callback= parse_...

Web-crawler scrapy python

Mar.12,2021
How to use selenium in scrapy when middleware is used only once?
Why do these url jump back to the selenium of middleware via selenium jump to the url request crawled down the page in scrapy, instead of calling back to the following def def parse(self, response): contents = response.xpath( *[@id="...

Selenium scrapy python

Mar.12,2021
Why do you use scarpy to climb Dianping's city home page with content, but you can't get it when you climb by area?
as shown in the figure below, when the page is the food section of the whole city, for example, the URL of Xi an food is "http: www.dianping.com xian ch10 ", you can crawl the data normally (figure 1). 50 "http: www.dianping.com xian ... " Please ...

Python-crawler web-crawler scrapy python

Mar.14,2021
Can scrapy's Request use the same params parameter as requests?
The params parameter of requests can be easily set: requests.get (url, headers=Header, params=Param) but scrapy s Request: class Request(object_ref): def __init__(self, url, callback=None, method= GET , headers=None, body=None, ...

Scrapy python

Mar.18,2021
Python scrapy.Request could not download the web page
uses the scrapy.Request method to collect pages, but nothing is done. import scrapy def ret(response): print( start print ) print(response.body) url = https: doc.scrapy.org en latest intro tutorial.html v = scrapy.http.Request(url=url,...

Web-crawler scrapy python3.x

Mar.23,2021
The problem of scrapy RetryMiddleware Middleware retry request carrying request header and proxy ip
goal: you want to launch the current request repeatedly when the request ip fails, or when the CAPTCHA is encountered, until the request succeeds, so as to reduce the data omission of crawling. question: I don t know if my thinking is correct. At pres...

Scrapy python-crawler

Mar.23,2021
How does scrapy get item? in the file_path () function?
def gen_media_requests(self, item, info): for image_url in item[ cimage_urls ]: yield scrapy.Request(image_url, meta={ item : item}) def file_path(self, request, response=None, info=None): item = request.meta.get(...

Web-crawler crawler-picture scrapy python

Mar.24,2021
Scrapy.FormRequest timed out using proxy request, but requests request is normal
it is normal for the same proxy ip, to request with requests, but the request with scrapy.FormRequest will time out . related codes In [11]: r = requests.post( http: httpbin.org post , proxies={ http : proxy_server, https : proxy_server}) 2018...

Web-crawler proxy requests scrapy python

Mar.25,2021
Scrapy failed to run the project
operating system cetnos7 python3.7 scrapy crawl my crawler 2018-07-12 08:49:04 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: mm) 2018-07-12 08:49:05 [scrapy.utils.log] INFO: Versions: lxml 4.2.3.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w...

Scrapy python3.x

Mar.25,2021

MySQL Query : SELECT * FROM `codeshelper`.`v9_news` WHERE status=99 AND catid='6' ORDER BY rand() LIMIT 5
MySQL Error : Disk full (/tmp/#sql-temptable-64f5-423cd3b-2af2.MAI); waiting for someone to free some space... (errno: 28 "No space left on device")
MySQL Errno : 1021
Message : Disk full (/tmp/#sql-temptable-64f5-423cd3b-2af2.MAI); waiting for someone to free some space... (errno: 28 "No space left on device")
Need Help?