The files have been downloaded, but while the original files are each about 1 MB, the files Scrapy downloads are all only about 3 KB, as shown in the picture below.
Answer:
The file URLs on this site cannot simply be yielded to Scrapy for downloading; the files have to be fetched with Selenium by clicking the download link in the browser.
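A minimal sketch of that approach, using plain Selenium with Chrome. The page URL (https://example.com/files), the link selector (a.download-link), and the download directory are placeholders I am assuming for illustration; the real values depend on the site.

import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

DOWNLOAD_DIR = os.path.abspath("downloads")  # directory where Chrome will save the files
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

options = Options()
# Save downloads into DOWNLOAD_DIR without showing a "Save as" prompt.
options.add_experimental_option("prefs", {
    "download.default_directory": DOWNLOAD_DIR,
    "download.prompt_for_download": False,
})

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/files")  # placeholder: the page that lists the files
    # Hypothetical selector for the download link; replace it with the site's real one.
    driver.find_element(By.CSS_SELECTOR, "a.download-link").click()
    time.sleep(15)  # crude wait for the download to finish; in practice, poll DOWNLOAD_DIR
finally:
    driver.quit()

The downloaded files can then be picked up from DOWNLOAD_DIR and handed to the rest of the pipeline, instead of yielding the file URL to Scrapy's downloader, which only returned the small 3 KB responses.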
System: Ubuntu 16.04, Python 3.6, Twisted 15.2.1, with Scrapy 1.5.0 also installed in the virtual environment. The following message appears when creating a Scrapy project: (pyvirSpider) root@ubuntu: myScrapy# scrapy startproject test Traceback (most recent...
I recently read Learning Scrapy, which describes a crawler that automatically turns pages and crawls the items on each page. The book says that Scrapy uses a last-in, first-out queue. Suppose there are 30 items on each page, and start_url is set to the first ...
After the framework starts crawling the target page start_url, a characteristic value needs to be extracted from the start_url string to use as the MongoDB collection name, and the items are then stored through a pipeline. Outline of the flow: spider, then pipeline ...
When executing scrapy shell on a URL there is no response: the stdout in the log file returns nothing, the URL address in quotation marks gets no response, and the interface does not respond. It used to run successfully, but suddenly failed t...
Problem description: there are 6000 URLs; a Celery task is started at 12:00 to generate them and send the queue to two servers to crawl. I use middleware to fetch 10 proxy IPs at a time to carry the requests. After 100, I proceed to process the next set of 100...
In pipelines, the code is as follows: import logging from scrapy.utils.log import configure_logging configure_logging(install_root_handler=False) logging.basicConfig(filename='log.txt', format='%(levelname)s: %(message)s', level=loggi...
Attached is the source code of the crawler file: import scrapy from openhub.items import OpenhubItem from lxml import etree import json class ProjectSpider(scrapy.Spider): name = 'project' # allowed_domains = [] start_urls ...
I can run the single file directly without import errors, and using MongoDB in a standalone .py file works fine, but when I run it inside the Scrapy project it reports that the import failed. Why? import json import pymongo from scrapy.utils.pr...