When I implemented a spider with Scrapy, I wanted to rotate its proxy so that the server wouldn't block my requests for sending too many from a single IP. I also know how to change the proxy in Scrapy, either through a downloader middleware or by setting the request's meta directly.
However, I use the scrapy_splash package to execute JavaScript for my spider, and there I found it difficult to change the proxy, because as I understand it, scrapy_splash hands the request to a Splash server, which renders the site's JS for us.
In fact, the proxy works fine when I use plain Scrapy, but stops working once I use scrapy_splash.
So is there any way to set a proxy for scrapy_splash requests?
Edit (4 hours later):
I have set the related settings in settings.py and written this middleware in middlewares.py. As I mentioned before, it works with plain Scrapy but not with scrapy_splash:
import json
import random


class RandomIpProxyMiddleware(object):
    def __init__(self, ip=""):
        self.ip = ip
        ip_get()  # helper defined elsewhere in my project
        with open("carhome\\ip.json", "r") as f:
            self.IPPool = json.loads(f.read())

    def process_request(self, request, spider):
        thisip = random.choice(self.IPPool)
        request.meta["proxy"] = "http://{}".format(thisip["ipaddr"])
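If I understand it correctly, a SplashRequest is fetched by the Splash server itself, so meta["proxy"] only affects the Scrapy-to-Splash hop, not the download of the target page. A sketch of how the middleware might be adapted, assuming Splash's HTTP API `proxy` render argument can be used (the class name and the pool-injection style are mine, not from my working code):

```python
import random


class RandomIpSplashProxyMiddleware(object):
    """Sketch: rotate proxies for both plain Scrapy requests and
    SplashRequests. For a SplashRequest, meta["proxy"] only covers the
    Scrapy -> Splash hop, so the proxy is put into Splash's own `proxy`
    render argument instead."""

    def __init__(self, ip_pool):
        # ip_pool: list of dicts like {"ipaddr": "1.2.3.4:8080"},
        # e.g. loaded from ip.json as in my middleware above.
        self.ip_pool = ip_pool

    def process_request(self, request, spider):
        if not self.ip_pool:
            return
        proxy = "http://{}".format(random.choice(self.ip_pool)["ipaddr"])
        if "splash" in request.meta:
            # Splash fetches the page itself, so tell Splash to use the proxy.
            request.meta["splash"].setdefault("args", {})["proxy"] = proxy
        else:
            # Plain Scrapy request: the downloader honours meta["proxy"].
            request.meta["proxy"] = proxy
```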
And here is the code in the spider that uses scrapy_splash:
yield scrapy_splash.SplashRequest(
    item, callback=self.parse, args={"wait": 0.5})
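One thing that may work here: everything in `args` is forwarded to the Splash HTTP API, which accepts a `proxy` render argument. A sketch (the proxy address is a placeholder and `build_splash_args` is a hypothetical helper, not part of scrapy_splash):

```python
def build_splash_args(proxy, wait=0.5):
    """Hypothetical helper: merge the proxy into the Splash render
    arguments so Splash itself fetches the page through the proxy."""
    return {"wait": wait, "proxy": proxy}


# Usage inside the spider (placeholder proxy address):
# yield scrapy_splash.SplashRequest(
#     item, callback=self.parse,
#     args=build_splash_args("http://1.2.3.4:8080"))
```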
And here is the code in the spider without this plugin:
yield scrapy.Request(item, callback=self.parse)