Problem description
I want to crawl https://auto.ru/cars/all/?sor.. After the page opens, you have to click a button, and clicking it makes the website set cookies. However, one field in the cookies is not set through a Set-Cookie header: "gdpr":"1". The full cookie string is:
_csrf_token=020335a5dcb38cc95823931e5590aa1d6f8e0c8e5d1efbee; suid=77c527af480928e8842121176d182182.9bbd97c75bea15045fc7de2399bfbbea;
from=direct; autoru_sid=a%3Ag5c0a49f62a75ujsbkcumbspu4hpvpn0.f004ec1118196bd8749cc43268eafa85%7C1544178166087.604800.kZCWC5djG3zKbPIpqNyfVA.EuliNzEnolonmG71Ik1guRXEQZeqqojzIJQwZf1ZJ60; autoruuid=g5c0a49f62a75ujsbkcumbspu4hpvpn0.f004ec1118196bd8749cc43268eafa85;
gdpr=1;
from_lifetime=1544178179814;
X-Vertis-DC=sas
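To check which fields a browser cookie string like the one above contains, it can be split into a dict. This is only a minimal sketch: `parse_cookie_header` is a hypothetical helper name, and the sample string is a shortened version of the cookie string above.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a 'name=value; name=value' cookie string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty pieces left by a trailing ';'
        # Split on the first '=' only, so values containing '=' stay whole
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


# Shortened sample built from fields of the cookie string above
raw = "from=direct; gdpr=1; from_lifetime=1544178179814; X-Vertis-DC=sas"
cookies = parse_cookie_header(raw)
print(cookies["gdpr"])  # prints 1
```

The resulting dict can be compared against the cookies the server actually sends back, which makes it easier to spot fields such as gdpr that have to be supplied by hand.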
Solving the problem with requests:
import requests

url = "https://auto.ru/cars/all/?sort=fresh_relevance_1-desc&output_type=list&page=1"
header = {
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
}
# Set the gdpr cookie by hand, because the server does not send it via Set-Cookie
requests_session = requests.Session()
requests_session.cookies = requests.utils.cookiejar_from_dict({"gdpr": "1"})
response = requests_session.get(url, headers=header)
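One way to confirm that the session really attaches the hand-set cookie, without contacting the site, is to prepare a request and inspect its Cookie header. This is a sketch rather than part of the original solution; it assumes the requests package is installed and uses `cookies.update` instead of replacing the whole jar.

```python
import requests

session = requests.Session()
# Add the gdpr cookie to the session's existing cookie jar
session.cookies.update({"gdpr": "1"})

url = "https://auto.ru/cars/all/?sort=fresh_relevance_1-desc&output_type=list&page=1"

# prepare_request merges session cookies into the request; nothing is sent
prepared = session.prepare_request(requests.Request("GET", url))
cookie_header = prepared.headers.get("Cookie")
print(cookie_header)
```

If the printed header contains gdpr=1, every request made through this session will carry the field alongside whatever cookies the server sets later.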
But how do you solve this problem with Scrapy?
Cookie handling can be enabled in the Scrapy settings:
"COOKIES_ENABLED": True,
"COOKIES_DEBUG": True,
Each request can also be assigned its own cookie session through the cookiejar meta key:
yield scrapy.Request(url=task["task_url"], callback=self.handle_car_item_response, meta={"cookiejar": i})