Error when crawling Xiaozhu short-term rental listings with Python 3

I have just started learning Python, following https://blog.csdn.net/mtbaby/.
I wanted to crawl Xiaozhu short-term rental listings, but my IP was blocked.
I then looked into using a proxy IP, but I still can't get the information.

import requests
from lxml import etree
import time

# Route plain-HTTP requests through the proxy
proxies = {
    "http": "http://61.135.217.7:80",
}
user_agent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36"
url = "http://hz.xiaozhu.com/"
headers = {"User-Agent": user_agent}
data = requests.get(url, headers=headers, proxies=proxies).text
h = etree.HTML(data)
# Single quotes around the XPath so the inner double quotes do not end the string
home = h.xpath('//*[@id="page_list"]/ul/li')
time.sleep(2)
for div in home:
    # Query relative to the current <li> (div), not the whole document (h)
    title = div.xpath("./div[2]/div/a/span/text()")[0]  # listing title
    price = div.xpath("./div[2]/span[1]/i/text()")[0]  # price
    print("{}-->{}".format(title, price))

The output when I run it is as follows:

I hope someone can help me solve this. Thank you very much!

Web-crawler python

Feb.28,2021

Not every proxy IP is valid. You need to make sure the proxy actually works before using it.

import requests
from pyquery import PyQuery as Q

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
proxies = {
    "http": "http://103.235.245.35:8080"
}

r = requests.get('http://hz.xiaozhu.com/', headers=headers, proxies=proxies)
# Select every <li> under the element with id "page_list"
for item in Q(r.text)('#page_list li'):
    title = Q(item).find('.result_title').text()
    price = Q(item).find('.result_price').text()

    # print() is a function in Python 3
    print(title, price)
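
The advice above about validating a proxy before using it could be sketched roughly as follows. This is only a minimal illustration, not part of the original answer: the helper names `proxy_works` and `first_working_proxy`, the 5-second timeout, and the optional `fetch` parameter (which exists only so the check can be exercised without a network) are all assumptions.

```python
def proxy_works(proxy_url, test_url="http://hz.xiaozhu.com/", timeout=5, fetch=None):
    """Return True if test_url can be fetched through proxy_url (hypothetical helper)."""
    if fetch is None:
        # Imported here so a stub `fetch` can be supplied when testing offline.
        import requests
        fetch = requests.get
    proxies = {"http": proxy_url, "https": proxy_url}
    try:
        response = fetch(test_url, proxies=proxies, timeout=timeout)
    except Exception:
        # Connection refused, timeout, malformed proxy address, and so on.
        return False
    return response.status_code == 200


def first_working_proxy(candidates, **kwargs):
    """Return the first candidate that passes proxy_works, or None if none do."""
    for proxy_url in candidates:
        if proxy_works(proxy_url, **kwargs):
            return proxy_url
    return None
```

A call such as `first_working_proxy(["http://103.235.245.35:8080", "http://61.135.217.7:80"])` would then return a proxy URL to put into the `proxies` dict, or `None` if every candidate fails. A 200 status only shows that the proxy relayed the request; the site may still serve a block page, so checking that the response contains the expected `page_list` element is a reasonable extra step.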


