Enterprise search (Qichacha) cannot be searched with a Selenium headless browser
The site may have anti-crawler measures, and Selenium leaves detectable fingerprints, such as special properties on the page's global objects (for example, navigator.webdriver is true when the browser is under WebDriver control).
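To make the fingerprinting idea concrete, here is a minimal sketch of the kind of check an anti-crawler script might run. This is not any site's actual code: the function name and the dict-based simulation of browser globals are illustrative, though the probed properties (navigator.webdriver, the PhantomJS _phantom global, and the old ChromeDriver $cdc_ marker) are well-known, version-dependent fingerprints. In reality such a check runs as JavaScript inside the page.

```python
# Sketch of an anti-crawler fingerprint check. Browser globals are simulated
# as plain dicts so the logic can run outside a browser; a real check would be
# JavaScript probing the live navigator/window/document objects.
def looks_like_selenium(navigator, window, document):
    return bool(
        navigator.get("webdriver")                        # set under WebDriver control
        or window.get("_phantom")                         # injected by PhantomJS
        or any(k.startswith("$cdc_") for k in document)   # old ChromeDriver marker
    )

# A WebDriver-controlled page typically exposes navigator.webdriver:
print(looks_like_selenium({"webdriver": True}, {}, {}))  # True
# A normal browser exposes none of these:
print(looks_like_selenium({}, {}, {}))                   # False
```

A site that runs a check like this can refuse to return search results to the headless browser even though the page appears to load normally.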
I see that you have asked a lot of crawling questions, so here is a reminder: if you use ChromeDriver in headless mode, you may not be able to visit sites that insert JS scripts through document.write(). See the related question on Stack Overflow.
Example:
>>> from selenium import webdriver
>>> option = webdriver.ChromeOptions()
>>> option.add_argument('--headless')
>>> driver = webdriver.Chrome(chrome_options=option)
[0608/163830.206:ERROR:gpu_process_transport_factory.cc(1007)] Lost UI shared context.
DevTools listening on ws://127.0.0.1:60357/devtools/browser/36a1f861-d1ab-4cef-a5a9-3072bbada0fc
>>> driver.get('https://www.baidu.com')
[0608/163849.677:INFO:CONSOLE(715)] "A parser-blocking, cross site (i.e. different eTLD+1) script, https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/global/js/all_async_search_8d20902.js, is invoked via document.write. The network request for this script MAY be blocked by the browser in this or a future page load due to poor network connectivity. If blocked in this page load, it will be confirmed in a subsequent console message. See https://www.chromestatus.com/feature/5718547946799104 for more details.", source: https://www.baidu.com/ (715)
Here, https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/global/js/all_async_search_8d20902.js is written into the HTML through document.write() and then loaded. Headless Chrome may block the network request for such a parser-blocking, cross-site script, so the script is not executed and the warning above is reported.
Firefox does not have this problem, so I recommend using Firefox's headless mode, or PhantomJS, a headless browser.
Firefox example:
from selenium import webdriver

option = webdriver.FirefoxOptions()
option.add_argument('--headless')
driver = webdriver.Firefox(firefox_options=option)  # in Selenium 4+, pass options=option instead
driver.get('https://www.qichacha.com')
# ...
Of course, you need to install Firefox (and its geckodriver) before using it.