The website is http://www.ccgp.gov.cn/cggg/d...
Can you collect the content of the red box, and at the same time the bidding details (that is, the content shown after clicking the link)?
The page can be collected, so what you describe is not a problem.
1. You can learn to write a crawler yourself; reading the first few articles of the "from getting started to mastering" series in the tutorial is enough to master it.
2. Commission an official custom collector from GooSeeker (集搜客).
The proxy used for collection also matters. On our company's early projects the volume was small and we used only a few endpoints; later, as the number of terminals grew, those providers could no longer keep up. What we use now is a proxy cloud, whose main advantage is that it does not restrict the number of terminals and generates IPs around the clock, which suits data-collection companies well. Reportedly, large data companies also use such proxies, handling hundreds of millions of request connections a day.
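Whatever proxy pool you pick, wiring it into a Java crawler is a one-line client configuration. A minimal sketch with the standard `java.net.http` client (the host and port below are placeholders, not a real endpoint; a real proxy cloud would rotate endpoints per request):

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;

public class ProxyClient {
    // Build an HttpClient that routes every request through one proxy.
    static HttpClient withProxy(String host, int port) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(host, port)))
                .build();
    }

    public static void main(String[] args) {
        // proxy.example.com:8080 is a placeholder, not a real endpoint.
        HttpClient client = withProxy("proxy.example.com", 8080);
        System.out.println(client.proxy().isPresent()); // prints "true"
    }
}
```

All requests sent through this client then go out via the configured proxy, so swapping proxies means swapping one builder call rather than touching the crawling code.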
I wrote a chat earlier specifically about implementing a crawler with Selenium, including automatic paging. If it helps, take a look:
https://gitbook.cn/gitchat/ac...
OK. The request parameters are not encrypted, and the returned data can be parsed according to its fixed format, so it is not difficult to crawl. Teaching yourself a Python or Java crawler will get it done.
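"Parameters are not encrypted" means a plain GET is all it takes; no signing step is needed. A minimal stdlib sketch (the list-page URL is the one used in the jsoup answer below; the User-Agent header is an assumption, added because some servers reject the default agent):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchList {
    // A plain GET is enough: the site's request parameters are not
    // encrypted, so the request needs no signing or token.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("User-Agent", "Mozilla/5.0") // some servers reject the default agent
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://www.ccgp.gov.cn/cggg/dfgg/gkzb/");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode()); // 200 when the site is reachable
        } catch (Exception e) {
            System.out.println("network unavailable: " + e.getMessage());
        }
    }
}
```

The response body is the page's HTML, which you then hand to a parser such as jsoup.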
Scrapy is recommended. Follow the tutorial and you can do it:
https://yiyibooks.cn/zomin/Sc...
This barely counts as a crawler: the program requests the URL and gets back data, in this case HTML, which can then be parsed. Here I implement it in Java using the jsoup library.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) throws Exception {
        // Fetch the list page and select each announcement <li>
        Document doc = Jsoup.connect("http://www.ccgp.gov.cn/cggg/dfgg/gkzb/").get();
        Elements elements = doc.select(".main .c_list_bid > li");
        for (Element element : elements) {
            Elements children = element.children();
            String name = children.get(0).text();   // announcement title
            String area = children.get(2).text();   // region
            String person = children.get(3).text(); // purchaser
            System.out.println("name: " + name);
            System.out.println("area: " + area);
            System.out.println("person: " + person);
            System.out.println("===============================");
        }
    }
}
The parsed result prints one block per announcement (name, area, person), separated by `===============================` lines. The Chinese text of the output was lost when this thread was archived; only fragments such as a project number (0809-1841GZG11C24) and a few years (2018, 2019) remain.
Collecting the details works the same way: take each item's link, request it, and parse the returned detail page just like the list page.
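The detail step can be sketched without any dependency as well. As a dependency-free illustration (the HTML fragment and paths below are made up for the example; on the real page jsoup's `element.select("a").attr("href")` is the robust way to do this), a regex pass over the fetched list HTML pulls out each announcement's link, which you would then request and parse the same way:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetailLinks {
    // Extract href values from <a> tags in a list-page fragment.
    // A regex is only a sketch for a fixed snippet; use jsoup's
    // selectors when parsing the real page.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile("<a\\s+href=\"([^\"]+)\"").matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical list-page fragment standing in for the real markup.
        String html = "<ul class=\"c_list_bid\">"
                + "<li><a href=\"/cggg/dfgg/gkzb/1.htm\">Notice A</a></li>"
                + "<li><a href=\"/cggg/dfgg/gkzb/2.htm\">Notice B</a></li>"
                + "</ul>";
        for (String link : extractLinks(html)) {
            // Each link would then be fetched, e.g. with
            // Jsoup.connect(base + link).get(), and parsed for the details.
            System.out.println(link);
        }
    }
}
```

Relative hrefs like these must be resolved against the list page's base URL before requesting them.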