Data Collection from the Chinese Government Procurement Network

The website is http://www.ccgp.gov.cn/cggg/d...

clipboard.png

I want to collect the content in the red box, together with the details of each bidding announcement (that is, the content shown after clicking the link). Can this be collected?

May.11,2022
All the data visible on the page can be collected, so what you describe is not a problem.
1. You can learn to write crawlers and grab the data yourself. Reading the first few articles of the "from getting started to mastering" series in the tutorial is enough to get going.
2. Alternatively, commission an official custom crawler from the tool's vendor.
The proxy used to collect data also matters, and I would like to recommend one based on several projects our company has done. Our earlier projects were few and ran on only a small number of machines. Later, as the number of terminals grew, the proxy service we had could no longer meet our needs. We now use a proxy cloud service, mainly because it does not restrict the number of terminals and supplies IPs 24 hours a day, which suits data collection companies. Reportedly, other companies also use this provider's proxies, with hundreds of millions of requests a day.


https://gitbook.cn/gitchat/ac...
I wrote a Chat a while ago specifically about building crawlers with Selenium, including automatic paging. Take a look if it is helpful to you.


Yes, it can be done. The request parameters are not encrypted, and the returned data can be parsed according to its format, so the site is not difficult to crawl. Teaching yourself to write a crawler in Python or Java is enough to do it.
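To illustrate what "not encrypted" means in practice, here is a minimal sketch of an ordinary GET request using the JDK's built-in `java.net.http` client (Java 11 or later). The class name `PlainFetch`, the User-Agent string, and the 20-second timeout are assumptions made for this example, not something the site requires.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PlainFetch {
    // Build an ordinary GET request; no signed or encrypted parameters are added.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20)) // assumed timeout
                .header("User-Agent", "Mozilla/5.0 (compatible; example-crawler)") // hypothetical value
                .GET()
                .build();
    }

    // Send the request and return the response body (the page HTML) as a string.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String html = fetch("http://www.ccgp.gov.cn/cggg/dfgg/gkzb/");
        System.out.println(html.length() + " characters of HTML received");
    }
}
```

The returned string can then be handed to whichever HTML parser you prefer.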


Scrapy is recommended.
Follow the tutorial and you will be able to do it:
https://yiyibooks.cn/zomin/Sc...


This hardly even needs a crawler. The program requests the URL and gets the returned data. The returned data here is HTML, which can be parsed directly. Here I implement it in Java, using the jsoup library.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) throws Exception {
        // Download and parse the announcement list page.
        Document doc = Jsoup.connect("http://www.ccgp.gov.cn/cggg/dfgg/gkzb/").get();
        // Each <li> under the bid list is one announcement.
        Elements items = doc.select(".main .c_list_bid>li");
        for (Element item : items) {
            Elements children = item.children();
            // Skip items that lack the child elements read below.
            if (children.size() < 4) {
                continue;
            }
            String name = children.get(0).text();
            String area = children.get(2).text();
            String person = children.get(3).text();
            System.out.println(":" + name);
            System.out.println(":" + area);
            System.out.println(":" + person);
            System.out.println("===============================");
        }
    }
}

The result of parsing is one three-line record per announcement, each followed by a separator line. The Chinese labels and values are missing from this copy, so only ASCII fragments remain:

:
:
:
===============================
:201924...
:
:
===============================
:0809-1841GZG11C24
:
:
===============================
...

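Collecting the details is similar: take the link from each list item, request that URL, and parse the returned HTML in the same way.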
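As a rough sketch of the first step, the code below turns the `href` values read from the list items into absolute detail-page URLs using only the JDK. The class name `DetailLinks` and the sample hrefs are hypothetical and only illustrate the shape of the data. With jsoup, calling `absUrl("href")` on the link element should give an equivalent result.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class DetailLinks {
    // Resolve one href from a list item against the list page URL.
    static String resolve(String listPageUrl, String href) {
        return URI.create(listPageUrl).resolve(href.trim()).toString();
    }

    // Resolve every non-empty href, keeping the original order.
    static List<String> resolveAll(String listPageUrl, List<String> hrefs) {
        List<String> urls = new ArrayList<>();
        for (String href : hrefs) {
            if (href == null || href.isBlank()) {
                continue; // nothing to follow for this item
            }
            urls.add(resolve(listPageUrl, href));
        }
        return urls;
    }

    public static void main(String[] args) {
        String listPage = "http://www.ccgp.gov.cn/cggg/dfgg/gkzb/";
        // Hypothetical hrefs, standing in for values read from the list items.
        List<String> hrefs = List.of(
                "./202205/t_example_1.htm",
                "http://www.ccgp.gov.cn/cggg/dfgg/gkzb/202205/t_example_2.htm");
        for (String url : resolveAll(listPage, hrefs)) {
            System.out.println(url);
        }
    }
}
```

Each resolved URL can then be requested and parsed like the list page, preferably with a short pause between requests.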
