Discussion: a crawler for extracting contact information from web pages

As shown in the figure, there are three websites from which we need to grab the company name, address, and mobile phone number.
The mobile phone number is easy to extract, but the accuracy is not very high. For example, given a string of digits such as 1860126157733,
the crawler wrongly extracts 18601261577 as the mobile phone number.
The capture rate for company names and addresses is very low.
Has anyone worked on something similar and would like to discuss it?

P.S. The pictures come from the Internet. Because they are already publicly accessible, I have not blurred the information in them.

Python java php crawler-picture web-crawler

Mar.22,2021

If I were to do it, I would probably take the following approach. First, locate where the key information sits on the page. The information has a specific hierarchical structure, so work out which HTML tag it corresponds to and what class attribute that tag has. This step narrows the scope. Next, look for keywords such as company name, mobile phone, or name, and find the field associated with each. Finally, use regular expressions as an aid.
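
A minimal sketch of those three steps, in Python with only the standard library. The class name `contact`, the English label keywords, the sample HTML, and the helper names `BlockText` and `extract_contact` are all made up for illustration, and the mobile pattern assumes 11-digit mainland China numbers starting with 1. The lookarounds in the regex are what stop it from cutting 18601261577 out of a longer digit run such as 1860126157733.

```python
import re
from html.parser import HTMLParser

# Assumption: mainland China mobile numbers (11 digits, first digit 1,
# second digit 3-9). The lookarounds reject matches that are part of a
# longer run of digits.
MOBILE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")

# Hypothetical label keywords; real pages will need their own list.
LABELS = {
    "company": ("Company name", "Company"),
    "address": ("Address",),
}

# Elements without a closing tag must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BlockText(HTMLParser):
    """Collect non-empty text chunks inside elements with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # greater than 0 while inside the target element
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.lines.append(text)


def extract_contact(html, target_class="contact"):
    # Step 1: narrow the scope to the block that holds the information.
    parser = BlockText(target_class)
    parser.feed(html)
    parser.close()

    result = {"company": None, "address": None, "mobiles": []}
    for line in parser.lines:
        # Step 2: look for a label keyword and take the text after it.
        for field, keywords in LABELS.items():
            for keyword in keywords:
                if line.startswith(keyword):
                    result[field] = line[len(keyword):].lstrip(" :：").strip()
                    break
        # Step 3: use the regex as a final aid for phone numbers.
        result["mobiles"].extend(MOBILE_RE.findall(line))
    return result


# Made-up sample page for illustration only.
SAMPLE = """
<div class="contact">
  <p>Company name: Example Trading Co.</p>
  <p>Address: 1 Example Road, Beijing</p>
  <p>Mobile: 18601261577</p>
  <p>Order no. 1860126157733</p>
</div>
"""

info = extract_contact(SAMPLE)
print(info)
```

On the sample page the 13-digit order number is skipped and only the standalone 11-digit number is kept. Real sites will differ in their tags, class names, and label wording, so the keyword list and the target class would have to be adjusted per site.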

This is easy; have a look at phpspider.

