Problem description
I am a Python beginner. My goal is to crawl the penalty records on the Shenzhen Stock Exchange disclosure site in bulk; each record links to a ".pdf" file. The web page (http://www.szse.cn/disclosure/listed/credit/record/index.html) and the corresponding page source are as follows:
Environment background and what I have tried
The code I wrote is as follows:
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it
html = urlopen("http://www.szse.cn/disclosure/listed/credit/record/index.html").read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
# Find all <a> tags whose href is "javascript:void(0);"
link = soup.find_all("a", attrs={"href": "javascript:void(0);"})
The result of execution is:
>>> link
[<a class="" href="javascript:void(0);"></a>, <a class="ml10" href="javascript:void(0);"></a>, <a class="ml10" href="javascript:void(0);"></a>]
This didn't catch the links I wanted to grab.
Considering that the attribute "encode-open" appears only once in the page source, I changed the query to this (adding `import re`):

import re
link = soup.find_all("a", attrs={"encode-open": re.compile(r".*\.pdf")})
But an empty list is returned:
>>> link
[]
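For what it's worth, the regex-based attribute filter itself seems to be correct BeautifulSoup usage. A minimal self-contained check against a hand-written HTML snippet (the snippet is my guess at what a rendered row looks like, not the real page markup) shows it matching:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet imitating a rendered table row; the real markup may differ.
sample = """
<a href="javascript:void(0);" encode-open="/disc/disk03/finalpage/2023-01-01/demo.pdf">notice</a>
<a href="javascript:void(0);">next page</a>
"""
soup = BeautifulSoup(sample, "html.parser")
# Same filter as in my attempt above: match <a> tags whose
# "encode-open" attribute contains ".pdf"
links = soup.find_all("a", attrs={"encode-open": re.compile(r".*\.pdf")})
pdf_paths = [a["encode-open"] for a in links]
print(pdf_paths)
```

So the empty list seems to mean the "encode-open" attribute is simply not present in the HTML that `urlopen` returned, even though it shows up when I inspect the page in the browser.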
How can I grab these links? Thank you.
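I suspect the table rows are filled in by JavaScript from a separate JSON request (which should be visible in the browser DevTools under Network -> XHR), so the static HTML never contains the PDF links. If that is the case, the extraction step would look roughly like the sketch below; the payload shape and field names here are my guesses for illustration, not the site's real API:

```python
import json

# Hypothetical JSON payload imitating what the site's XHR might return;
# the actual endpoint and field names must be taken from DevTools.
payload = """
{"data": [
  {"title": "Disciplinary notice 1", "doc": "/disc/2023/notice1.pdf"},
  {"title": "Disciplinary notice 2", "doc": "/disc/2023/notice2.pdf"}
]}
"""
records = json.loads(payload)["data"]
# Join the relative paths onto the site's host to get full PDF URLs
pdf_urls = ["http://www.szse.cn" + r["doc"] for r in records]
print(pdf_urls)
```

Is fetching that JSON request directly the right approach here, or do I need something like Selenium to render the page first?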