Question: I have just started practicing web crawling in Python and am parsing a page with BeautifulSoup. The page contains <br> tags, and the result for the cells that contain them is None.
I tried removing the <br> tags with the string replace function, but the result was still None.
I tried replacing the <br> tags with a regular expression (the re module), but that raised a type error.
Code:
from bs4 import BeautifulSoup
html_doc="""
<tr>
<td>1</td>
<td>2(<br>)</td>
<td>3(<br/>)</td>
<td>1<br/>
</td>
</tr>
"""
soup=BeautifulSoup(html_doc,"lxml")
for i in soup.find_all("td"):
    print(i.string)
(1) Output:
1
None
None
None
(2) I tried replacing the tags with a regular expression. The code is below; it fails with an error saying the argument is of the wrong type.
import re
re_br=re.compile("<br.*?/?>")
s=re_br.sub("\n",soup)  # replace br (raises the type error)
(3) After converting soup to a string with str(soup), calling find_all on the result fails with an error saying there is no find_all attribute.
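A possible way around (2) and (3), sketched under assumptions: re only works on strings and find_all only exists on the parsed object, so the substitution can be applied to the markup text first and the result parsed again. The shortened html_doc, the pattern r"<br\s*/?>" and the "html.parser" backend are my own choices for illustration, not taken from the original code.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample markup, assumed for illustration.
html_doc = "<tr><td>2(<br>)</td><td>3(<br/>)</td></tr>"

# Matches <br>, <br/> and <br />.
re_br = re.compile(r"<br\s*/?>")

# Substitute in the raw markup (a str), then parse the cleaned text.
cleaned = re_br.sub("\n", html_doc)
soup = BeautifulSoup(cleaned, "html.parser")
cells = [td.string for td in soup.find_all("td")]

# If only a parsed object is available, serialize it with str() and parse again.
original = BeautifulSoup(html_doc, "html.parser")
reparsed = BeautifulSoup(re_br.sub("\n", str(original)), "html.parser")
cells_roundtrip = [td.string for td in reparsed.find_all("td")]

print(cells)
print(cells_roundtrip)
```

With the <br> tags gone before parsing, each cell has a single text child, so .string returns it instead of None.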