problem description
I want to scrape the contents of every td tag inside each tr tag, and also extract the path in each row's onclick attribute as an absolute URL.
background of the problem and the methods I have tried
First attempt: ignore onclick and scrape the contents of the td tags in all tr tags. This worked.
Second attempt: filter the rows by their onclick content first, and skip the header row (that is, <tr class="fist">). Neither part worked.
related code
second attempt code
import requests
from bs4 import BeautifulSoup
import re
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
"Host":"www.pgskpw.com"}
link = "http://www.pgskpw.com/"
r = requests.get(link,headers = headers,timeout = 20)
soup = BeautifulSoup(r.text,"html.parser")
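# The original pattern nested double quotes inside a double-quoted string (a syntax error)
# and left the regex metacharacters "(", ")", "." and "?" unescaped.
# Match only the stable part of the onclick value instead; BeautifulSoup applies
# the pattern with search(), so a partial match is enough.
user_information = re.compile(r"personal_show\.php\?showid=")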
person_book = soup.find("div",class_="conBox listbox mb10")
person_list = person_book.find_all("tr",attrs={"onclick":user_information})
for i in person_list:
    print(i)
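A likely reason the onclick filter matches nothing is that `.`, `?`, `(` and `)` are regex metacharacters, so a pattern copied straight from the attribute text does not describe that text literally. The sketch below illustrates this with the standard library only. The sample onclick string, including the showid value 123 and the single quotes, is an assumption about what the page contains, not something confirmed from the site.

```python
import re

# Hypothetical onclick value; the real attribute on the page may differ.
sample_onclick = "javascript:window.open('/personal_show.php?showid=123');readent(this);"

# Naive pattern: "." matches any character and "p?" means "an optional p",
# so the literal "?" in the attribute is never matched.
naive = re.compile(r"personal_show.php?showid=")

# re.escape() turns the text into a pattern that matches it literally.
escaped = re.compile(re.escape("personal_show.php?showid="))

print("naive pattern matched:", naive.search(sample_onclick) is not None)
print("escaped pattern matched:", escaped.search(sample_onclick) is not None)
```

The same escaped pattern can be passed to find_all through attrs={"onclick": ...}, since only part of the attribute value has to match.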
what result do you expect? What is the error message actually seen?
I expect to get the contents of each td tag together with the path from the row's onclick attribute.
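To make the expected result concrete, here is a minimal sketch of one way it could be produced. It runs on an inline HTML sample rather than the live page. The sample markup, the names `SAMPLE_HTML`, `OPEN_PATH` and `extract_rows`, and the shape of the onclick value are all assumptions for illustration. Only the base URL and the class names come from the code above.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.pgskpw.com/"

# Hypothetical sample of the table; the real page's markup may differ.
SAMPLE_HTML = """
<div class="conBox listbox mb10">
  <table>
    <tr class="fist"><td>Name</td><td>Age</td></tr>
    <tr onclick="javascript:window.open('/personal_show.php?showid=1');readent(this);"><td>Alice</td><td>30</td></tr>
    <tr onclick="javascript:window.open('/personal_show.php?showid=2');readent(this);"><td>Bob</td><td>25</td></tr>
  </table>
</div>
"""

# Captures the first quoted argument of window.open(...), assumed to be the path.
OPEN_PATH = re.compile(r"""window\.open\(\s*['"]([^'"]+)['"]""")


def extract_rows(html, base_url=BASE_URL):
    """Return a list of (cell_texts, absolute_url) for every row with an onclick."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # onclick=True keeps only rows that carry the attribute,
    # which also skips the header row because it has no onclick.
    for tr in soup.find_all("tr", onclick=True):
        match = OPEN_PATH.search(tr["onclick"])
        if match is None:
            continue  # onclick does not contain a window.open(...) call
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append((cells, urljoin(base_url, match.group(1))))
    return rows


for cells, url in extract_rows(SAMPLE_HTML):
    print(cells, url)
```

Selecting rows by the presence of onclick avoids having to exclude the header row by its class name, and urljoin resolves the relative path against the site root.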