BeautifulSoup cannot extract the original port number when scraping a page
Question: the page displays "8080", but the scraped value is a different number on every run.
1. Page content
2. Program
import requests
from bs4 import BeautifulSoup

User_Agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36"
headers = {"User-Agent": User_Agent}
proxy = []
url = "https://proxy.coderbusy.com/"
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "lxml")
ips = soup.find_all("tr")
# Iterate over the table rows, starting from index 1
for x in range(1, len(ips)):
    ip = ips[x]
    # Select the port cells by the CSS path copied from the browser
    ip_temp = soup.select("#site-app > div > div > div > div > table > tbody > tr > td.port-box")
    aa = ip_temp[0].attrs.get("data-ip")
    aaa = ip_temp[0].string
    print(ip_temp[0])
    print(aa)
    print(aaa)
3. Running result
<td class="port-box" data-i="8450" data-ip="62.33.159.116">17981</td>
62.33.159.116
17981
The real port is written into the page by JavaScript after the page loads. Inspecting the page elements reveals an obfuscated main.js:
eval(function (p, a, c, k, e, d) { e = function (c) { return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36)) }; if (!''.replace(/^/, String)) { while (c--) d[e(c)] = k[c] || e(c); k = [function (e) { return d[e] }]; e = function () { return '\\w+' }; c = 1; }; while (c--) if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]); return p; }('$(e(){$(\'\\f\\3\\g\\8\\1\\r\\p\\g\\k\')["\\4\\2\\q\\o"](e(u,h){5 7=$(h);5 j=7["\\i\\2\\1\\2"](\'\\a\\3\');5 9=l["\\3\\2\\8\\d\\4\\m\\b\\1"](7["\\i\\2\\1\\2"](\'\\a\'));5 c=j["\\d\\3\\n\\a\\1"](\'\\f\');t(5 6=0;6<c["\\n\\4\\b\\s\\1\\o"];6++){9-=l["\\3\\2\\8\\d\\4\\m\\b\\1"](c[6])}7["\\1\\4\\k\\1"](9)})})', 31, 31, '|x74|x61|x70|x65|var|d7|ClpoEy3|x72|TO5|x69|x6e|tVF6|x73|function|x2e|x6f|fnDKXroKU2|x64|jgemfCG4|x78|window|x49|x6c|x68|x62|x63|x2d|x67|for|wssP1'.split('|'), 0, {}))
Unpacking it with an online tool gives:
$(function() {
    $('\x2e\x70\x6f\x72\x74\x2d\x62\x6f\x78')["\x65\x61\x63\x68"](function(wssP1, fnDKXroKU2) {
        var ClpoEy3 = $(fnDKXroKU2);
        var jgemfCG4 = ClpoEy3["\x64\x61\x74\x61"]('\x69\x70');
        var TO5 = window["\x70\x61\x72\x73\x65\x49\x6e\x74"](ClpoEy3["\x64\x61\x74\x61"]('\x69'));
        var tVF6 = jgemfCG4["\x73\x70\x6c\x69\x74"]('\x2e');
        for (var d7 = 0; d7 < tVF6["\x6c\x65\x6e\x67\x74\x68"]; d7++) {
            TO5 -= window["\x70\x61\x72\x73\x65\x49\x6e\x74"](tVF6[d7])
        }
        ClpoEy3["\x74\x65\x78\x74"](TO5)
    })
})
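The \xNN sequences in the unpacked script are ordinary hexadecimal character escapes, so they can be decoded mechanically instead of by hand. A minimal Python sketch, assuming the escapes are all two-digit \xNN codes; the helper name decode_hex_escapes is made up for illustration and is not part of the site's code:

```python
import re

def decode_hex_escapes(text):
    """Replace every \\xNN escape in text with the character it encodes."""
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )

# Raw strings keep the backslashes literal, as they appear in the JS source.
print(decode_hex_escapes(r"\x2e\x70\x6f\x72\x74\x2d\x62\x6f\x78"))  # .port-box
print(decode_hex_escapes(r"\x70\x61\x72\x73\x65\x49\x6e\x74"))      # parseInt
```
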
After converting the hexadecimal escapes to strings, you get:
$(function() {
    $('.port-box')["each"](function(wssP1, fnDKXroKU2) {
        var ClpoEy3 = $(fnDKXroKU2);
        var jgemfCG4 = ClpoEy3["data"]('ip');
        var TO5 = window["parseInt"](ClpoEy3["data"]('i'));
        var tVF6 = jgemfCG4["split"]('.');
        for (var d7 = 0; d7 < tVF6["length"]; d7++) {
            TO5 -= window["parseInt"](tVF6[d7])
        }
        ClpoEy3["text"](TO5)
    })
})
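As the deobfuscated JavaScript shows, the real port is the value of the data-i attribute of the port-box cell minus the sum of the four octets of the IP address in data-ip. For the cell in the running result above: 8450 - (62 + 33 + 159 + 116) = 8080.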
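The same subtraction can be done on the Python side once the two attributes have been read from the scraped cell. This is a minimal sketch that mirrors the JavaScript loop; the function name real_port is made up for illustration, and the sample values are taken from the running result in the question:

```python
def real_port(data_i, data_ip):
    """Recover the port: data-i minus each dotted part of data-ip."""
    port = int(data_i)
    for octet in data_ip.split("."):
        port -= int(octet)
    return port

# Attribute values from the scraped <td class="port-box"> shown earlier.
print(real_port("8450", "62.33.159.116"))  # 8080
```

With BeautifulSoup, the two arguments would come from the tag's attributes, for example td.attrs.get("data-i") and td.attrs.get("data-ip"), so the text inside the cell is never needed.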
Thank you very much! After you pointed it out, I looked through the page's JS and it really is there. But if nothing is visible from the outside, how can you tell whether JS is involved?
"As you can see from the code, the real port is the value of the data-i attribute in port-box minus the sum of the four octets of the IP."
That should be a fixed value, but the number changes every time the page is refreshed. Am I misunderstanding something?