Page source:
<hr>
<div class="fl name">
<ul>
<li>
<span></span>
<span></span><span></span>
</li>
<li><span class="ri-tag fl" data-start="2018-03-05 00:00:00+00:00" data-end="2018-06-15 15:30:00+00:00"
data-enrollment-start="2018-01-23 16:00:00+00:00" data-enrollment-end="2018-06-15 15:30:00+00:00"><b class="list-icon">$</b></span></li>
<li><span class="ri-tag fl"><b class="list-icon">g</b>5.5</span></li>
<li><span class="ri-tag fl"><b class="list-icon">7</b>10</span></li>
</ul>
</div>
</div>
<div class="txt_all">
<p class="txt"><span class="courseintro"></span>
<hr>
I want to extract "Tsinghua University" and "55000 people" as two separate fields.
At first I simply used:
item["school"] = response.xpath('//div[@class="fl name"]/ul/li/span/text()').extract()
Result:
Professor Hao Zhenping
Tsinghua University
School of Economics and Management
55000 people
has been updated to Chapter 10
From this I concluded that the expression extracts the text of every span in the list.
Then I changed the XPath for the school by adding an index:
item["school"] = response.xpath('//div[@class="fl name"]/ul/li/span[2]/text()').extract()
This gives the correct result, that is, Tsinghua University.
But when I try to extract "55000 people", no rule I write matches it. The tags around that value all look the same, and adding an index to the span does not help. How should I write the XPath so that it extracts only the number of people?