Problem description
Regular expressions have always been a blind spot for me. I hope someone can help me write a regular expression that extracts the title, picture link, article link, and description from the web page below. Thanks in advance!
The page content to match against:
<article class="excerpt excerpt-1">
<a href="/szb/eth/28157.html" class="focus" target="_blank"><img alt=""" " class="thumb lazy" data-original="/uploads/allimg/180906/8-1PZ6094Za45-lp.png"/></a>
<header>
<h2><a href="/szb/eth/28157.html" title="<b>"" </b>" target="_blank"><b>"" </b></a></h2>
</header>
<p class="meta">
<time><i class="fa fa-clock-o"></i><font color="#e15c34">2018-09-06</font></time>
<span class="pv"><i class="fa fa-eye"></i>(1986)</span>
<span class="pc"><i class="fa fa-comments-o"></i>(<span id="url::http://www.bitcoin86.com/szb/eth/28157.html" class = "cy_cmt_count" ></span>)</span>
<p class="note">(CBOE) ETH Business Insider CBOE2018 2017...
</article>
What result do you expect? What error message do you actually see?
I need to extract the href of the <a> tag as the article link, the text content of the <a> tag inside <header> as the title, the data-original attribute of the <img> tag as the picture link, and the text in <p class="note"> as the description.
Because I'm not familiar with regular expressions, I don't know whether all four values can be captured with a single expression and put into an array or list at indexes 0, 1, 2, and 3.
If that idea is not realistic, I hope someone who knows regex well can help write four separate regular expressions. Thanks again.
I have since solved the problem myself, but if you have a good solution, you are welcome to post it to help others who need it.
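For anyone who lands here later, one possible approach is a single pattern with four named groups, sketched below in Python. This is only a sketch under assumptions: the question does not name a language, the names SAMPLE, REST, ARTICLE_RE, TAG_RE, and extract_articles are made up for illustration, and the title and description text in SAMPLE are placeholders because the real text is missing from the post. The pattern also assumes the attributes appear in the same order as in the markup above; an HTML parser is usually more robust than a regex when the markup varies.

```python
import re

# Shortened copy of the markup from the question. "Example title" and
# "Example description..." are placeholders (assumptions), not the real text.
SAMPLE = '''<article class="excerpt excerpt-1">
<a href="/szb/eth/28157.html" class="focus" target="_blank"><img alt="Example title" class="thumb lazy" data-original="/uploads/allimg/180906/8-1PZ6094Za45-lp.png"/></a>
<header>
<h2><a href="/szb/eth/28157.html" title="<b>Example title</b>" target="_blank"><b>Example title</b></a></h2>
</header>
<p class="meta">
<time><i class="fa fa-clock-o"></i><font color="#e15c34">2018-09-06</font></time>
<span class="pv"><i class="fa fa-eye"></i>(1986)</span>
</p>
<p class="note">Example description...</p>
</article>'''

# Remainder of a tag, skipping over quoted attribute values so that a ">"
# inside title="<b>...</b>" does not end the tag too early.
REST = r'(?:"[^"]*"|[^">])*'

ARTICLE_RE = re.compile(
    rf'<article\b{REST}>'
    rf'.*?<img\b{REST}?\bdata-original="(?P<image>[^"]*)"'    # picture link
    rf'.*?<h2>\s*<a\b{REST}?\bhref="(?P<url>[^"]*)"{REST}>'  # article link
    r'(?P<title>.*?)</a>'                                     # title, may contain <b>
    r'.*?<p\s+class="note"\s*>(?P<note>.*?)'                  # description
    r'(?:</p>\s*)?</article>',                                # closing </p> is optional
    re.DOTALL,  # let "." match newlines, since one article spans several lines
)

TAG_RE = re.compile(r'<[^>]+>')  # used to strip inline tags such as <b>


def extract_articles(html):
    """Return one [title, picture link, article link, description] list per article."""
    rows = []
    for m in ARTICLE_RE.finditer(html):
        title = TAG_RE.sub('', m.group('title')).strip()
        note = TAG_RE.sub('', m.group('note')).strip()
        rows.append([title, m.group('image'), m.group('url'), note])
    return rows


if __name__ == '__main__':
    for row in extract_articles(SAMPLE):
        print(row)
```

Each returned row holds the title at index 0, the picture link at 1, the article link at 2, and the description at 3. If four separate expressions are preferred, each named group can be lifted out with its surrounding tag fragment and applied on its own with re.search.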