Python 2.7 - How to select the following sibling tag with XPath
I have an HTML file like this:
<div id="note">
    <a name="overview"></a>
    <h3>overview</h3>
    <p>some text1...</p>
    <a name="description"></a>
    <h3>description</h3>
    <p>some text2 ...</p>
</div>
I want to retrieve the paragraph that follows each header. For example:

overview: text1
description: text2
...

I want to write this in Python using XPath. Thank you.
Find the h3 tags, iterate over them and, on each step of the loop, find the next sibling p tag:
import urllib2
from lxml import etree

url = "http://www.kb.cert.org/vuls/id/628463"
response = urllib2.urlopen(url)
parser = etree.HTMLParser()
tree = etree.parse(response, parser)

# For each <h3>, take the first <p> among its following siblings.
for header in tree.iter('h3'):
    paragraph = header.xpath('(.//following-sibling::p)[1]')
    if paragraph:
        print "%s: %s" % (header.text, paragraph[0].text)
This prints:

overview: ruby on rails 3.0 and 2.3 json parser contain vulnerability may result in arbitrary code execution.
description: lawrence pit of mirror42 discovering vulnerability.
impact: lawrence pit of mirror42 discovering vulnerability.
solution: lawrence pit of mirror42 discovering vulnerability.
vendor information : lawrence pit of mirror42 discovering vulnerability.
cvss metrics : lawrence pit of mirror42 discovering vulnerability.
references: lawrence pit of mirror42 discovering vulnerability.
credit: lawrence pit of mirror42 discovering vulnerability.
feedback: if have feedback, comments, or additional information vulnerability, please send
subscribe updates: receive security alerts, tips, and other updates.
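To try the same idea without a network request, here is a minimal sketch that runs against the sample HTML from the question. It assumes lxml is installed. The helper name `pair_headers_with_paragraphs` and the `SAMPLE` constant are made up for illustration, and it uses the shorter expression `following-sibling::p[1]` rather than the one in the answer.

```python
from lxml import html

# Sample markup copied from the question.
SAMPLE = """
<div id="note">
  <a name="overview"></a>
  <h3>overview</h3>
  <p>some text1...</p>
  <a name="description"></a>
  <h3>description</h3>
  <p>some text2 ...</p>
</div>
"""


def pair_headers_with_paragraphs(markup):
    """Return (header text, paragraph text) pairs (hypothetical helper)."""
    root = html.fromstring(markup)
    pairs = []
    for header in root.iter('h3'):
        # following-sibling::p[1] selects the nearest later sibling <p>
        # of this <h3>, whether or not it is directly adjacent.
        paragraphs = header.xpath('following-sibling::p[1]')
        if paragraphs:
            pairs.append((header.text_content().strip(),
                          paragraphs[0].text_content().strip()))
    return pairs


pairs = pair_headers_with_paragraphs(SAMPLE)
for title, text in pairs:
    # Parenthesised print works on both Python 2.7 and Python 3.
    print("%s: %s" % (title, text))
```

Note that `following-sibling::p[1]` returns the nearest later p sibling even when other elements sit between it and the header, so several headers can end up paired with the same paragraph, as in the output above. If only an immediately adjacent paragraph should count, `following-sibling::*[1][self::p]` is one stricter alternative.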