Python - Is accepting special characters for a range within a regex pattern possible?
I have a list of items stored in a variable, shown below:

    listitems = ['<a href=\"\/other\/end\/f1\/738638\/adams\">adams<\/a>\n', '<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">donovan smith<\/a>\n']

I am trying to find each person's name; in this example the names are "adams" and "donovan smith". The pattern needs to accept special characters. I know I can escape each one with a backslash, but I am wondering whether there is a way to accept multiple special characters at once without inserting multiple backslashes.
I want to wildcard (ignore) the unique number and the name in the link, for example 23138 and 'donovan-smith'.
My current pattern looks as follows:

    pattern1 = re.compile('<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">(.*?)<\/a>\n')

Any help is appreciated.
If you are parsing HTML, why not try BeautifulSoup, mechanize, or lxml.html?
For instance,

    import lxml.html

    listitems = ['<a href=\"\/other\/end\/f1\/738638\/adams\">adams<\/a>\n', '<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">donovan smith<\/a>\n']

    # Join the fragments and parse them as one HTML document
    string = ' '.join(listitems)
    page = lxml.html.fromstring(string)

    # Select every <a> element and collect its text
    a_tags = page.cssselect('a')
    names = []
    for tag in a_tags:
        names.append(tag.text_content().strip())

    print(names)

    ['adams', 'donovan smith']

would give you what you want. Plus, you can fine-tune which tags you select based on XPath expressions, CSS selectors, etc.
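If installing a third-party parser is not an option, the same idea can be sketched with the standard library's html.parser module. This is only a rough sketch under two assumptions: the class name AnchorText is made up for illustration, and the backslash-escaped slashes (\/) in the data are assumed to be escaping debris that can be removed before parsing.

```python
from html.parser import HTMLParser

# Same values as the question's list, written with explicit double backslashes
listitems = [
    '<a href="\\/other\\/end\\/f1\\/738638\\/adams">adams<\\/a>\n',
    '<a href="\\/other\\/end\\/f1\\/23138\\/donovan-smith">donovan smith<\\/a>\n',
]


class AnchorText(HTMLParser):
    """Hypothetical helper: collects the text found inside each <a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_anchor = True
            self.names.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_anchor = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current name
        if self.in_anchor:
            self.names[-1] += data


parser = AnchorText()
for item in listitems:
    # Assumption: turn "\/" back into "/" so "<\/a>" becomes a real closing tag
    parser.feed(item.replace('\\/', '/'))
parser.close()

names = [name.strip() for name in parser.names]
print(names)
```

Because the parser only looks at tag boundaries, the number and the name inside the href never need to be matched at all.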
But if you want to write the regex yourself, why don't you start with something simpler, e.g.

    pattern = re.compile(r'<a.*?">(.*?)<\\/a>')

So:

    import re

    listitems = ['<a href=\"\/other\/end\/f1\/738638\/adams\">adams<\/a>\n', '<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">donovan smith<\/a>\n']

    # Lazily skip everything up to the end of the opening tag, then capture
    # the link text up to the (backslash-escaped) closing tag
    pattern = re.compile(r'<a.*?">(.*?)<\\/a>')

    names = []
    for item in listitems:
        n = re.search(pattern, item).group(1)
        names.append(n)

    print(names)

    ['adams', 'donovan smith']
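To address the original question more directly, here is a minimal sketch of a stricter pattern that keeps the fixed part of the link but wildcards the number and the name. It assumes the number is always a run of digits and the name consists only of word characters and hyphens; the variable names prefix, sep, and suffix are made up for illustration. re.escape() adds the backslashes for the fixed text, so they do not have to be typed by hand.

```python
import re

# Same values as the question's list, written with explicit double backslashes
listitems = [
    '<a href="\\/other\\/end\\/f1\\/738638\\/adams">adams<\\/a>\n',
    '<a href="\\/other\\/end\\/f1\\/23138\\/donovan-smith">donovan smith<\\/a>\n',
]

# re.escape() escapes every regex-special character in the literal parts
prefix = re.escape('<a href="\\/other\\/end\\/f1\\/')
sep = re.escape('\\/')
suffix = re.escape('<\\/a>')

pattern = re.compile(
    prefix
    + r'\d+'      # the unique number, e.g. 23138 (assumed to be digits only)
    + sep
    + r'[\w-]+'   # the name in the link, e.g. donovan-smith (assumed charset)
    + '">'
    + r'(.*?)'    # the visible name we want to capture
    + suffix
)

names = []
for item in listitems:
    match = pattern.search(item)
    if match:  # skip items that do not fit the expected shape
        names.append(match.group(1))

print(names)
```

Inside a character class such as [\w-], most special characters lose their special meaning, so a whole range of them can be accepted at once without a backslash for each one.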