python - Is accepting special characters for a range within a regex pattern possible?


I have a list of items stored in a variable, shown below:

listitems = ['<a href=\"\/other\/end\/f1\/738638\/adams\">adams<\/a>\n', '<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">donovan smith<\/a>\n'] 

I am trying to find each person's name; in this example the names are "adams" and "donovan smith". The pattern needs to accept special characters. I know I can use a backslash for that, but I am wondering if there is a way to accept multiple special characters at once without inserting multiple backslashes.

I also want to wildcard (ignore) the unique number and the name in the web link, for example: 23138 and 'donovan-smith'.

My current pattern looks as follows:

pattern1 = re.compile('<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">(.*?)<\/a>\n') 

Any help is appreciated.

If you are parsing HTML, why not try BeautifulSoup, mechanize, or lxml.html?

For instance,

import lxml.html

listitems = ['<a href=\"\/other\/end\/f1\/738638\/adams\">adams<\/a>\n', '<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">donovan smith<\/a>\n']

# join the fragments and parse them as one HTML document
string = ' '.join(listitems)
page = lxml.html.fromstring(string)

# select every <a> element
a_tags = page.cssselect('a')

names = []
for tag in a_tags:
    names.append(tag.text_content().strip())

print(names)
['adams', 'donovan smith']

That would give you what you want. Plus, you can fine-tune which tags you select based on XPath expressions, CSS selectors, etc.

But if you want to go on writing the regex yourself, why don't you start with something simpler, e.g.

pattern = re.compile(r'<a.*?">(.*?)<\\/a>') 

So:

import re

listitems = ['<a href=\"\/other\/end\/f1\/738638\/adams\">adams<\/a>\n', '<a href=\"\/other\/end\/f1\/23138\/donovan-smith\">donovan smith<\/a>\n']

# lazily skip the opening tag, capture the link text, then match the literal <\/a>
pattern = re.compile(r'<a.*?">(.*?)<\\/a>')

names = []
for item in listitems:
    n = re.search(pattern, item).group(1)
    names.append(n)

print(names)
['adams', 'donovan smith']
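On the original question of avoiding a hand-typed backslash for every special character: one option is to let re.escape() do the escaping for the fixed parts of the link and put wildcards only where the number and the name vary. This is a minimal sketch, assuming the items always follow the \/other\/end\/f1\/<number>\/<name> layout shown in the question; the names prefix, sep, suffix and link_pattern are made up for illustration.

```python
import re

# Same two items as in the question, written as raw strings so the
# backslash before each "/" is kept literally.
listitems = [
    r'<a href="\/other\/end\/f1\/738638\/adams">adams<\/a>' + '\n',
    r'<a href="\/other\/end\/f1\/23138\/donovan-smith">donovan smith<\/a>' + '\n',
]

# re.escape() backslash-escapes every special character in one call,
# so the fixed text can be pasted in exactly as it appears in the data.
prefix = re.escape(r'<a href="\/other\/end\/f1\/')
sep = re.escape(r'\/')
suffix = re.escape(r'<\/a>')

# \d+ stands in for the unique number, [^"]+ for the name in the link;
# the capture group is the visible link text.
link_pattern = re.compile(prefix + r'\d+' + sep + r'[^"]+">(.*?)' + suffix)

# Keep the captured text of every item that matches; skip the rest.
names = [m.group(1) for m in map(link_pattern.search, listitems) if m]
print(names)
```

Compared with the simpler pattern above, this one only matches links under that specific path, which may or may not be what you want.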
