C# - Possible to use Regex to find all matches including its own delimiter?
I am trying to write an email subject line parser where the user defines their own parsing rules. The rules match member names on the subject line and use them up. The catch is that a member name might contain the parsing rule delimiter.
    // The rule below says that the text between > characters matches a member name.
    // Note that the user can make up any parsing rule; this is just an example.
    string sampleRule = ">{member}>";

    // I left out the rule-parsing code. Assume I have already figured out that I am
    // looking for a member, and what the prefix/postfix delimiters are.
    string prefix = ">";
    string postfix = ">";

    // Note that member>name3 is a valid member name.
    string subject = "subject>membername1>membername2>member>name3>endsubject";

    string pattern = "(?=" + prefix + "([a-z].+?)" + postfix + ")";
    Match m = Regex.Match(subject, pattern);
    while (m.Success)
    {
        // Possible member name
        Console.WriteLine(m.Groups[1].ToString());
        m = m.NextMatch();
    }

    // Output needs
    // membername1
    // membername2
    // member>name3
    //
    // membername1
    // membername2
    // member
    // Note that spanning bad matches are OK, for example
    // membername1>membername2 or membername1>membername2>member>name3
Here's a fragile attempt that uses plain regular expressions and recursion:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    static class Program
    {
        static void Main(string[] args)
        {
            string prefix = ">";
            string suffix = ">";
            string subject = "subject>membername1>membername2>member>name3>endsubject";

            var result = Find(subject, true, prefix, suffix).ToList();
            result.ForEach(item => { Console.WriteLine(item); });

            /* output is:
               membername1>membername2
               member>name3 *match
               membername1 *match
               membername2 *match
               member
               name3
            */
        }

        // Alternates between two patterns on each level of recursion:
        // r1 finds delimited spans that contain one inner delimiter,
        // r2 splits such a span into its plain words.
        private static IEnumerable<string> Find(
            string subject, bool toggle, string prefix, string suffix)
        {
            string r1 = @"(?<=" + prefix + @")(?>([\w]*(" + prefix + "|" + suffix + @")[\w]*))(?=" + suffix + ")",
                   r2 = @"[\w]*";

            var temp = Regex.Matches(subject, toggle ? r1 : r2)
                .Cast<Match>()
                .ToList();

            return temp.SelectMany(m => temp
                    .Select(i => i.Value)
                    .Union(Find(m.Value, !toggle, prefix, suffix)))
                .Where(s => !string.IsNullOrEmpty(s))
                .Distinct();
        }
    }
Note: I'm not sure whether, in the example, the > in member>name3 is considered a prefix or a suffix.
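One thing worth guarding against when the delimiters come from user-defined rules: a delimiter such as [ or . is a regex metacharacter, so concatenating it into a pattern unescaped can either fail to parse or silently mean something different. A minimal sketch of the lookahead pattern from the question with Regex.Escape applied follows. The class name DelimiterPattern, the method name Extract, and the sample subject are made up for illustration and are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class DelimiterPattern
{
    // Hypothetical helper: returns the shortest delimited text that follows
    // each occurrence of prefix, treating both delimiters as literal text.
    public static List<string> Extract(string subject, string prefix, string postfix)
    {
        // Regex.Escape makes user-supplied delimiters match literally.
        // The lookahead is zero-width, so a closing delimiter can also
        // serve as the opening delimiter of the next match.
        string pattern = "(?=" + Regex.Escape(prefix) + "(.+?)" + Regex.Escape(postfix) + ")";

        return Regex.Matches(subject, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    static void Main()
    {
        // Prints "alice" and then "bob".
        foreach (var name in Extract("re: [alice] and [bob]", "[", "]"))
            Console.WriteLine(name);
    }
}
```

This only addresses escaping. Because the capture is lazy, it still stops at the first closing delimiter, so it does not by itself find names that contain the delimiter.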
[Edit] Here's another approach that doesn't use regular expressions. It takes into account that the > in member>name3 may be either a prefix or a suffix:
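    var separators = new[] { prefix, suffix };

    // Split on the delimiters, then drop the first and last tokens
    // ("subject" and "endsubject"), which are not delimited on both sides.
    var firstResult = separators
        .SelectMany(s => subject
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Skip(1)
            .Reverse()
            .Skip(1)
            .Reverse())
        .Distinct()
        .ToList();

    // Re-join each pair of adjacent tokens with every separator to recover
    // names that contain a delimiter, then add the single tokens back in.
    var result = firstResult
        .Zip(firstResult.Skip(1), (a, b) =>
        {
            var l = new List<string>();
            separators.ToList().ForEach(s =>
            {
                l.Add(string.Format("{0}{1}{2}", a, s, b));
            });
            return l;
        })
        .SelectMany(s => s)
        .Union(firstResult)
        .ToList();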
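Since the question says that spanning bad matches are OK, a more exhaustive variant is to pair every prefix position with every later suffix position and keep the text in between. The sketch below does this with plain string searching rather than regex, so it is a different technique from the ones above. CandidateFinder, FindCandidates, and the knownMembers set are hypothetical names, and filtering against a known list is an assumption about how the candidates would be used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CandidateFinder
{
    // Returns every distinct, non-empty substring that starts right after an
    // occurrence of prefix and ends right before a later occurrence of suffix.
    public static List<string> FindCandidates(string subject, string prefix, string suffix)
    {
        var candidates = new List<string>();
        if (string.IsNullOrEmpty(prefix) || string.IsNullOrEmpty(suffix))
            return candidates;

        int start = subject.IndexOf(prefix, StringComparison.Ordinal);
        while (start >= 0)
        {
            int contentStart = start + prefix.Length;
            int end = subject.IndexOf(suffix, contentStart, StringComparison.Ordinal);
            while (end >= 0)
            {
                if (end > contentStart)
                    candidates.Add(subject.Substring(contentStart, end - contentStart));
                // Try the next suffix occurrence for the same starting point.
                end = subject.IndexOf(suffix, end + 1, StringComparison.Ordinal);
            }
            // Move on to the next prefix occurrence.
            start = subject.IndexOf(prefix, start + 1, StringComparison.Ordinal);
        }
        return candidates.Distinct().ToList();
    }

    static void Main()
    {
        string subject = "subject>membername1>membername2>member>name3>endsubject";

        // Hypothetical list of valid members, used to discard the bad spans.
        var knownMembers = new HashSet<string> { "membername1", "membername2", "member>name3" };

        // Prints membername1, membername2, member>name3 (one per line).
        foreach (var name in FindCandidates(subject, ">", ">")
                     .Where(c => knownMembers.Contains(c)))
            Console.WriteLine(name);
    }
}
```

The number of candidates grows quadratically with the number of delimiters, which should be fine for something as short as a subject line. For the sample subject it produces ten candidates, from membername1 up to membername1>membername2>member>name3.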