
Class regexp

PCRE regular expression.

public class regexp

Remarks

PCRE is a regular expression library: https://www.pcre.org/. PCRE regular expression syntax reference: full version, short version. Websites with tutorials and other information: rexegg, regular-expressions.info.

This class is an alternative to the .NET System.Text.RegularExpressions.Regex class. The regular expression syntax is similar, but PCRE has some features that are unavailable in .NET, and vice versa. In most cases PCRE is faster. You can use either class; however, functions of the elm class support only PCRE.
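For comparison, here is a minimal sketch of the search from the examples below written with the .NET Regex class. It assumes .NET 6 or later with top-level statements and does not need AuCpp.dll. For this ASCII subject string the pattern can be written identically for both engines, because \b, \w, \d, the lazy +? quantifier and numbered groups behave the same way here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "one two22, three333,four";   // subject string
var x = new Regex(@"\b(\w+?)(\d+)\b"); // same pattern as in the regexp examples

// Is there at least one match?
Console.WriteLine(x.IsMatch(s)); // True

// First match and its two group matches.
Match m = x.Match(s);
if (m.Success) Console.WriteLine($"{m.Value} {m.Groups[1].Value} {m.Groups[2].Value}"); // two22 two 22

// Group 2 of every match, comparable to x.FindAll(s, 2) with regexp.
string[] g2 = x.Matches(s).Select(k => k.Groups[2].Value).ToArray();
Console.WriteLine(string.Join(", ", g2)); // 22, 333
```

The most visible difference in this case is how groups are accessed: m.Groups[1] in .NET, m[1] in regexp.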

Terms used in this documentation and in names of functions and types:

  • regular expression - the regular expression string. Also known as pattern.
  • subject string - the string in which to search for the regular expression. Also known as input string.
  • match - the part (substring) of the subject string that matches the regular expression.
  • groups - parts of the regular expression enclosed in (), except non-capturing parts such as (?:...) and (?options). Also known as capturing groups or capturing subpatterns. The term group is often also used to mean a group match.
  • group match - the part (substring) of the subject string that matches a group. Also known as captured substring.
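The terms above map directly onto objects of the .NET Regex class, which this sketch uses as a self-contained stand-in (it runs without AuCpp.dll). With regexp, the same values would come from m.Value, m[1].Value and m[2].Value, as in the examples below. The pattern and subject string here are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string subject = "id: two22";                // subject string (input string)
var rx = new Regex(@"(?:id: )(\w+?)(\d+)"); // regular expression (pattern)

Match match = rx.Match(subject);
Console.WriteLine(match.Value);           // match: id: two22
Console.WriteLine(match.Groups[1].Value); // group 1 match: two
Console.WriteLine(match.Groups[2].Value); // group 2 match: 22

// (?:...) is non-capturing, so only groups 1 and 2 exist, plus group 0 (the whole match).
Console.WriteLine(rx.GetGroupNumbers().Length); // 3
```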

This library uses the unmanaged DLL AuCpp.dll, which contains the PCRE code; this class is a managed wrapper for it. The main PCRE API functions used by this class are pcre2_compile and pcre2_match. The regexp constructor calls pcre2_compile and stores the compiled code in the regexp object. Other regexp functions call pcre2_match. Compiling to native code (JIT) is not supported.
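Because compiling happens once in the constructor and matching happens in every other call, the practical pattern is "compile once, match many times". The sketch below shows that usage with .NET Regex as a stand-in. In .NET, an invalid pattern throws ArgumentException from the constructor; whether regexp reports an invalid pattern the same way is an assumption this sketch does not cover.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern is parsed once, in the constructor.
var digits = new Regex(@"\d+");

// The compiled object is then reused for any number of subject strings.
string[] subjects = { "a1", "b22", "c" };
int matched = 0;
foreach (var subj in subjects)
    if (digits.IsMatch(subj)) matched++;
Console.WriteLine(matched); // 2

// In .NET, a syntax error is reported when constructing, not when matching.
bool failed = false;
try { _ = new Regex("(unclosed"); }
catch (ArgumentException) { failed = true; }
Console.WriteLine(failed); // True
```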

A regexp object can be used by multiple threads simultaneously.
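.NET documents its Regex instances as immutable and thread-safe, so Regex can stand in for the shared-instance usage described above: one instance is created once and then used concurrently from many threads without locking. This is a .NET sketch of the usage pattern, not a test of regexp itself.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// One shared, already compiled instance.
var shared = new Regex(@"\d+");
int total = 0;

// Many threads call a matching method on the same instance at the same time.
Parallel.For(0, 1000, i =>
{
    if (shared.IsMatch("item" + i)) Interlocked.Increment(ref total);
});
Console.WriteLine(total); // 1000
```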

There are also several string extension methods that use this class, such as RxIsMatch, RxMatch, RxFindAll, RxReplace and RxSplit. The string on which such a method is called is the subject string. These methods create and use cached regexp instances for speed. The regexp constructor does not use caching.
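The idea of cached instances can be sketched as a dictionary that maps each pattern string to an instance compiled on first use. GetCached and RxIsMatchSketch are hypothetical names, .NET Regex stands in for regexp, and the actual cache policy of the Rx* methods (key, size limit, eviction) is not specified on this page.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical cache: pattern string -> compiled instance.
var cache = new ConcurrentDictionary<string, Regex>();

// Hypothetical helper: compiles a pattern on first use and reuses it afterwards.
// Under contention GetOrAdd may run the factory more than once, but only one instance is kept.
Regex GetCached(string pattern) => cache.GetOrAdd(pattern, p => new Regex(p));

// Hypothetical counterpart of s.RxIsMatch(pattern).
bool RxIsMatchSketch(string s, string pattern) => GetCached(pattern).IsMatch(s);

Console.WriteLine(RxIsMatchSketch("two22", @"\d+")); // True
Console.WriteLine(RxIsMatchSketch("four", @"\d+"));  // False
Console.WriteLine(cache.Count);                      // 1 (the pattern was compiled once)
```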

Examples
var s = "one two22, three333,four"; //subject string
var x = new regexp(@"\b(\w+?)(\d+)\b"); //regular expression

print.it("//IsMatch:");
print.it(x.IsMatch(s));

print.it("//Match:");
if(x.Match(s, out var m)) print.it(m.Value, m[1].Value, m[2].Value);

print.it("//FindAll with foreach:");
foreach(var v in x.FindAll(s)) print.it(v.Value, v[1].Value, v[2].Value);
print.it("//FindAll, get only strings of group 2:");
print.it(x.FindAll(s, 2));

print.it("//Replace:");
print.it(x.Replace(s, "'$2$1'"));
print.it("//Replace with callback:");
print.it(x.Replace(s, o => o.Value.Upper()));
print.it("//Replace with callback and ExpandReplacement:");
print.it(x.Replace(s, o => { if(o.Length > 5) return o.ExpandReplacement("'$2$1'"); else return o[1].Value; }));

print.it("//Split:");
print.it(new regexp(@" *, *").Split(s));
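In .NET Regex, the callback-with-expansion pattern from the example above uses Match.Result, which expands a replacement string such as '$2$1' for a single match, similar to what ExpandReplacement does. A sketch assuming .NET 6 or later with top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

var s = "one two22, three333,four";
var x = new Regex(@"\b(\w+?)(\d+)\b");

// Plain replacement string: $2 and $1 are group references.
string r1 = x.Replace(s, "'$2$1'");
Console.WriteLine(r1); // one '22two', '333three',four

// Callback: Match.Result expands a replacement string for one match.
string r2 = x.Replace(s, o => o.Length > 5 ? o.Result("'$2$1'") : o.Groups[1].Value);
Console.WriteLine(r2); // one two, '333three',four

// Split at commas and the spaces around them.
string[] parts = Regex.Split(s, " *, *");
Console.WriteLine(string.Join("|", parts)); // one two22|three333|four
```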

Examples with String extension methods.

var s = "one two22, three333,four"; //subject string
var rx = @"\b(\w+?)(\d+)\b"; //regular expression

print.it("//RxIsMatch:");
print.it(s.RxIsMatch(rx));

print.it("//RxMatch:");
if(s.RxMatch(rx, out var m)) print.it(m.Value, m[1].Value, m[2].Value);

print.it("//RxMatch, get only string:");
if(s.RxMatch(rx, 0, out var s0)) print.it(s0);
print.it("//RxMatch, get only string of group 1:");
if(s.RxMatch(rx, 1, out var s1)) print.it(s1);

print.it("//RxFindAll with foreach:");
foreach(var v in s.RxFindAll(rx)) print.it(v.Value, v[1].Value, v[2].Value);

print.it("//RxFindAll with foreach, get only strings:");
foreach(var v in s.RxFindAll(rx, 0)) print.it(v);
print.it("//RxFindAll with foreach, get only strings of group 2:");
foreach(var v in s.RxFindAll(rx, 2)) print.it(v);

print.it("//RxFindAll, get array:");
if(s.RxFindAll(rx, out var am)) foreach(var k in am) print.it(k.Value, k[1].Value, k[2].Value);

print.it("//RxFindAll, get array of strings:");
if(s.RxFindAll(rx, 0, out var av)) print.it(av);
print.it("//RxFindAll, get array of group 2 strings:");
if(s.RxFindAll(rx, 2, out var ag)) print.it(ag);

print.it("//RxReplace:");
print.it(s.RxReplace(rx, "'$2$1'"));

print.it("//RxReplace with callback:");
print.it(s.RxReplace(rx, o => o.Value.Upper()));
print.it("//RxReplace with callback and ExpandReplacement:");
print.it(s.RxReplace(rx, o => { if(o.Length > 5) return o.ExpandReplacement("'$2$1'"); else return o[1].Value; }));

print.it("//RxReplace, get replacement count:");
if(0 != s.RxReplace(rx, "'$2$1'", out var s2)) print.it(s2);

print.it("//RxReplace with callback, get replacement count:");
if(0 != s.RxReplace(rx, o => o.Value.Upper(), out var s3)) print.it(s3);

print.it("//RxSplit:");
print.it(s.RxSplit(@" *, *"));
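The count-returning RxReplace overloads used above can be emulated in plain .NET by counting inside a match evaluator. ReplaceCounting is a hypothetical helper, not part of Au; it uses Regex.Replace with Match.Result for the $-substitutions and follows the same if(0 != ...) usage.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper shaped like the "get replacement count" overload:
// returns the number of replacements and outputs the new string.
int ReplaceCounting(string s, string pattern, string replacement, out string result)
{
    int count = 0;
    result = Regex.Replace(s, pattern, m => { count++; return m.Result(replacement); });
    return count;
}

var s = "one two22, three333,four";
if (0 != ReplaceCounting(s, @"\b(\w+?)(\d+)\b", "'$2$1'", out var s2)) Console.WriteLine(s2);
// one '22two', '333three',four
```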

Namespace: Au
Assembly: Au.dll
Inheritance: object → regexp

Constructors

Name Description
regexp(string, RXFlags)

Compiles a regular expression string.

Properties

Name Description
Callout

Sets a callout callback function.

Methods

Name Description
FindAll(string, out RXMatch[], Range?, RXMatchFlags)

Finds all match instances of the regular expression. Gets an array of RXMatch.

FindAll(string, int, Range?, RXMatchFlags)

Finds all match instances of the regular expression. Gets the strings of the whole match or of the specified group.

FindAll(string, int, out string[], Range?, RXMatchFlags)

Finds all match instances of the regular expression. Gets an array of strings.

FindAll(string, Range?, RXMatchFlags)

Finds all match instances of the regular expression. Gets RXMatch objects, for example for use with foreach.

FindAllG(string, int, out RXGroup[], Range?, RXMatchFlags)

Finds all match instances of the regular expression. Gets an array of RXGroup (index, length, value).

FindAllG(string, int, Range?, RXMatchFlags)

Finds all match instances of the regular expression. Gets RXGroup (index, length, value) of the whole match or of the specified group.

GetGroupNumberOf(string)

Finds a named group and returns its 1-based index. Returns -1 if not found.

GetMaxGroupNumber()

Returns the highest capture group number in the regular expression. If (?| is not used, this is also the total number of capture groups.

IsMatch(ReadOnlySpan<char>, Range?, RXMatchFlags)

Returns true if string s matches this regular expression.

IsMatch(string, Range?, RXMatchFlags)

Returns true if string s matches this regular expression.

Match(ReadOnlySpan<char>, int, out StartEnd, Range?, RXMatchFlags)

Returns true if string span s matches this regular expression. Gets the whole match or a specified group, as StartEnd.

Match(ReadOnlySpan<char>, Span<StartEnd>, Range?, RXMatchFlags)

Returns true if string span s matches this regular expression. Writes match info to caller-allocated memory (array, stackalloc array, etc.).

Match(string, out RXMatch, Range?, RXMatchFlags)

Returns true if string s matches this regular expression. Gets match info as RXMatch.

Match(string, int, out RXGroup, Range?, RXMatchFlags)

Returns true if string s matches this regular expression. Gets the whole match or a specified group, as RXGroup (index, length, value).

Match(string, int, out string, Range?, RXMatchFlags)

Returns true if string s matches this regular expression. Gets the whole match or a specified group, as string.

Replace(string, Func<RXMatch, string>, int, Range?, RXMatchFlags)

Finds and replaces all match instances of the regular expression. Uses a callback function.

Replace(string, Func<RXMatch, string>, out string, int, Range?, RXMatchFlags)

Finds and replaces all match instances of the regular expression. Uses a callback function. Gets the result string in an out parameter and returns the number of replacements.

Replace(string, string, int, Range?, RXMatchFlags)

Finds and replaces all match instances of the regular expression.

Replace(string, string, out string, int, Range?, RXMatchFlags)

Finds and replaces all match instances of the regular expression. Gets the result string in an out parameter and returns the number of replacements.

Split(string, int, Range?, RXMatchFlags)

Returns an array of the substrings of the subject string that are delimited by regular expression matches.

SplitG(string, int, Range?, RXMatchFlags)

Returns an RXGroup array of the substrings delimited by regular expression matches.

addReplaceFunc(string, Func<RXMatch, int, string, string>)

Adds or replaces a function that is called when a regular expression replacement string contains ${+name}, ${+name(g)} or ${+name(g, v)}, where g is a group number or name and v is any string.

escapeQE(string, bool)

Encloses the string in \Q...\E if it contains any of the metacharacters \^$.[|()?*+{, or if always is true.
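A minimal sketch of the quoting rule described for escapeQE, written as a hypothetical local function EscapeQE (not Au's implementation). It checks for the listed metacharacters and wraps the text in \Q...\E. The handling of a literal \E inside the text is an assumption of this sketch: PCRE ends quoting at \E, so the sketch closes the quote, emits an escaped backslash and an E, and reopens the quote.

```csharp
using System;

// Hypothetical helper following the rule documented for escapeQE:
// quote only if the text contains one of \^$.[|()?*+{ or if always is true.
string EscapeQE(string s, bool always = false)
{
    const string meta = @"\^$.[|()?*+{";
    if (!always && s.IndexOfAny(meta.ToCharArray()) < 0) return s;
    // Assumption of this sketch: a literal \E would end the quoted text early,
    // so it is rewritten as \E\\E\Q (end quote, escaped backslash, E, reopen quote).
    return @"\Q" + s.Replace(@"\E", @"\E\\E\Q") + @"\E";
}

Console.WriteLine(EscapeQE("price $5.00"));      // \Qprice $5.00\E
Console.WriteLine(EscapeQE("plain text"));       // plain text
Console.WriteLine(EscapeQE("plain text", true)); // \Qplain text\E
Console.WriteLine(EscapeQE(@"a\Eb"));            // \Qa\E\\E\Qb\E
```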