RegEx engines

Regular Expressions work behind the scenes using RegEx engines, there are mainly two types of RegEx engines text-directed(DNF) and regex-directed(NFA) engine.

There is an easy way to find which RegEx engine is your application using, using put the text “hello world” in your application and search for “hello|hello world” . If your application returns hello then your application is using a regex directed engine, if your application returns “hello world” as the result then it is using a text-directed engine.

regex-directed engine is always eager and will return the left most result even if a better match can be found later

When applying «cat» to “He captured a catfish for his cat.”, the engine will try to match the first token in the regex «c» to the first character in the match “H”. This fails. There are no other possible permutations of this regex, because it merely consists of a sequence of literal characters. So the regex engine tries to match the «c» with the “e”. This fails too, as does matching the «c» with the space. Arriving at the 4th character in the match, «c» matches „c”. The engine will then try to match the second token «a» to the 5th character, „a”. This succeeds too. But then, «t» fails to match “p”. At that point, the engine knows the regex cannot be matched starting at the 4th character in the match. So it will continue with the 5th: “a”. Again, «c» fails to match here and the engine carries on. At the 15th character in the match, «c» again matches „c”. The engine then proceeds to attempt to match the remainder of the regex at character 15 and finds that «a» matches „a” and «t» matches „t”.

The entire regular expression could be matched starting at character 15. The engine is “eager” to report a match. It will therefore report the first three letters of catfish as a valid match. The engine never proceeds beyond this point to see if there are any “better” matches. The first match is considered good enough.This is how a regex directed engine works.

Leave a Comment