4.0a3 - Two Regular Expression Libraries? -- We need your input!


This is a long story, which I'll try to keep as short as possible.

We originally started by embedding PCRE into Euphoria. This caused quite a few problems with our build system, most of which we got past. On OS X, however, configuring PCRE required updated autoconf and automake tools, which then caused problems with other OS X source packages being installed. I maintain another project, eFTE, a text editor with built-in regular expressions. It runs on a host of platforms (Windows, Linux, OS X, OS/2, AIX, Solaris, FreeBSD, IRIX, HP/UX, SunOS and NCR that I know of) without any configuration at all (it is all standard C). I ran a few tests that showed the regular expression code from eFTE was faster than PCRE until the haystack reached around 200k, at which point they were about equal in speed. After some debate, we decided to dump PCRE in favor of the simple drop-in .c/.h file from eFTE for our regular expressions. Things became much simpler. This was quite some time ago.

Since then, CoJaBo (on IRC, irc://irc.freenode.net#euphoria) has pounded the regex library with all sorts of valid and invalid expressions and found several that caused endless loops and segmentation faults. We had, of course, tested the eFTE regex with all sorts of valid expressions, and once we proved it worked we continued on with development, which is the normal cycle. Once beta hits, tests will be greatly expanded as we hunt for bugs high and low. In this case, CoJaBo beat us to it.

We have fixed the problems that CoJaBo found in eFTE's regex system, but some were quite difficult to track down, so I decided to give PCRE another look. As I read about how other tools dealt with embedding PCRE, I learned of a better method than relying on PCRE's configure system. I copied the example config.h/pcre.h files, configured them by hand for each platform (not as hard as it sounds) and created config.h.windows/pcre.h.windows, config.h.linux/pcre.h.linux, etc. Now our build system copies in the correct files and we do not rely on PCRE's config system, so things are easy again.
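For anyone curious what that copy step amounts to, here is a rough sketch in Python. It is not our actual build script, only an illustration of the idea. The function name install_pcre_headers, the SUFFIXES table and the "osx" suffix are made up for this example; only the .windows and .linux file names come from the description above.

```python
import platform
import shutil
from pathlib import Path

# Hypothetical mapping from platform.system() values to header suffixes.
# Only "windows" and "linux" are named in the post; "osx" is an assumption.
SUFFIXES = {"Windows": "windows", "Linux": "linux", "Darwin": "osx"}


def install_pcre_headers(src_dir, system=None):
    """Copy config.h.<suffix> and pcre.h.<suffix> over config.h and pcre.h.

    Returns the list of files written. Raises ValueError when no
    hand-configured headers are known for the requested platform.
    """
    system = system or platform.system()
    if system not in SUFFIXES:
        raise ValueError(f"no hand-configured PCRE headers for {system!r}")
    suffix = SUFFIXES[system]
    src_dir = Path(src_dir)
    installed = []
    for header in ("config.h", "pcre.h"):
        source = src_dir / f"{header}.{suffix}"  # e.g. config.h.linux
        target = src_dir / header                # e.g. config.h
        shutil.copyfile(source, target)
        installed.append(target)
    return installed
```

Calling install_pcre_headers("pcre") before compiling would select the headers for the current platform. Supporting a new platform then means adding one more hand-edited pair of files, not running configure.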


So, for 4.0a3, we released both regular expression libraries and would like your input (devs too, please). Here are the pros and cons that I know of:

In favor of PCRE:

  • Very wide acceptance, just about the standard for regular expressions.
  • Well tested (used in hundreds of products).
  • Produces helpful error messages when you enter an invalid expression.

Against PCRE:

  • Complex code that the Euphoria devs are not going to touch/modify, even in the case of a problem. We will have to wait for an official fix from the PCRE group.
  • Very large code base, it adds 171k to the binary.
  • Targeted by some hackers looking for exploits; when they find one, they have access to hundreds of products.

In favor of eFTE's REGEX:

  • Very portable code.
  • Very small, adds only 15k to the binary.
  • Only 1,200 lines of code that Euphoria devs can maintain directly.
  • Not targeted by hackers to exploit.

Against eFTE's REGEX:

  • Does not support as much expression syntax as PCRE, but covers the majority.
  • Not as well tested as PCRE.
  • Does not produce nice error messages for an invalid expression; it simply returns "Invalid Expression."

I personally do not care which library we go with. With eFTE, I am going to follow suit with whatever Euphoria decides: if we decide on PCRE, I'm going to convert eFTE to use PCRE; if we keep eFTE's REGEX .c/.h file, I'll merge changes made by Euphoria devs into eFTE and changes made by eFTE devs into Euphoria. I used to lean pretty heavily toward eFTE's REGEX because of the simplicity of the code and our ability to maintain it directly. Now, however, I am on the fence and do not really care either way. They both have pros and cons, and I am unsure which is the best course to take.

Jeremy
