1. Euphoria Regex suggestions

I was getting ready to port my personal pcre wrapper, which I've been using for years, over to the internal pcre, but I see that some critical functions have been left out of the Eu version, and that seriously limits the potential of the built-in regex in Euphoria. The big one is the lack of support for named subpatterns. Without the ability to access a subpattern by name, complex regex patterns pretty much cannot be used, sanely anyway. With many nested subpatterns, and the fact that they are all renumbered every time you add one or take one away, working with matched subpatterns by number is a serious handicap and highly bug-prone. (Basically it's not worth it except for small, simple patterns, but a complex regex can be quite huge, with conditional branches, etc. Names are absolutely essential for these.)

To gain all this power, only one other function needs to be included from pcre: pcre_get_stringnumber, which converts a subpattern name to a subpattern number. There are also some other tricks, and more power, that can be had with access to the information pcre provides through pcre_fullinfo. By adding those two functions to the Eu wrapper, the usefulness of the regex library could be increased tenfold.

Also helpful would be some other minor tinkering to go along with it, to allow more flexible extraction of matches instead of always returning all of them. That could be done in the Eu API, as long as you could match names to numbers and had access to the full map that pcre_fullinfo provides (when duplicate subpattern names are used, you need this map to find them all).
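To make the renumbering problem concrete, here is a minimal sketch. It uses Python's `re` module purely as a runnable stand-in, since the thread itself contains no code; it is not Euphoria's std/regex.e and not the PCRE C API. The date pattern and the variable names are made up for illustration. Python's `groupindex` mapping plays roughly the role that pcre_get_stringnumber plays in PCRE (name in, group number out).

```python
import re

# Numbered groups: in this pattern, group 2 is the month.
by_number = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Add one group at the front (an optional sign) and every later
# group number shifts by one: group 2 is now the year.
by_number_v2 = re.compile(r"([+-])?(\d{4})-(\d{2})-(\d{2})")

# Named groups keep their meaning when the pattern is edited.
by_name = re.compile(
    r"(?P<sign>[+-])?(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
)

text = "2011-03-15"

month_v1 = by_number.match(text).group(2)         # the month
shifted = by_number_v2.match(text).group(2)       # silently the year now
month_named = by_name.match(text).group("month")  # still the month

# Name -> number lookup, comparable in purpose to pcre_get_stringnumber.
name_to_number = dict(by_name.groupindex)

print(month_v1, shifted, month_named, name_to_number["month"])
```

Code that indexes by number breaks silently when the pattern changes, while code that indexes by name does not. With a name-to-number lookup exposed, the by-name access can be layered on top of a by-number API.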

The API would need some beefing-up as well, something I could do, at least on the Euphoria side. But I see some API functions are internal (find_replace) which would be beyond my capability. Anyway, I'd like to make a STRONG suggestion that this expanded support be included in a future version as the power of the underlying library is being wasted...


2. Re: Euphoria Regex suggestions

AndySerpa said...

I see that some critical functions have been left out of the Eu version that seriously limit the potential of using the built-in regex in Euphoria. The big one is the lack of support for named subpatterns. To gain all this power only one other function needs to be included from pcre: pcre_get_stringnumber (converts pattern name to pattern number). And there are some other tricks/more power that can be achieved with access to the information pcre provides with its internal function pcre_fullinfo. By adding those two functions to the eu wrapper the usefulness of the regex library could be increased tenfold.

I'll open a feature request for this and try to get it in by 4.0.1.

AndySerpa said...

Also helpful would be some other minor tinkering to go along with it to allow more flexible extractions of matches instead of always returning all of them, but that could be done in the eu API as long as you could match names to numbers and have access to the full map that pcre_fullinfo provides (in the case of using duplicate pattern names, you need this to find them all).

The API would need some beefing-up as well, something I could do, at least on the Euphoria side. But I see some API functions are internal (find_replace) which would be beyond my capability. Anyway, I'd like to make a STRONG suggestion that this expanded support be included in a future version as the power of the underlying library is being wasted...

If you can do it in Euphoria and share the code, I can do my best to get that code into std/regex.e

What sort of changes are you suggesting for find_replace() ?


3. Re: Euphoria Regex suggestions

jimcbrown said...
AndySerpa said...

I see that some critical functions have been left out of the Eu version that seriously limit the potential of using the built-in regex in Euphoria. The big one is the lack of support for named subpatterns. To gain all this power only one other function needs to be included from pcre: pcre_get_stringnumber (converts pattern name to pattern number). And there are some other tricks/more power that can be achieved with access to the information pcre provides with its internal function pcre_fullinfo. By adding those two functions to the eu wrapper the usefulness of the regex library could be increased tenfold.

I'll open a feature request for this and try to get it in by 4.0.1.

This is ticket:606


4. Re: Euphoria Regex suggestions

jimcbrown said...

What sort of changes are you suggesting for find_replace() ?

I'm not (necessarily) suggesting any, but I notice that it was done internally (pcre has no built-in replace function that I know of, so someone must have written one). So if the Eu development team would prefer to have some of this API stuff done internally (or it would just be better that way), I can suggest the functionality, but I couldn't help with the code. In other words, I can write Euphoria, but not C. In general, though, even if you do put it in internally, as long as the stringnumber function and the key parts of the fullinfo function are exposed, any API (in Euphoria) that anyone cares to make becomes possible, whereas now those things just aren't accessible, so it's impossible. I'll develop these thoughts more fully and provide some examples of what I'm talking about. (Should I join the dev forum and talk about it there?)


5. Re: Euphoria Regex suggestions

AndySerpa said...
jimcbrown said...

What sort of changes are you suggesting for find_replace() ?

I'm not (necessarily) suggesting any, but I notice that it was done internally (pcre has no built-in replace function that I know of, so someone must have written one). So if the Eu development team would prefer to have some of this API stuff done internally (or it would just be better that way), I can suggest the functionality, but I couldn't help with the code. In other words, I can write Euphoria, but not C. In general, though, even if you do put it in internally, as long as the stringnumber function and the key parts of the fullinfo function are exposed, any API (in Euphoria) that anyone cares to make becomes possible, whereas now those things just aren't accessible, so it's impossible. I'll develop these thoughts more fully and provide some examples of what I'm talking about.

The main reason find_replace() is written in C seems to be that the PCRE C interface uses malloc'd memory to pass data back and forth, so it was just easier to manipulate those structures in C code and then stuff the result into a sequence at the end.

I'm open to the idea of moving it into Euphoria code (and doing the stuff using allocate() and peek() and poke()) if that expands the pool of people who can work on it. (If you already have your own euphoric implementation of a find_replace()-like function from your own library, that'd be even better.)
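As a rough idea of how little a replace routine needs from the matching engine, here is a minimal sketch, again in Python as a runnable stand-in rather than Euphoria or C. It is not the actual find_replace() implementation; the name `find_replace_sketch` is hypothetical, and it handles only literal replacement text (no backreferences). The one thing it relies on is the start and end offset of each match, which is what a PCRE match call reports.

```python
import re

def find_replace_sketch(pattern, text, replacement):
    """Replace every match of pattern in text with a literal replacement.

    Works only from match offsets: copy the unmatched text before each
    match, append the replacement, then copy whatever follows the last match.
    """
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, text):
        start, end = m.span()
        pieces.append(text[last_end:start])  # text between matches
        pieces.append(replacement)           # stands in for the match
        last_end = end
    pieces.append(text[last_end:])           # tail after the final match
    return "".join(pieces)

result = find_replace_sketch(r"\d+", "a1b22c", "#")
unchanged = find_replace_sketch(r"\d+", "no digits here", "#")
print(result, unchanged)
```

Since nothing here touches engine internals beyond the match offsets, the same stitching logic could plausibly be written in the host language once the offsets are available there.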

AndySerpa said...

(Should I join the dev forum and talk about it there?)

You are quite welcome to.
