1. clarification needed: regex new()
- Posted by _tom (admin) Oct 04, 2010
- 1173 views
Here is a code sample that I cannot explain:
include std/regex.e as re
include std/console.e

object x

display("This is not expected to work" )
x = re:find( `World`, "Hello World!" )
? x
display( error_to_string( x ) ) -- ERROR_NULL

display("\n This is expected to work" )
regex r=re:new( `World` )
x = re:find( r, "Hello World!" )
? x

----
-- given a preceding re:new(`World`)
-- this now works
display( "\n Why does this work?" )
x = re:find( `World`, "Hello World!" )
display( x )
-- {
-- {7,11}
-- }
So, what is going on in the internals of PCRE?
2. Re: clarification needed: regex new()
- Posted by useless Oct 04, 2010
- 1218 views
Re: regex r=re:new( `World` ), why not regex r=re:new( "World" )? It's a sequence, no?
useless
3. Re: clarification needed: regex new()
- Posted by mattlewis (admin) Oct 04, 2010
- 1143 views
Here is a code sample that I cannot explain: [snip]
So, what is going on in the internals of PCRE?
This is due to the way regular expressions are implemented. We've added a bit of metadata to doubles and sequences. The metadata can serve several purposes, but one of them is to deal with the extra data for a regular expression. PCRE allocates some memory for its own use.
I think that what's happening is that the regex data is being attached to the literal `World` object, rather than creating a copy, as it probably should. So when you pass `World` after it was passed to regex:new(), it's still carrying the PCRE information. Before that call, add:
delete( `World` )
...and the final call will stop working, because that PCRE data has been freed. In reality, we should probably always return a new copy of the sequence when you call regex:new().
Matt
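A minimal sketch of a pattern that avoids depending on this behaviour: compile once with re:new(), keep the result in a variable, and pass that variable to re:find(). It uses the routines already shown in this thread plus the regex() type check from std/regex.e. The variable names are illustrative, and the sketch assumes re:new() hands back an atom when compilation fails.

```euphoria
include std/regex.e as re
include std/console.e

object r, m

-- Compile the pattern once and keep the handle.
r = re:new( `World` )

if regex( r ) then
    -- Always search with the compiled handle, never the bare literal.
    m = re:find( r, "Hello World!" )
    if sequence( m ) then
        display( m ) -- match ranges, as in the first post
    else
        display( "no match" )
    end if
else
    display( "pattern did not compile" )
end if
```

Written this way, the result does not depend on whether the literal `World` happens to be carrying PCRE data from an earlier call.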
4. Re: clarification needed: regex new()
- Posted by jeremy (admin) Oct 05, 2010
- 1120 views
Re: regex r=re:new( `World` ), why not regex r=re:new( "World" )? It's a sequence, no?
"World" and `World` are the same in 4.x. `...` is a new string delimiter. It is different in that escape characters are not evaluated in a `...` string, thus nice to use for regular expressions. For example:
sequence seq1 = "Hello\nWorld"
sequence seq2 = `Hello\nWorld`
printf(1, "seq1=%s\nseq2=%s\n", { seq1, seq2 })
-- Output:
-- seq1=Hello
-- World
-- seq2=Hello\nWorld
Where this comes in handy with regular expressions is when you use character classes, modifiers, etc... For example:
-- String to match: 48594: A - This is a description of ticket 48594
sequence reg1 = `\d+:[A-C]\s+-\s+\b[\w\s]+`
-- Written using "" it would have to look like:
sequence reg2 = "\\d+:[A-C]\\s+-\\s+\\b[\\w\\s]+"
There are of course many other uses, but that's why you'll see it used in regular expressions even when the regular expression might not (yet) require special modifiers. It is just easier to change the pattern and add them in the future.
Jeremy
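As a quick check of the point above, a small sketch (the variable names and the shortened pattern are made up for illustration) that compares a raw string with its escaped equivalent. Both spellings should hold the same five characters, so equal() is expected to report them as identical.

```euphoria
sequence raw = `\d+\s`
sequence escaped = "\\d+\\s"

-- equal() returns 1 when both sequences hold the same characters.
? equal( raw, escaped )

-- The backslashes are stored literally, so each counts as one element.
? length( raw )
```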
5. Re: clarification needed: regex new()
- Posted by useless Oct 05, 2010
- 1068 views
So I can do:
sequence bloo = 'This
is a sentence
across three lines'
? bloo
This
is a sentence
across three lines
?
useless
6. Re: clarification needed: regex new()
- Posted by mattlewis (admin) Oct 05, 2010
- 1097 views
So I can do:
sequence bloo = 'This
is a sentence
across three lines'
? bloo
This
is a sentence
across three lines
?
Not quite. You used a regular single quote. In order to do that, you need to use the back tick. On a US keyboard, at least, it's up next to the 1 key and shares the key with the tilde.
sequence bloo = `This
is a sentence
across three lines`
puts(1, bloo )
This
is a sentence
across three lines
Matt
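To see that the line breaks really are part of the sequence, here is a rough sketch that splits a back-tick string on its embedded newlines. The text and variable names are hypothetical, and it assumes split() from std/sequence.e takes the sequence first and the delimiter second.

```euphoria
include std/sequence.e

sequence text = `first
second
third`

-- Split on the newline characters embedded in the raw string.
sequence lines = split( text, '\n' )

-- Print each piece with its position.
for i = 1 to length( lines ) do
    printf( 1, "%d: %s\n", { i, lines[i] } )
end for
```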