1. clarification needed: regex new()

Here is a code sample that I cannot explain:

 
include std/regex.e as re 
include std/console.e 
 
object x 
 
display("This is not expected to work" ) 
x = re:find( `World`, "Hello World!" ) 
? x 
display(  error_to_string( x ) ) 
-- ERROR_NULL 
 
display("\n This is expected to work" ) 
regex r=re:new( `World` ) 
x = re:find( r, "Hello World!" ) 
? x 
 
---- 
-- given a preceding re:new(`World`) 
-- this now works 
display( "\n Why does this work?" ) 
x = re:find( `World`, "Hello World!" ) 
display( x ) 
-- { 
--     {7,11} 
-- } 

So, what is going on in the internals of PCRE?


2. Re: clarification needed: regex new()


Re:

regex r=re:new( `World` ) 

why not

regex r=re:new( "World" ) 

instead? It's a sequence, no?

useless


3. Re: clarification needed: regex new()

_tom said...

Here is a code sample that I cannot explain: [snip]

So, what is going on in the internals of PCRE?

This is due to the way regular expressions are implemented. We've added a bit of metadata to doubles and sequences. The metadata can serve several purposes, one of which is to hold the extra data for a regular expression, namely the memory that PCRE allocates for its own use.

I think what's happening is that the regex data is being attached to the literal `World` object itself, rather than to a new copy, as it probably should be. So when you pass `World` again after it was passed to regex:new(), it's still carrying the PCRE information. Before the final re:find() call, add:

delete( `World` ) 

...and the final call will stop working, because that PCRE data has been freed. In reality, we should probably always return a new copy of the sequence when you call regex:new().

Matt


4. Re: clarification needed: regex new()

useless said...


Re:

regex r=re:new( `World` ) 

why not

regex r=re:new( "World" ) 

instead? It's a sequence, no?

In 4.x, "World" and `World` are the same thing. `...` is a new string delimiter that differs in one way: escape sequences are not processed inside a `...` string, which makes it convenient for regular expressions. For example:

sequence seq1 = "Hello\nWorld" 
sequence seq2 = `Hello\nWorld` 
 
printf(1, "seq1=%s\nseq2=%s\n", { seq1, seq2 }) 
 
-- Output: 
-- seq1=Hello 
-- World 
-- seq2=Hello\nWorld 

Where this comes in handy with regular expressions is when you use character classes, modifiers, etc. For example:

-- String to match: 48594: A - This is a description of ticket 48594 
 
sequence reg1 = `\d+:\s+[A-C]\s+-\s+\b[\w\s]+`  
 
-- Written using "" it would have to look like: 
 
sequence reg2 = "\\d+:\\s+[A-C]\\s+-\\s+\\b[\\w\\s]+" 

There are of course many other uses, but that's why you'll see `...` used for regular expressions even when the pattern doesn't (yet) contain any backslash sequences: it's just easier to add them later.
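As a quick check of the point above, the sketch below writes one pattern with both delimiters, confirms the two literals hold exactly the same characters, and then compiles the pattern once with re:new() and runs re:find() on the sample ticket line. It assumes Euphoria 4.x with std/regex.e included under the `re` namespace, as in the original post; the variable names are illustrative only.

```euphoria
include std/regex.e as re

-- Same pattern, written with the raw `...` delimiter and with "..." plus doubled backslashes
sequence raw     = `\d+:\s+[A-C]\s+-\s+\b[\w\s]+`
sequence escaped = "\\d+:\\s+[A-C]\\s+-\\s+\\b[\\w\\s]+"

-- Both literals produce identical sequences of characters
? equal(raw, escaped)   -- prints 1

-- Compile once and keep the result in a regex variable,
-- then pass that variable (not the string literal) to find()
regex r = re:new(raw)
object m = re:find(r, "48594: A - This is a description of ticket 48594")
? m   -- a sequence of {start, end} pairs when the pattern matches
```

Passing the compiled `r` explicitly avoids relying on PCRE data happening to be attached to the original literal, which is the behaviour discussed earlier in the thread.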

Jeremy


5. Re: clarification needed: regex new()


So I can do:

sequence bloo = 'This is 
a sentence across 
three lines' 
 
? bloo 
This is 
a sentence across 
three lines 


?
useless


6. Re: clarification needed: regex new()

useless said...


So I can do:

sequence bloo = 'This is 
a sentence across 
three lines' 
 
? bloo 
This is 
a sentence across 
three lines 

?

Not quite. You used a regular single quote; for a multi-line string you need the backtick (`) instead. On a US keyboard, at least, it's next to the 1 key and shares the key with the tilde.

 
sequence bloo = `This is 
a sentence across 
three lines` 

 
puts(1,  bloo ) 
This is 
a sentence across 
three lines 

Matt

