1. regex observation & question regarding re:escape()

Windows 7 pro sp1 euphoria: 4.0.5 (362497032f33, 2012-10-11)

I guess I am missing something here. I don't see what it is.

From the manual ==> 8.20.6.1 escape()

 Escape special regular expression characters that may be entered into a search string from user input. 
Notes: 
Special regex characters are:  
 
. \ + * ? [ ^ ] $ ( ) { } = ! < > | : -   
 
Example 1: 
 
include std/regex.e as re 
sequence search_s = re:escape("Payroll is $***15.00") 
-- search_s = "Payroll is \\$\\*\\*\\*15\\.00" 


What is actually produced by the above eucode is:

"Payroll is \$\*\*\*15\.00" which will not compile as a eu regex

Another special character (from the list above) is the backslash.
escape() cannot deal with it at all; I get an "unknown escaped character" error.
So I tried the alternative entry method offered in the docs, #/.../ instead of "...". For example, the following will not compile:
regex p = re:new(#/\*/)
From the docs:
The alternative writing style lets you write a regex without doubling up on the escape characters: 
 
regex sample = re:new( #/\*/ ) 

This gives the error message "hex number not formed correctly".
Using a double escape does still work.
Regular expressions are hairy enough without all the extra backslashes.
In any event, the docs should agree with the performance!

regards, jd
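
For what it's worth, a backslash that is genuinely part of the input is not a problem for re:escape() itself; the "unknown escaped character" error is most likely raised by the double-quoted string literal, where a lone \ must start a valid escape sequence. A minimal sketch, assuming Euphoria 4.0's std/regex.e (re:escape(), re:new(), and re:is_match(), which reports whether the whole string matches); the path-like input is a hypothetical example:

```euphoria
include std/regex.e as re

-- text containing real backslashes, as a user might type it at a prompt;
-- backticks keep every \ exactly as written (no string-level escaping)
sequence user_text = `C:\temp\*.txt`

-- re:escape() puts a backslash before each regex special character,
-- including the backslashes already present in the text
sequence pattern = re:escape(user_text)
puts(1, pattern & "\n")

-- compiled, the escaped pattern matches the original text literally
regex r = re:new(pattern)
printf(1, "%d\n", re:is_match(r, user_text))
```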


2. Re: regex observation & question regarding re:escape()

jessedavis said...

Windows 7 pro sp1 euphoria: 4.0.5 (362497032f33, 2012-10-11)

I guess I am missing something here. I don't see what it is.

... Regular expressions are hairy enough without all the extra backslashes.
In any event, the docs should agree with the performance!

regards, jd

Sorry, I can't help with "normal" regex, but I agree that it can be quite tricky. I also agree that std is not always up to scratch. What I did in my own work was write a custom, simplified regex parser/matcher and change the escape character to a forward slash. In my system your expression would then look like this:

sequence p1 = re:Parse( "Payroll_is_ /$ /*/*/* 15.00") 

Which is fine if the programmer is the one entering the regex [and can remember how it is supposed to go]. Hmmm, that expr doesn't look like it'd do much. Is this just an example to show the issue or is it, in fact, typical? I might be wrong but it looks to me as if you'd almost be better off doing a straight text match:

integer try = match( "Payroll is $***15.00", s) 

But if you were looking for payroll totals, then:

sequence p1 = re:Parse( "Payroll_is_ /$ [*]{0,4} /d+ . /d{2}") -- Alas, non-standard regex.. 
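
For comparison, a standard (PCRE) version of that payroll-total pattern might look like the sketch below. It assumes that in the notation above `_` stands for a literal space, `/` is the escape character, `.` is a literal period, and the remaining spaces only separate tokens; it also assumes Euphoria 4.0's re:new() and re:is_match() (whole-string match). Backticks are used so each backslash reaches the regex engine unchanged.

```euphoria
include std/regex.e as re

-- \$       a literal dollar sign
-- \*{0,4}  zero to four literal asterisks
-- \d+      one or more digits
-- \.\d{2}  a literal period followed by exactly two digits
regex payroll = re:new(`Payroll is \$\*{0,4}\d+\.\d{2}`)

printf(1, "%d\n", re:is_match(payroll, "Payroll is $***15.00"))
printf(1, "%d\n", re:is_match(payroll, "Payroll is $1234.56"))
printf(1, "%d\n", re:is_match(payroll, "Payroll is $***15"))  -- no cents: rejected
```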

Spock


3. Re: regex observation & question regarding re:escape()

Spock said...

Hmmm, that expr doesn't look like it'd do much. Is this just an example to show the issue or is it, in fact, typical? I might be wrong but it looks to me as if you'd almost be better off doing a straight text match:

Spock

The expression doesn't do much, true. These two regex expressions were taken directly from the docs, as examples of how to do it. I was particularly interested in the second entry format that uses the #/.../ format because it has the potential to ease the confusion factor. I am building a small single-line parser. I thought regex would be easier to implement, since it recognizes patterns whereas the built-in search routines recognize characters.
In any event, thanks for your help.

Regards,
jd


4. Re: regex observation & question regarding re:escape()

jessedavis said...

I was particularly interested in the second entry format that uses the #/.../ format because it has the potential to ease the confusion factor.

When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything (so long as your strings don't have actual backticks in them).

http://openeuphoria.org/docs/lang_def.html#_86_characterstringsandindividualcharacters

Matt


5. Re: regex observation & question regarding re:escape()

jessedavis said...

In any event, the docs should agree with the performance!

regards, jd

My fault.

I can't find out how I got #/\*/ to work. It must have been some weird stuff I had on my computer.

I did have time to retry Example 1:

include std/console.e 
 
include std/regex.e as re 
 
sequence s0 = "Payroll is $***15.00" 
display( s0 ) 
    --> Payroll is $***15.00 
    -- this is a valid Euphoria string 
    -- this IS NOT a valid regex string 
 
-- when s0 is used as a regex pattern 
    --> $ is a regex metacharacter 
    --> * is a regex metacharacter 
    --> . is a regex metacharacter 
 
    -- you must escape these characters before using them in a regex 
    -- then, you must escape each \ again when writing a "..." string 
 
sequence s1 = "Payroll is \\$\\*\\*\\*15\\.00" 
display( s1 ) 
    --> Payroll is \$\*\*\*15\.00 
    -- the form required when writing a regex 
 
 
-- writing the string with ` backticks is a bit shorter 
 
sequence s2 = `Payroll is \$\*\*\*15\.00` 
display( s2) 
    --> Payroll is \$\*\*\*15\.00 
    -- the form required when writing a regex 
 
-- the Euphoria re:escape function adds the required backslashes 
-- to a sequence written with " delimiters 
sequence s3 = re:escape( s0 ) 
display( s3 ) 
    --> Payroll is \$\*\*\*15\.00 
    -- the form required when writing a regex 
 
--Note: 
-- the strings re:escape(s0), s1, and s2 are only useful as literal patterns 
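
To check that the three spellings really are the same pattern, and that the pattern then matches the literal text, a minimal sketch (assuming Euphoria 4.0's std/regex.e, where re:new() compiles a pattern and re:is_match() reports whether the whole string matches) could be:

```euphoria
include std/regex.e as re

sequence s0 = "Payroll is $***15.00"             -- the literal text
sequence s1 = "Payroll is \\$\\*\\*\\*15\\.00"   -- escaped by hand, double quotes
sequence s2 = `Payroll is \$\*\*\*15\.00`        -- escaped by hand, backticks
sequence s3 = re:escape(s0)                      -- escaped by the library

-- all three spellings should produce the same pattern
printf(1, "%d %d\n", {equal(s1, s2), equal(s1, s3)})

-- compiled, the pattern matches only the original literal text
regex r = re:new(s3)
printf(1, "%d\n", re:is_match(r, s0))
printf(1, "%d\n", re:is_match(r, "Payroll is $15.00"))
```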

Thanks for pointing out my mistakes.

_tom


6. Re: regex observation & question regarding re:escape()

mattlewis said...

When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything (so long as your strings don't have actual backticks in them).

Matt


Thanks Matt, works like a charm. I guess I've just discovered the perils of a one track mind!

regards, jd


7. Re: regex observation & question regarding re:escape()

mattlewis said...
jessedavis said...

I was particularly interested in the second entry format that uses the #/.../ format because it has the potential to ease the confusion factor.

When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything ...

Matt

Just to clarify - 'anything' is referring to characters in a text string. Special characters treated as literals in the regex itself must still be escaped.

Spock


8. Re: regex observation & question regarding re:escape()

Spock said...
mattlewis said...
jessedavis said...

I was particularly interested in the second entry format that uses the #/.../ format because it has the potential to ease the confusion factor.

When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything ...

Matt

Just to clarify - 'anything' is referring to characters in a text string. Special characters treated as literals in the regex itself must still be escaped.

Yes, sorry. I meant that you don't have to escape any characters for the purposes of euphoria reading in the text. So:

"foo\\.txt" 
 
-- could be written as: 
 
`foo\.txt` 

You need to escape the period because that's a special character in regular expressions. But you don't need to escape the backslash itself.
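
A small sketch of the two escaping layers, assuming Euphoria 4.0's re:new() and re:is_match() (whole-string match): both literals yield the same pattern, the escaped period matches only a literal ".", and an unescaped period matches any character.

```euphoria
include std/regex.e as re

sequence p1 = "foo\\.txt"   -- double quotes: \\ is needed to store one backslash
sequence p2 = `foo\.txt`    -- backticks: the backslash is stored as typed

-- both hold the same 8 characters: f o o \ . t x t
printf(1, "%d\n", equal(p1, p2))

regex strict = re:new(p2)
printf(1, "%d\n", re:is_match(strict, "foo.txt"))  -- escaped period: literal '.'
printf(1, "%d\n", re:is_match(strict, "fooXtxt"))  -- any other character fails

regex loose = re:new(`foo.txt`)                    -- unescaped period: any character
printf(1, "%d\n", re:is_match(loose, "fooXtxt"))
```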

Matt


9. Re: regex observation & question regarding re:escape()

Thanks, Tom, for taking the time to demonstrate the proper usage. I appreciate your efforts. When I read the docs I came away with the wrong expectations about both escape() and the #/.../ form. I had completely forgotten the backtick form. Ya live & learn; and then, some of us just live!
I'm a long-time user of Euphoria, since 2.something. Now that I have retired from engineering I enjoy pushing ever deeper into Euphoria. You guys do a great job.

Thanks again,
jd


10. Re: regex observation & question regarding re:escape()

mattlewis said...

When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything ...

Matt

Just to clarify - 'anything' is referring to characters in a text string. Special characters treated as literals in the regex itself must still be escaped.

Yes, sorry. I meant that you don't have to escape any characters for the purposes of euphoria reading in the text. So:

"foo\\.txt" 
 
-- could be written as: 
 
`foo\.txt` 

You need to escape the period because that's a special character in regular expressions. But you don't need to escape the backslash itself.

Matt


Got it! Thanks for all your help. I really appreciate it!
Regards,
jd

