1. regex observation & question regarding re:escape()
- Posted by jessedavis Sep 18, 2016
Windows 7 pro sp1 euphoria: 4.0.5 (362497032f33, 2012-10-11)
I guess I am missing something here. I don't see what it is.
From the manual ==> 8.20.6.1 escape()
Escape special regular expression characters that may be entered into a search string from user input.
Notes: Special regex characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Example 1:
include std/regex.e as re
sequence search_s = re:escape("Payroll is $***15.00")
-- search_s = "Payroll is \\$\\*\\*\\*15\\.00"
What is actually produced by the above eucode is:
"Payroll is \$\*\*\*15\.00" which will not compile as a eu regex
Another special character (from the list above) is the backslash.
escape() cannot deal with it at all, giving an "unknown escaped character" error.
So I tried the alternative entry method offered in the docs, which uses #/.../ instead of "...". As an example, regex will not compile the following:
regex p = re:new(#/\*/)
from the docs:
The alternative writing style lets you write a regex without doubling up on the escape characters:
regex sample = re:new( #/\*/ )
Giving the error message: hex number not formed correctly.
Using a double escape does still work.
Regular expressions are hairy enough without all the extra backslashes.
In any event, the docs should agree with the performance!
regards, jd
2. Re: regex observation & question regarding re:escape()
- Posted by Spock Sep 18, 2016
- Last edited Sep 20, 2016
Windows 7 pro sp1 euphoria: 4.0.5 (362497032f33, 2012-10-11)
I guess I am missing something here. I don't see what it is.
... Regular expressions are hairy enough without all the extra backslashes.
In any event, the docs should agree with the performance!
regards, jd
Sorry, can't help with "normal" regex. But I agree that it can be quite tricky. I also agree that \std is not always up to scratch. What I did in my own work was write a custom simplified regex parser/matcher and changed the escape char to be a forward slash. In my system your expr would then look like this:
sequence p1 = re:Parse( "Payroll_is_ /$ /*/*/* 15.00")
Which is fine if the programmer is the one entering the regex [and can remember how it is supposed to go]. Hmmm, that expr doesn't look like it'd do much. Is this just an example to show the issue or is it, in fact, typical? I might be wrong but it looks to me as if you'd almost be better off doing a straight text match:
integer try = match( "Payroll is $***15.00", s)
But if you were looking for payroll totals then..
sequence p1 = re:Parse( "Payroll_is_ /$ [*]{0,4} /d+ . /d{2}") -- Alas, non-standard regex..
Spock
3. Re: regex observation & question regarding re:escape()
- Posted by jessedavis Sep 19, 2016
Hmmm, that expr doesn't look like it'd do much. Is this just an example to show the issue or is it, in fact, typical? I might be wrong but it looks to me as if you'd almost be better off doing a straight text match:
Spock
The expression doesn't do much, true. These two regular expressions were taken directly from the docs, where they serve as examples of how to do it. I was particularly interested in the second entry format, #/.../, because it has the potential to ease the confusion factor. I am building a small single-line parser. I thought regex would be easier to implement since it recognizes patterns, whereas the built-in search routines recognize characters.
In any event, thanks for your help.
Regards,
jd
4. Re: regex observation & question regarding re:escape()
- Posted by mattlewis (admin) Sep 19, 2016
I was particularly interested in the second entry format that uses the #/.../ format because it has the potential to ease the confusion factor.
When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything (so long as your strings don't have actual backticks in them).
http://openeuphoria.org/docs/lang_def.html#_86_characterstringsandindividualcharacters
Matt
5. Re: regex observation & question regarding re:escape()
- Posted by _tom (admin) Sep 19, 2016
In any event, the docs should agree with the performance!
regards, jd
My fault.
I can't find out how I got #/\*/ to work. It must have been some weird stuff I had on my computer.
I did have time to retry example 1
include std/console.e
include std/regex.e as re

sequence s0 = "Payroll is $***15.00"
display( s0 )
--> Payroll is $***15.00
-- this is a valid Euphoria string
-- this IS NOT a valid regex string
-- when s0 is used with the regex language
--> $ is a keyword
--> * is a keyword
--> . is a keyword
-- you must escape these characters before using them in a regex
-- then, you must escape the \ when writing a string

sequence s1 = "Payroll is \\$\\*\\*\\*15\\.00"
display( s1 )
--> Payroll is \$\*\*\*15\.00
-- the form required when writing a regex

-- writing a string with a ` backtick is a bit shorter
sequence s2 = `Payroll is \$\*\*\*15\.00`
display( s2 )
--> Payroll is \$\*\*\*15\.00
-- the form required when writing a regex

-- the Euphoria re:escape function adds the required backslashes
-- to a sequence written with " delimiters
sequence s3 = re:escape( s0 )
display( s3 )
--> Payroll is \$\*\*\*15\.00
-- the form required when writing a regex

--Note:
-- the strings re:escape(s0), s1, and s2 are only useful as literal patterns
thanks for pointing out my mistakes
_tom
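To illustrate the closing note in the post above, here is a minimal sketch of using the output of re:escape() as a literal pattern. It assumes Euphoria 4.0.x with std/regex.e; the variable names and the haystack string are made up for the illustration, and it relies on re:has_match() to test for a match anywhere in the text.

```euphoria
include std/regex.e as re

-- Text that may contain regex special characters ($, *, .)
sequence s0 = "Payroll is $***15.00"

-- re:escape() adds the backslashes, so the compiled pattern
-- matches s0 character for character
regex literal = re:new( re:escape( s0 ) )

-- hypothetical haystack, for illustration only
sequence haystack = "Memo: Payroll is $***15.00 this week"

if re:has_match( literal, haystack ) then
    puts( 1, "literal text found\n" )
else
    puts( 1, "literal text not found\n" )
end if
```

For a purely literal search like this one, the built-in match() suggested earlier in the thread does the same job without a regex; the escaped form is mainly useful when the literal text is only one part of a larger pattern.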
6. Re: regex observation & question regarding re:escape()
- Posted by jessedavis Sep 20, 2016
When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything (so long as your strings don't have actual backticks in them).
Matt
Thanks Matt, works like a charm. I guess I've just discovered the perils of a one track mind!
regards, jd
7. Re: regex observation & question regarding re:escape()
- Posted by Spock Sep 20, 2016
I was particularly interested in the second entry format that uses the #/.../ format because it has the potential to ease the confusion factor.
When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything ...
Matt
Just to clarify - 'anything' is referring to characters in a text string. Special characters treated as literals in the regex itself must still be escaped.
Spock
8. Re: regex observation & question regarding re:escape()
- Posted by mattlewis (admin) Sep 21, 2016
I was particularly interested in the second entry format that uses the #/.../ format because it has the potential to ease the confusion factor.
When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything ...
Matt
Just to clarify - 'anything' is referring to characters in a text string. Special characters treated as literals in the regex itself must still be escaped.
Yes, sorry. I meant that you don't have to escape any characters for the purposes of euphoria reading in the text. So:
"foo\\.txt" -- could be written as: `foo\.txt`
You need to escape the period because that's a special character in regular expressions. But you don't need to escape the backslash itself.
Matt
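A minimal sketch of the distinction described in this post, assuming Euphoria 4.0.x and std/regex.e (the variable names and test strings are invented for the example): the two literals hold the same characters, and the backslash that remains is there for the regex engine, not for Euphoria.

```euphoria
include std/regex.e as re

-- Both literals hold the same eight characters: foo\.txt
sequence quoted = "foo\\.txt"   -- backslash doubled for the Euphoria parser
sequence raw    = `foo\.txt`    -- raw string, backslash written once

? equal( quoted, raw )          -- prints 1

-- The remaining backslash escapes the period for the regex engine,
-- so the pattern is meant to match a literal dot only
regex r = re:new( raw )
? re:has_match( r, "see foo.txt for details" )  -- expect a match
? re:has_match( r, "see fooXtxt for details" )  -- expect no match
```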
9. Re: regex observation & question regarding re:escape()
- Posted by jessedavis Sep 22, 2016
Thanks, tom, for taking the time to demonstrate the proper usage. I appreciate your efforts. When I read the docs I took away an erroneous expectation about both escape() and the #/.../ form. I had completely forgotten the backtick form. Ya live & learn; and then, some of us just live!
I'm a long-time user of euphoria, since 2.something. Now that I have retired from engineering I enjoy pushing ever deeper into euphoria. You guys do a great job.
Thanks again,
jd
10. Re: regex observation & question regarding re:escape()
- Posted by jessedavis Sep 22, 2016
When using regular expressions, I'd recommend using the ` ` backtick style quotes because you don't have to escape anything ...
Matt
Just to clarify - 'anything' is referring to characters in a text string. Special characters treated as literals in the regex itself must still be escaped.
Yes, sorry. I meant that you don't have to escape any characters for the purposes of euphoria reading in the text. So:
"foo\\.txt" -- could be written as: `foo\.txt`
You need to escape the period because that's a special character in regular expressions. But you don't need to escape the backslash itself.
Matt
Got it! Thanks for all your help. I really appreciate it!
Regards,
jd