1. searching and regex miniguide -- now new wiki page
- Posted by _tom (admin) Oct 05, 2010
- 1637 views
My progress so far in creating a miniguide for searching and regex is http://openeuphoria.org/wiki/euwiki.ex?Searching.
This is the first draft. Comments and wisdom are welcome.
2. Re: searching and regex miniguide -- now new wiki page
- Posted by DanM Oct 05, 2010
- 1582 views
I think it looks VERY nice, but I don't know much at all about regex, so maybe there's things there not so good that I miss. I do wonder what:
meaning regex wildcard
means, though. It's in a section about "?" not being the same in regex and in a wildcard.
But the above didn't show up in preview the same way I saw it in the wiki, nor the way it showed when I pasted it into this post. In this post, it copied as a mix of double ampersands separated by spaces, and in the wiki it appeared as what I take to be a different kind or font of ampersands, also separated by spaces. (It's the AMPERSANDS I'm wondering about.)
Dan M.
3. Re: searching and regex miniguide -- now new wiki page
- Posted by mattlewis (admin) Oct 05, 2010
- 1659 views
Regular expressions are more complex than simple wildcards: they can define much more elaborate patterns than wildcards allow. It's also possible to designate parts of the match for later use, so you can pull out the particular part of the string that you're interested in.
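As a rough illustration of designating parts of a match for later use, here is a minimal sketch. It assumes Python's re module as a stand-in for PCRE (the thread itself is about Euphoria), and the sample line and pattern are hypothetical.

```python
import re

# Hypothetical sample line; the pattern below is for illustration only.
line = "Posted by mattlewis (admin) Oct 05, 2010"

# Each pair of parentheses marks a part of the match to keep for later.
pattern = re.compile(r"Posted by (\w+).*?(\w{3} \d{2}, \d{4})")

m = pattern.search(line)
author = m.group(1)   # text captured by the first pair of parentheses
date = m.group(2)     # text captured by the second pair
print(author, date)
```

The whole pattern has to match, but only the parenthesized parts are pulled out afterwards.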
As to your question (I think the markup was a mistake), in a regular expression, a dot (.) matches any character (by default, any character except a newline). It's similar to what a question mark does with 'normal' wildcards. With a regex, however, it is possible to use a quantifier to say how many characters it should match. A question mark indicates that the preceding character is optional for the match to be good. An asterisk after a character means zero or more. A plus sign means one or more. You can also use curly braces to specify an actual number. That scratches the surface, but hopefully helps.
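The dot and the quantifiers described above can be tried out in a few lines. This is only a sketch, assuming Python's re module behaves like PCRE for these basic constructs; the helper name full() and the sample strings are made up for the example.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# A dot matches any single character, but it must match something.
dot_ok = full(r"c.t", "cat") and full(r"c.t", "cut") and not full(r"c.t", "ct")

# ? makes the preceding character optional (zero or one).
optional_ok = full(r"colou?r", "color") and full(r"colou?r", "colour")

# * means zero or more of the preceding character.
star_ok = full(r"ab*c", "ac") and full(r"ab*c", "abbbc")

# + means one or more of the preceding character.
plus_ok = full(r"ab+c", "abc") and not full(r"ab+c", "ac")

# {n} means exactly n repetitions; {m,n} gives a range.
braces_ok = (full(r"a{3}", "aaa") and not full(r"a{3}", "aa")
             and full(r"a{2,3}", "aa"))

print(dot_ok, optional_ok, star_ok, plus_ok, braces_ok)
```

Note that each quantifier applies to whatever comes immediately before it, which is why none of them can stand alone the way a wildcard ? or * can.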
Matt
4. Re: searching and regex miniguide -- now new wiki page
- Posted by DanM Oct 05, 2010
- 1623 views
Thanks Matt,
I was beginning to understand the entry itself, past the confusing ampersands, but every little bit of info helps, so thanks for your additional explanation.
I figured the ampersands were some kind of error, that's why I pointed it out.
Dan
5. Re: searching and regex miniguide -- now new wiki page
- Posted by CoJaBo Oct 05, 2010
- 1590 views
I ran it thru spellcheck and 86'd the section comparing metacharacters, as it made no sense- they represent entirely different constructs in regex vs traditional wildcards.
Some sections still need to be cleaned up and checked for accuracy.
A good resource I've used before is http://www.regular-expressions.info/tutorial.html which gives a pretty comprehensive definition of regex syntax.
Unrelated, but- why isn't there a button to show changes when editing a wiki page? Also, the diff/revision history, etc links are broken on the edit and edit completed pages.
6. Re: searching and regex miniguide -- now new wiki page
- Posted by _tom (admin) Oct 06, 2010
- 1557 views
Thanks for your help.
The && && && problem was due to an experiment in adjusting the column spacing of tables.
I am already thinking about how this miniguide should be turned into two guides: one for Euphoria-based functions and one for regex functions. I am trying to put too much content into too small an article.
I have added the reference that CoJaBo found on regex to the top of the miniguide.
_tom
7. Re: searching and regex miniguide -- now new wiki page
- Posted by CoJaBo Oct 07, 2010
- 1501 views
I've removed the wildcard/regex comparison table again, because making it look nicer does not make it any less bogus. The metacharacters ? and * simply do not mean the same thing in wildcards (where they match characters) and regex (where they specify repetition of a previous term). An apple is not comparable with a car tire just because both happen to be vaguely round. The regex repetition metacharacters ?, +, *, and {} are complex enough to warrant their own section; mentioning them in a section not even related to regex would be confusing even if it were correct. Regexes are confusing enough to most people without having a guide mislead them.
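The difference in meaning can be checked directly by feeding the same pattern string to a wildcard matcher and to a regex matcher. The sketch below assumes Python's fnmatch.fnmatchcase as a stand-in for wildcard matching and re.fullmatch as a stand-in for PCRE; the function name compare() and the sample strings are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def compare(pattern, samples):
    """For each sample, report (sample, matches_as_wildcard, matches_as_regex)."""
    return [
        (s, fnmatchcase(s, pattern), re.fullmatch(pattern, s) is not None)
        for s in samples
    ]

# Wildcard: a, b, then exactly one more character.
# Regex: a, then an optional b.
question = compare("ab?", ["a", "ab", "abc"])

# Wildcard: a followed by anything.
# Regex: zero or more repetitions of a.
star = compare("a*", ["", "aaa", "abc"])

for row in question + star:
    print(row)
```

In the wildcard reading, ? and * consume characters themselves; in the regex reading, they only say how often the preceding term may repeat.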
8. Re: searching and regex miniguide -- now new wiki page
- Posted by jimcbrown (admin) Oct 07, 2010
- 1453 views
The metacharacters ? and * simply do not mean the same thing in wildcards (where they match characters) and regex (where they specify repetition of a previous term).
Agreed.
The regex repetition metacharacters ?, +, *, and {} are complex enough to warrant their own section
Completely agree.
mentioning them in a section not even related to regex would be confusing even if it was correct.
I've removed the wildcard/regex comparison table again because making it look nicer does not make it any less bogus.
I strongly disagree with this. The guide should explain the separate (and very different) usages of both the wildcard metacharacters and the regex metacharacters (or at least touch very briefly on this, and point to links for finding out more). At this point, a guide contrasting regex and wildcards is quite useful, especially if one is very familiar with how to use wildcards but completely new to regexes.
A table showing how the same string can lead to different results, depending on whether it's interpreted as a regex or as a wildcard match, makes these differences far easier to grasp.
Additionally, a table showing how to perform the same actions using both wildcards and regex, side-by-side, would be quite handy as a quick reference.
The comparison table you deleted appears to fit with the above goals, and should be restored.
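A side-by-side quick reference of the kind described above could be generated rather than written by hand. The following is only a sketch: it assumes wildcard patterns that use nothing but ? and *, uses Python's re and fnmatch as stand-ins for PCRE and wildcard matching, and the function name wildcard_to_regex() and the sample patterns are invented for the example.

```python
import re
from fnmatch import fnmatchcase

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern that uses only ? and * into a regex string."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")             # one arbitrary character
        elif ch == "*":
            parts.append(".*")            # any run of characters, possibly empty
        else:
            parts.append(re.escape(ch))   # literal text, escaped where needed
    return "".join(parts)

# Each row pairs a wildcard with a regex that performs the same action.
equivalents = [(w, wildcard_to_regex(w)) for w in ("eu*.ex", "file?.txt", "*wiki*")]
for wildcard, regex in equivalents:
    print(f"{wildcard:12} {regex}")

# Spot check: both forms should agree on these hypothetical names.
names = ["euwiki.ex", "eu.ex", "wiki.ex"]
agree = all(
    fnmatchcase(n, "eu*.ex") == (re.fullmatch(wildcard_to_regex("eu*.ex"), n) is not None)
    for n in names
)
```

The translation shows why the two notations are easy to confuse: the wildcard ? corresponds to the regex dot, and the wildcard * corresponds to the regex dot followed by *.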
9. Re: searching and regex miniguide -- now new wiki page
- Posted by _tom (admin) Oct 07, 2010
- 1459 views
To understand what I wrote, here is the history:
- started by looking at the Perl documentation, and found a quick tutorial on regex
- tried to convert the Perl tutorial and give it a Euphoria flavor
- recognizing that Euphoria 'native' functions are important, added this content
- the idea then was to show regex and Euphoria routines in parallel
Then, getting tired, I put the first draft up on the wiki.
The problem that remains:
- hard to squeeze a textbook into a few pages
- regex, on its own, is complex enough
- wildcard matching is just different enough from regex to make it confusing
A possible remedy:
- create a new section with only Euphoria functions
- match()
- wildcard()
- maybe string token
- then explain regex
- then show Euphoria vs regex
I will write a second draft and see how these ideas fit together.
Putting the first draft (rough as it was) on the wiki was a good thing. I appreciate the comments and ideas that you are presenting.
_tom
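The progression in the outline above, from plain searching to wildcard matching to regex, can be pictured with three small searches over the same text. This is a sketch using Python analogs only (str.find, fnmatch.fnmatchcase, and re.search); it is not the Euphoria API, and the sample text and patterns are hypothetical.

```python
import re
from fnmatch import fnmatchcase

text = "searching and regex miniguide"

# 1. Plain substring search: where does the literal text occur?
#    (Python counts from 0 and returns -1 when the text is absent.)
position = text.find("regex")

# 2. Wildcard match: does the whole string fit a simple shape?
fits_wildcard = fnmatchcase(text, "search*guide")

# 3. Regex search: find a structured piece anywhere in the string.
found = re.search(r"\bmini\w+", text)
word = found.group(0) if found else None

print(position, fits_wildcard, word)
```

Each step answers a slightly different question, which is one argument for presenting them in separate sections before comparing them.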
10. Re: searching and regex miniguide -- now new wiki page
- Posted by euphoric (admin) Oct 07, 2010
- 1441 views
Since tons has already been written about PCRE/regex on the web, shouldn't we just link to the most valuable resources? Euphoria find/match must of course be comprehensively documented, but the regex stuff can be sent off-site.
This is not a commentary on what's already been written, just a suggestion to make things easier for the maintainers of the information. :)
11. Re: searching and regex miniguide -- now new wiki page
- Posted by mattlewis (admin) Oct 07, 2010
- 1477 views
To emphasize: Since Euphoria uses PCRE, it makes a lot of sense for our documentation to be based on and/or to link to PCRE-related information. The only thing more confusing than regular expressions themselves is figuring out the nuances between different implementations.
Matt
12. Re: searching and regex miniguide -- now new wiki page
- Posted by SDPringle Oct 07, 2010
- 1464 views
I found this error in what you wrote about regex.
A regex consumes memory. This memory is allocated by the C-code of the PCRE library which means that it may be necessary to manually free this memory using the Euphoria delete() function.
Since there is now a cleanup facility for calling delete, and that cleanup facility is always applied to regex(), there is no need to call delete() on the regular expressions. Other than this, it is a document that contrasts well how wildcard and regex patterns differ.
Shawn Pringle