1. regex.e find_all -- another problem
- Posted by jessedavis Mar 26, 2010
- 1024 views
It must be because it's Friday! After fixing the problem of find_all() returning a sequence instead of an atom for null results, I foolishly went back to coding the app, only to discover that even when find_all() finds a pattern and returns the sequence containing its location, the returned sequence has an extra level of nesting. That makes recovering the found sequence a pain.
when find_all returns a sequence it should be:
{{1,3},{5,7},...} --> a sequence of pairs
what it really returns is:
{{{1,3},{5,7},....}} -----> notice the extra braces??
A brute force & ignorance solution is given below. The changes are the first 6 lines following the while sequence(re..... statement, plus the addition of integer i, which is set to zero EACH time the routine is called!
    public function find_all(regex re, sequence haystack, integer from=1, object options=DEFAULT)
        if sequence(options) then
            options = or_all(options)
        end if

        object result
        sequence results = {}
        integer i
        i = 0
        while sequence(result) with entry do
            if i = 0 then
                results = result
                i += 1
            else
                results = append(results, result)
            end if
            from = max(result) + 1
            if from > length(haystack) then
                exit
            end if
        entry
            result = find(re, haystack, from, options)
        end while

        if length(results) then
            return results
        else
            return -1
        end if
    end function
Regards, jd
2. Re: regex.e find_all -- another problem
- Posted by SDPringle Mar 27, 2010
- 1030 views
The documentation is bad. The intention of find_all is to find every match of a regular expression. Now a single "match" is described as a sequence of pairs. The first pair is for the entire match and the following pairs are for the sub-pattern groups, if any. The output of find_all should therefore be a sequence of sequences of integer pairs.
Shawn Pringle
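A minimal sketch of walking a result in the shape described above (a sequence of matches, each match a sequence of integer pairs). The literal is a hand-written stand-in, not captured output from std/regex.e; the positions are borrowed from the test program posted later in this thread, and the variable names are only illustrative.

```euphoria
-- Stand-in for a find_all() result in the "sequence of sequences of pairs"
-- shape. Each inner sequence is one match; its first pair is the whole match.
sequence matches = {
    {{15,17}},   -- first match, no sub-pattern groups
    {{36,38}}    -- second match, no sub-pattern groups
}

sequence whole

for i = 1 to length(matches) do
    whole = matches[i][1]   -- {start, end} of the entire match
    printf(1, "match %d spans %d..%d\n", {i, whole[1], whole[2]})
end for
```

With this shape, matches[i][1] is always the whole-match pair, and any sub-pattern group pairs would follow it inside matches[i].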
3. Re: regex.e find_all -- another problem
- Posted by jessedavis Mar 27, 2010
- 999 views
I agree that the documentation has problems; however, I think the following (actual terminal output) is what should happen. Notice that the find() routine returns only the first occurrence, and find_all() returns a sequence that contains two pairs, one for each occurrence. Since the routine get_ovector_size() does not exist, one needs find_all() to return something that does not require a lot of analysis to recover the found occurrences, no matter how many there are (from none to all). This solution makes it easy to recover the occurrences. Note that I have not looked into the question of other routines in regex.e that depend on find_all()....
Anyway here is the code and terminal output to demonstrate->
    #!/home/jd/euphoria/bin/eui
    include std/pretty.e
    include std/regex.e as re

    sequence str_1
    object b

    str_1 = "<property name=\"title\" translatable=\"yes\">My Window</property>"
    re:regex r1 = re:new("=\"[a-zA-Z ]") -- there are two of these
    re:regex r2 = re:new("george Vth") -- never find this one!

    puts(1,"\nfind results.............\n")
    b = re:find(r1,str_1)
    pretty_print(1,b,{0})

    puts(1,"\nfind_all results.........\n")
    b = re:find_all(r1,str_1)
    pretty_print(1,b,{0})

    puts(1,"\nfind_all null result.....\n")
    b = re:find_all(r2,str_1)
    pretty_print(1,b,{0})

    puts(1,"\nend ....................\n")
Here is the terminal output------->
    ~/euphoria/jd_src>./reg_test.ex

    find results.............
    { {15,17} }
    find_all results.........
    { {15,17}, {36,38} }
    find_all null result.....
    -1
    end ....................
Here is the modified find_all() routine from regex.e--->
    public function find_all(regex re, sequence haystack, integer from=1, object options=DEFAULT)
        if sequence(options) then
            options = or_all(options)
        end if

        object result, results
        integer i
        results = -1
        i = 0
        while sequence(result) with entry do
            if i = 0 then
                results = result
                i += 1
            else
                results = append(results, result[1])
            end if
            from = max(result) + 1
            if from > length(haystack) then
                exit
            end if
        entry
            result = find(re, haystack, from, options)
        end while

        return results
    end function
regards, jd
4. Re: regex.e find_all -- another problem
- Posted by jimcbrown (admin) Mar 27, 2010
- 901 views
Since the routine get_ovector_size() does not exist
Huh?
5. Re: regex.e find_all -- another problem
- Posted by jessedavis Mar 27, 2010
- 913 views
from the manual for euphoria 4.0:
10.2.6.2 get_ovector_size

    include std/regex.e
    public function get_ovector_size(regex ex, integer maxsize = 0)

Returns the number of capturing subpatterns (the ovector size) for a regex

10.2.6.2.1 Parameters:
1. ex : a regex
2. maxsize : optional maximum number of named groups to get data from

10.2.6.2.2 Returns:
An integer

-------------------------------------
A search of the include file regex.e does not find get_ovector_size. Huh?
Regards, jd
6. Re: regex.e find_all -- another problem
- Posted by jimcbrown (admin) Mar 27, 2010
- 909 views
I see this on line 236 of std/regex.e
    public function get_ovector_size(regex ex, integer maxsize=0)
        integer m = machine_func(M_PCRE_GET_OVECTOR_SIZE, {ex})
        if (m > maxsize) then
            return maxsize
        end if
        return m+1
    end function
And on line 36
enum M_PCRE_COMPILE=68, M_PCRE_FREE, M_PCRE_EXEC, M_PCRE_REPLACE, M_PCRE_ERROR_MESSAGE=95, M_PCRE_GET_OVECTOR_SIZE=97
I am quite confused as to why you would be missing get_ovector_size() in std/regex.e ... Does it work if you manually add this code back in?
7. Re: regex.e find_all -- another problem
- Posted by jessedavis Mar 27, 2010
- 919 views
Here is line 36 from my regex.e--->
enum M_PCRE_COMPILE=68, M_PCRE_FREE, M_PCRE_EXEC, M_PCRE_REPLACE
The function get_ovector_size() is nowhere to be found.
I added the code and ran my little test again with the following result:
    /home/jd/euphoria/include/std/regex.e:654 in function get_ovector_size()
    machine_proc/func(97,...) not supported
    ... called from ./reg_test.ex:17
I don't have the time to figure it out now. My regex.e file came with the Linux package download. It's a few weeks old.
Thanks for your help.
Regards, jd
8. Re: regex.e find_all -- another problem
- Posted by jimcbrown (admin) Mar 27, 2010
- 890 views
Looks like you are using an older version of euphoria. Beta 2?
Beta 3 should have this.
(The command "eui -version" will report what version of Euphoria you are using.)
9. Re: regex.e find_all -- another problem
- Posted by jessedavis Mar 27, 2010
- 886 views
Oh, God. Now I am confused. According to the website (as of now) the latest release is beta 2. I downloaded it again (today) and cracked open regex.e. The function is NOT there. Is there more than one official web site? The version prints out 40000.
I'm going out for another fifth (or two)..
Regards, jd
10. Re: regex.e find_all -- another problem
- Posted by jimcbrown (admin) Mar 27, 2010
- 985 views
You are right when you said the downloads page at http://oe.cowgar.com/downloads/index.wc should just have a link pointing to the downloads page at sourceforge.net or at http://sourceforge.net/projects/rapideuphoria/files/ instead of listing the individual links to the individual versions.
The downloads page should be fixed, but I do not have the ability to do that.
11. Re: regex.e find_all -- another problem
- Posted by jessedavis Mar 28, 2010
- 853 views
Thanks for all your help.
1. I still feel that consistency is important. If find() returns an atom (-1) on failure, so should find_all().
2. Regarding what find_all returns on multiple finds of the same pattern in a string, I think the documentation is correct and what it actually returns is wrong.
Both these concerns are probably petty; however, a little consistency goes a long way.
Regards, jd
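Until the return convention is settled, a caller can guard against both "no match" forms discussed in this thread (an atom such as -1, or an empty sequence). The sketch below assumes the std/regex.e routines used in the test program above (re:new and re:find_all); no_matches() is a hypothetical helper, not part of std/regex.e.

```euphoria
include std/regex.e as re

-- Hypothetical helper (not in std/regex.e): true when a find_all() result
-- signals "nothing found", whether as an atom (-1) or an empty sequence.
function no_matches(object found)
    if atom(found) then
        return 1
    end if
    return length(found) = 0
end function

sequence str_1 = "<property name=\"title\" translatable=\"yes\">My Window</property>"
re:regex r2 = re:new("george Vth")  -- pattern that never occurs in str_1

if no_matches(re:find_all(r2, str_1)) then
    puts(1, "nothing found\n")
end if
```

Testing atom() before calling length() keeps the check safe whichever convention the installed regex.e follows.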