1. regex.e find_all -- another problem

It must be because it's Friday! After fixing the problem of find_all() returning a sequence instead of an atom for null results, I foolishly went back to coding the app, only to discover that even when find_all() finds a pattern and returns the sequence containing the locations, the returned sequence has an extra level of nesting. That makes recovering the found locations a pain.

When find_all() returns a sequence it should be:

{{1,3},{5,7},...} --> a sequence of pairs

What it really returns is:

{{{1,3},{5,7},....}} -----> notice the extra braces??

A brute-force-and-ignorance solution is given below. The changes are the first six lines following the "while sequence(result) ..." statement, plus the addition of the integer i, which is set to zero EACH time the routine is called!

public function find_all(regex re, sequence haystack, integer from=1, object options=DEFAULT)
	if sequence(options) then options = or_all(options) end if

	object result
	sequence results = {}
	integer i

	i = 0	-- reset on every call; flags the first match
	while sequence(result) with entry do
		if i = 0 then
			results = result	-- first match is already {{start, end}}
			i += 1
		else
			results = append(results, result[1])	-- append the pair itself, not its wrapper
		end if
		from = max(result) + 1	-- resume just past the last match

		if from > length(haystack) then
			exit
		end if
	entry
		result = find(re, haystack, from, options)

	end while
	if length(results) then return results else return -1 end if
end function

Regards, jd


2. Re: regex.e find_all -- another problem

The documentation is bad. The intention of find_all() is to find every match of a regular expression. A single "match" is described as a sequence of pairs: the first pair is for the entire match, and the following pairs are for the sub-pattern groups, if any. The output of find_all() should therefore be a sequence of sequences of integer pairs.
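To make that shape concrete, here is a minimal sketch that walks a sequence of matches. The data in all_matches is hypothetical (made up for illustration, not produced by calling std/regex.e); it assumes a pattern with one capturing group that matched twice.

```euphoria
-- Hypothetical data: two matches of a pattern with one capturing group.
-- Each match is {whole_match_pair, group_1_pair}.
sequence all_matches = {
	{{1, 2}, {1, 1}},
	{{5, 7}, {5, 6}}
}

sequence m, whole
for i = 1 to length(all_matches) do
	m = all_matches[i]	-- one match: a sequence of pairs
	whole = m[1]		-- first pair covers the entire match
	printf(1, "match %d spans %d..%d and has %d group(s)\n",
		{i, whole[1], whole[2], length(m) - 1})
end for
```

The extra level of braces is what leaves room for the group pairs; with no groups in the pattern, each match would hold just the one whole-match pair.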

Shawn Pringle


3. Re: regex.e find_all -- another problem

I agree that the documentation has problems; however, I think the following (actual terminal output) is what should happen. Notice that find() returns only the first occurrence, while find_all() returns a sequence containing two pairs, one for each occurrence. Since the routine get_ovector_size() does not exist, find_all() needs to return something that does not require a lot of analysis to recover the found occurrences, however many there are (from none to all). This solution makes the occurrences easy to recover. Note that I have not looked into the other routines in regex.e that depend on find_all().

Anyway, here is the code and terminal output to demonstrate:

 
#!/home/jd/euphoria/bin/eui 
include std/pretty.e 
include std/regex.e as re 
 
sequence str_1 
object b 
 
str_1 = "<property name=\"title\" translatable=\"yes\">My Window</property>" 
 
re:regex r1 = re:new("=\"[a-zA-Z ]")	-- there are two of these 
re:regex r2 = re:new("george Vth")		-- never find this one! 
 
puts(1,"\nfind results.............\n") 
b = re:find(r1,str_1) 
pretty_print(1,b,{0}) 
 
puts(1,"\nfind_all results.........\n") 
b = re:find_all(r1,str_1) 
pretty_print(1,b,{0}) 
 
puts(1,"\nfind_all null result.....\n") 
b = re:find_all(r2,str_1) 
pretty_print(1,b,{0}) 
puts(1,"\nend  ....................\n") 

Here is the terminal output:

~/euphoria/jd_src>./reg_test.ex 
 
find results............. 
{ 
  {15,17} 
} 
find_all results......... 
{ 
  {15,17}, 
  {36,38} 
} 
find_all null result..... 
-1 
end  .................... 
 

Here is the modified find_all() routine from regex.e:

public function find_all(regex re, sequence haystack, integer from=1,
                         object options=DEFAULT)

	if sequence(options) then options = or_all(options) end if

	object result, results
	integer i

	results = -1	-- returned as-is when nothing matches
	i = 0

	while sequence(result) with entry do
		if i = 0 then
			results = result	-- first match: already {{start, end}}
			i += 1
		else
			results = append(results, result[1])	-- later matches: add the pair only
		end if
		from = max(result) + 1	-- resume just past the last match
		if from > length(haystack) then
			exit
		end if
	entry
		result = find(re, haystack, from, options)
	end while
	return results

end function
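As a rough illustration of why a flat sequence of pairs is convenient for the caller, the sketch below slices the matched text back out of the haystack. matched_text() is a hypothetical helper, not part of std/regex.e, and the pairs are copied from the terminal output above rather than computed here.

```euphoria
-- Hypothetical helper: not part of std/regex.e.
-- pairs is either -1 (no match) or a flat sequence such as {{15,17},{36,38}}.
function matched_text(sequence haystack, object pairs)
	sequence found = {}
	if atom(pairs) then
		return found	-- nothing matched
	end if
	for i = 1 to length(pairs) do
		found = append(found, haystack[pairs[i][1] .. pairs[i][2]])
	end for
	return found
end function

sequence sample = "<property name=\"title\" translatable=\"yes\">My Window</property>"
sequence hits = matched_text(sample, {{15, 17}, {36, 38}})
for i = 1 to length(hits) do
	puts(1, hits[i] & "\n")
end for
-- prints ="t and ="y, one per line
```

The single atom() test is all the caller needs to separate the "nothing found" case from the "one or more found" case.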
 

regards, jd


4. Re: regex.e find_all -- another problem

jessedavis said...

Since the routine get_ovector_size() does not exist

Huh?


5. Re: regex.e find_all -- another problem

From the manual for Euphoria 4.0:

10.2.6.2 get_ovector_size
include std/regex.e 
public function get_ovector_size(regex ex, integer maxsize = 0) 
Returns the number of capturing subpatterns (the ovector size) for a regex 
10.2.6.2.1 Parameters: 
      1. ex : a regex 
      2. maxsize : optional maximum number of named groups to get data from 
10.2.6.2.2 Returns: 
An integer 
 
------------------------------------- 
 
A search of the include file regex.e does not find get_ovector_size. 
 
Huh? 
 
Regards, 
jd 
 


6. Re: regex.e find_all -- another problem

I see this on line 236 of std/regex.e:

public function get_ovector_size(regex ex, integer maxsize=0) 
 
        integer m = machine_func(M_PCRE_GET_OVECTOR_SIZE, {ex}) 
        if (m > maxsize) then 
                return maxsize 
        end if 
        return m+1 
end function 

And on line 36:

enum M_PCRE_COMPILE=68, M_PCRE_FREE, M_PCRE_EXEC, M_PCRE_REPLACE, M_PCRE_ERROR_MESSAGE=95, M_PCRE_GET_OVECTOR_SIZE=97

I am quite confused as to why you would be missing get_ovector_size() in std/regex.e ... Does it work if you manually add this code back in?


7. Re: regex.e find_all -- another problem

Here is line 36 from my regex.e:

enum M_PCRE_COMPILE=68, M_PCRE_FREE, M_PCRE_EXEC, M_PCRE_REPLACE

The function get_ovector_size() is nowhere to be found.

I added the code and ran my little test again with the following result:

/home/jd/euphoria/include/std/regex.e:654 in function get_ovector_size() machine_proc/func(97,...) not supported

... called from ./reg_test.ex:17

I don't have the time to figure it out now. My regex.e file came with the Linux package download. It's a few weeks old.

Thanks for your help.

Regards, jd


8. Re: regex.e find_all -- another problem

It looks like you are using an older version of Euphoria. Beta 2?

Beta 3 should have this.

(The command "eui -version" will report what version of Euphoria you are using.)


9. Re: regex.e find_all -- another problem

Oh God, now I am confused. According to the website (as of now), the latest release is beta 2. I downloaded it again today and cracked open regex.e. The function is NOT there. Is there more than one official web site? The version prints out as 40000.

I'm going out for another fifth (or two)..

Regards, jd


10. Re: regex.e find_all -- another problem

You are right when you said the downloads page at http://oe.cowgar.com/downloads/index.wc should just have a link pointing to the downloads page at sourceforge.net or at http://sourceforge.net/projects/rapideuphoria/files/ instead of listing the individual links to the individual versions.

The downloads page should be fixed, but I do not have the ability to do that.


11. Re: regex.e find_all -- another problem

Thanks for all your help.

1. I still feel that consistency is important. If find() returns an atom (-1) on failure, so should find_all().

2. Regarding what find_all() returns when the same pattern occurs several times in a string, I think the documentation is correct and what the routine actually returns is wrong.

Both these concerns are probably petty; however, a little consistency goes a long way.
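For anyone who wants both behaviours without editing regex.e, a caller-side wrapper is one possible approach. The sketch below is only an illustration: whole_matches() is a hypothetical name, and it assumes the library's result is either an atom or a sequence of matches in which each match's first pair covers the whole match.

```euphoria
-- Hypothetical wrapper: not part of std/regex.e.
-- Input:  an atom, or a sequence of matches such as {{{15,17}},{{36,38}}}.
-- Output: -1 when nothing matched, otherwise a flat {{15,17},{36,38}}.
function whole_matches(object matches)
	sequence pairs = {}
	if atom(matches) then
		return -1
	end if
	if length(matches) = 0 then
		return -1	-- treat an empty result like find()'s failure value
	end if
	for i = 1 to length(matches) do
		pairs = append(pairs, matches[i][1])	-- keep only the whole-match pair
	end for
	return pairs
end function

? whole_matches({{{15, 17}}, {{36, 38}}})
? whole_matches({})
```

Note that this throws away any sub-pattern group pairs, so it only suits patterns where the groups are not needed.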

Regards, jd
