Euphoria Ticket #72: options passed in a sequence should be or'ed together, regex:matches

Affects matches, all_matches, and maybe others. The last parameter, options, can be a sequence of options or a single atom. STRING_OFFSETS works in matches and all_matches, but not when it is combined with other options. There are no test cases for this.

 
-- get -1 with or_all. not using the right bit mask? 
re = regex:new(".*", 
              {MULTILINE, NEWLINE_ANYCRLF}) --,DOTALL 
test_equal("regex ML w/or all_matches",  {{" x\n       0__", 1,13 }} 
                  , regex:all_matches(re, ` 

_________ x 
                0__`, 1, or_all({regex:STRING_OFFSETS, DOTALL }) )) 

 
--the next 2 give type check error 
re = regex:new(".*", 
              {MULTILINE, NEWLINE_ANYCRLF}) --,DOTALL 
test_equal("regex ML all_matches",  {{" x\n       0__", 1,13 }} 
                  , regex:all_matches(re, ` 

_________ x 
                0__`, 1, {regex:STRING_OFFSETS, DOTALL } )) 

 
re = regex:new(".*", 
              {MULTILINE, NEWLINE_ANYCRLF}) --,DOTALL 
test_equal("regex ML matches",  {{" x\n       0__", 1,13 }} 
                 , regex:matches(re, ` 

_________ x 
                0__`, 1, {regex:STRING_OFFSETS, DOTALL } )) 

 
 

include\std\regex.e:392 in function matches()  
type_check failure, str_offsets is {201326592,0}  
    re = {46'.',42'*'} 
 

Details

Type: Bug Report
Severity: Major
Category: Library Routine
Assigned To: unknown
Status: Fixed
Reported Release: 2951
Fixed in SVN #: 3121
View VCS: 3121
Milestone:

1. Comment by ne1uno Oct 02, 2009

I tried a few simple things to get this to work, changing the first few lines of all_matches:

public function all_matches(regex re, sequence haystack, integer from=1, object options=DEFAULT) 
	if sequence(options) then options = or_all(options) end if 
	object match_data = find_all(re, haystack, from, and_bits(options, not_bits(STRING_OFFSETS))) 
	if length(match_data) = 0 then return ERROR_NOMATCH end if 
 
	integer str_offsets = and_bits(STRING_OFFSETS, options) 
... 

this fixes the type error and should have worked. maybe since the regex is already compiled it fails with new options?
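
A note on the type_check failure reported above: and_bits() is applied element by element when one of its arguments is a sequence, so testing the flag against an unmerged option sequence gives back a sequence such as {201326592,0}, which an integer variable rejects. The following is a minimal sketch of that behaviour and of the merge-then-mask approach, not the library's actual code. OFFSETS_FLAG and OTHER_FLAG are hypothetical stand-ins (the first simply reuses the value seen in the error message), and or_all() is assumed to come from std/math.e.

```euphoria
include std/math.e  -- assumed home of or_all()

-- Hypothetical stand-ins for illustration; not the real std/regex.e constants.
constant OFFSETS_FLAG = #0C000000  -- 201326592, the value seen in the error above
constant OTHER_FLAG   = #00000004

object options = {OFFSETS_FLAG, OTHER_FLAG}

-- and_bits() against a sequence is applied element by element and returns a
-- sequence, which is what an "integer str_offsets" declaration rejects.
object unmerged = and_bits(OFFSETS_FLAG, options)   -- {201326592, 0}

-- Merging first gives a single atom, so the flag test yields a plain integer.
if sequence(options) then
    options = or_all(options)
end if
integer str_offsets = and_bits(OFFSETS_FLAG, options)             -- 201326592
atom engine_opts    = and_bits(options, not_bits(OFFSETS_FLAG))   -- 4
```

With the options merged first, str_offsets can stay an integer flag and engine_opts carries only the remaining bits.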

2. Comment by jimcbrown Oct 17, 2009

As of 2950, we already did the or_all(options) in the appropriate places.

I committed a change to make str_offsets an object instead of an integer to avoid the type check issues.

However, your unittests still fail, stating that -1 was returned...

3. Comment by jimcbrown Oct 21, 2009

I notice that STRING_OFFSETS is explicitly removed from the list of options in the various or_all()s that we do. Perhaps the regex is failing because of this?

Is there another multi-option regex test that does not include STRING_OFFSETS that we can use to test this?

4. Comment by ne1uno Oct 21, 2009

str_offsets is used as a local boolean flag. changing to object may cause problems?

5. Comment by ne1uno Oct 22, 2009

removing and testing for STRING_OFFSETS looks ok, unless there were a problem with and_bits or or_bits. I had already tested that part just to make sure.

another test, I think does show adding options is not working.

--test a non STRING_OFFSETS function to see if can change options 
re = regex:new(`[AB]`) 
test_equal("split() #AB", { "", "sent and ", "sent" }, 
	regex:split(re, "Asent and Bsent")) 
 
test_equal("split() #ABiD", { "sent", "nd ", "sent" }, 
	regex:split(re, "Asent and Bsent",1, or_bits(CASELESS,DEFAULT) )) 
 
test_equal("split() #ABi", { "", "sent ", "nd ", "sent" }, 
	regex:split(re, "Asent and Bsent",1, CASELESS )) 
 

  pass: split() #AB 
 
  failed: split() #ABiD, expected: { 
  "sent", 
  "nd ", 
  "sent" 
} but got: { 
  "Asent and Bsent" 
} 
  failed: split() #ABi, expected: { 
  "", 
  "sent ", 
  "nd ", 
  "sent" 
} but got: { 
  "Asent and Bsent" 
} 
 
could be a bad test though, would not surprise me.

6. Comment by jimcbrown Oct 22, 2009

Looking at http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=pcre_exec gives a list of valid options to pcre_exec() (split() calls find_all(), which calls find(), which calls pcre_exec()). That list is smaller than the one given in http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=pcre_compile for pcre_compile(), which is called by new(). Notably, pcre_compile() supports CASELESS while pcre_exec() does not.

I think we need a test that uses valid options from pcre_exec()'s man page to tell if this works or not.
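
One way to act on this, sketched under the assumption that new() forwards its options to pcre_compile() as described above: give compile-only options such as CASELESS to new() (or write them inline as (?i)) instead of passing them in the per-call options parameter. The test below only checks that the two spellings agree with each other. It does not assert a particular output, and the test name is made up for illustration.

```euphoria
include std/regex.e
include std/unittest.e

-- CASELESS supplied when the pattern is compiled, not at match time.
object re_flag   = regex:new(`[AB]`, CASELESS)
-- The same request written inline in the pattern.
object re_inline = regex:new(`(?i)[AB]`)

test_equal("CASELESS on new() agrees with inline (?i)",
    regex:find_replace(re_inline, "A and B", "x"),
    regex:find_replace(re_flag,   "A and B", "x"))

test_report()
```

The variables are declared as object so that a failed compile, where new() hands back an atom, shows up as a failed test and not as a type check error on assignment.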

7. Comment by ne1uno Oct 23, 2009

good catch,

these pass, so valid options probably work ok.

 
re = regex:new(`(?i)[AB]`) 
test_equal("ignore case ?iAB", "x xnd x", 
	regex:find_replace(re, "A and B", "x")) 
 
re = regex:new(`(?i)[AB]`) 
test_equal("ignore case  ANCHORED ?iAB", "x and B", 
	regex:find_replace(re, "A and B", "x",1, ANCHORED)) 
 

I'll make more options tests.

8. Comment by mattlewis Oct 23, 2009

How can you tell if it's ignoring the case? It looks like your cases match. Shouldn't either the regex or the text have lower case somewhere?

9. Comment by ne1uno Oct 23, 2009

A and B

10. Comment by jimcbrown Oct 23, 2009

I see it. "xnd" -> it's turning "A and B" into "x xnd x", so the lowercase "a" in "and" is also being replaced.

Is this bug fixed or do we still need more tests?

11. Comment by ne1uno Oct 24, 2009

in light of the new information, a good fix may be to better document which options are valid for new() and which for the routines. STRING_OFFSETS should be set apart. and all the routines should or_all any sequence passed. adding tests for this.

the choice of what options are valid for new() vs. the routines seems arbitrary. the docs need updating on what options are compiled into PCRE, heap, which utf if any etc.

btw, as a student of try it even if it makes no sense, I did try a few permutations of ignore case and other options. when you pass an invalid option in the function call, the whole regex is invalidated so you get back the same string. with new() you will get back an atom instead of a regex with bad options.

re = regex:new(`(?i)[AB]`) 
test_equal("ignore case  MULTILINE ?iAB", "x And B ang", 
	regex:find_replace(re, "a And B ang", "x",1, MULTILINE)) 
 
  failed: ignore case  MULTILINE ?iAB, expected: "x And B ang" but got: "a And B ang" 

12. Comment by SDPringle Mar 22, 2010

This is fixed; please close this ticket.
