1. My first contributions, regex:find

I sent this to the developers' mailing list on SourceForge, but I'm not sure I have that set up right. There seems to be some mismatch between my SourceForge email and my real one, so I'll post it here to make sure someone sees it:

I've made my first contributions to the code, r2974 and r2975. They should have been one update; my mistake. This is the first contribution I've made to a real project, so I wanted to explain what I did so that someone can double-check my work and make sure I haven't done anything wrong.

re:find was still returning 0 if you had too many named groups. We pass a maximum of 90 to get_ovector_size, which translates to problems at 30 named groups. This is because pcre_exec (in pcre_exec.h) returns 0 on a successful match when offsetcount isn't big enough to hold all the captured offsets. To stop returning 0 and instead return the data for the offsets we did have room for, I changed this code in be_pcre:exec_pcre():

 
    if( rc <= 0 ) { free(ovector); return rc; }

    // put the substrings into sequences
    s = NewS1( rc );

    for( i = 1, j=0; i <= rc; i++ ) {
        sub = NewS1( 2 );
        sub->base[1] = ovector[j++] + 1;
        sub->base[2] = ovector[j] > 0 ? ovector[j] : 0;
        j++;
        s->base[i] = MAKE_SEQ( sub );
    }
 

to this code:

 
    if( rc < 0 ) { free(ovector); return rc; }

    // put the substrings into sequences
    s = NewS1( ovector_elements );

    for( i = 1, j=0; i <= ovector_elements; i++ ) {
        sub = NewS1( 2 );
        sub->base[1] = ovector[j++] + 1;
        sub->base[2] = ovector[j] > 0 ? ovector[j] : 0;
        j++;
        s->base[i] = MAKE_SEQ( sub );
    }
 

Here ovector_elements is ovector_size/3. To ensure that ovector_size ends up with the right value, I changed regex.e:find so that its 5th parameter is the maximum number of named groups that the user wishes to see data for. I then pass that value (times 3, as needed) to get_ovector_size. Here is the new code:

 
    public function find(regex re, sequence haystack, integer from=1, object options=DEFAULT, integer size=30)
        if sequence(options) then options = or_all(options) end if
        size = get_ovector_size(re, size*3)

        return machine_func(M_PCRE_EXEC, { re, haystack, options, from, size })
    end function
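
To spell out the arithmetic behind the times 3 and the ovector_size/3, here is a small standalone C sketch. The helper names ovector_size_for_pairs and pairs_filled are made up for illustration and are not in be_pcre. It assumes the documented pcre_exec behaviour: the first two thirds of the ovector hold the (start, end) offset pairs, the last third is workspace, and a return of 0 means the vector was filled as far as it could be.

```c
#include <assert.h>

/* Hypothetical helper (not part of be_pcre): ovector length needed to
 * hold a given number of (start, end) offset pairs. Each pair takes two
 * ints, and pcre_exec keeps a further third of the vector as workspace,
 * hence the factor of 3. */
int ovector_size_for_pairs(int pairs)
{
    assert(pairs >= 0);
    return pairs * 3;
}

/* Hypothetical helper (not part of be_pcre): number of offset pairs that
 * hold usable data after pcre_exec returns rc. */
int pairs_filled(int rc, int ovector_size)
{
    if (rc < 0) {
        return 0;                 /* no match, or an error */
    }
    if (rc == 0) {
        return ovector_size / 3;  /* vector too small: filled as far as possible */
    }
    return rc;                    /* highest-numbered matched group, plus one */
}
```

With these, ovector_size_for_pairs(30) gives 90 and pairs_filled(0, 90) gives 30. Pair 0 is the whole match, so 30 pairs leave room for 29 groups, which would explain why the trouble started at 30 named groups.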
 

Hopefully I've followed all the conventions and haven't messed anything up. If I have done something wrong, please let me know so I don't make the same mistake in the future.

Damien


2. Re: My first contributions, regex:find

My tests show that the performance penalty of calling get_ovector_size every time, as opposed to being able to send in a hard-coded value, is about 2.5%-3%. That is something I'll have to think on.
