1. My first contributions, regex:find
- Posted by DamienRoseBlack Oct 31, 2009
I sent this to the developers mailing list on SourceForge, but I'm not sure I have that set up correctly: there seems to be a mismatch between my SourceForge email and my real one. So I'm posting it here to make sure someone sees it:
I've made my first contributions to the code, r2974 and r2975 (which should have been a single update; my mistake). This is the first contribution I've made to a real project, so I want to explain what I did so someone can double-check my work and make sure I haven't done anything wrong.
re:find was still returning 0 if you had too many named groups. We pass a maximum of 90 to get_ovector_size, which causes problems at 30 named groups. This is because pcre_exec() (in pcre_exec.c) returns 0, even on a successful match, when offsetcount isn't big enough to hold every captured offset. To stop find from returning 0, and to return the offsets we did have room for instead, I changed this code in exec_pcre() in be_pcre.c:
```c
if( rc <= 0 ) { free(ovector); return rc; }

// put the substrings into sequences
s = NewS1( rc );

for( i = 1, j=0; i <= rc; i++ ) {
    sub = NewS1( 2 );
    sub->base[1] = ovector[j++] + 1;
    sub->base[2] = ovector[j] > 0 ? ovector[j] : 0;
    j++;
    s->base[i] = MAKE_SEQ( sub );
}
```
to this code
```c
if( rc < 0 ) { free(ovector); return rc; }

// put the substrings into sequences
s = NewS1( ovector_elements );

for( i = 1, j=0; i <= ovector_elements; i++ ) {
    sub = NewS1( 2 );
    sub->base[1] = ovector[j++] + 1;
    sub->base[2] = ovector[j] > 0 ? ovector[j] : 0;
    j++;
    s->base[i] = MAKE_SEQ( sub );
}
```
Here ovector_elements is ovector_size/3. To ensure that get_ovector_size receives the right value, I changed regex.e:find so that its 5th parameter, size, is the maximum number of named groups the user wants data for. I then pass that value (times 3, as required) to get_ovector_size. Here is the new code:
```euphoria
public function find(regex re, sequence haystack, integer from=1, object options=DEFAULT, integer size=30)
    if sequence(options) then options = or_all(options) end if
    size = get_ovector_size(re, size*3)

    return machine_func(M_PCRE_EXEC, { re, haystack, options, from, size })
end function
```
Hopefully I've followed all the conventions and haven't broken anything. If I have done something wrong, please let me know so I don't make the same mistake in the future.
Damien
2. Re: My first contributions, regex:find
- Posted by DamienRoseBlack Oct 31, 2009
My tests show that the performance penalty of calling get_ovector_size on every call, as opposed to passing in a hard-coded value, is about 2.5-3%. That is something I'll have to think about.