Re: comment removal.
- Posted by CChris <christian.cuvier at agriculture.gouv.fr> Mar 21, 2007
[snipped]
> >
> > Originally I had to remove "//" comments from hundreds and thousands of
> > lines of source code, and that piece of code did the trick.
> >
> > I thought about adding command line and prompt support and checking for
> > various other cases, but there was no need to do that at the time.
> >
> > The code snippet did its job and then got filed away, probably never to
> > be used again by me.
> >
> > Anyone is welcome to modify or complete the code for their needs. If you do,
> > I ask that you please share it with others.
>
> Ok...
> After removing the quotes around INPUT and OUTPUT, since they cause a bad
> file number (-1) to be reported, I fed a file containing
>
> }}} <eucode>
> sequence s
> s="this line has -- inside it"
> </eucode> {{{
> to the code in the original post.
>
> The output file had:
> }}} <eucode>
> sequence s
> s="this line has
> </eucode> {{{
> , which I expected from reading the code.
>
> File 1 compiles correctly; file 2 bombs out. This is not expected when
> changing/adding/removing comments.
>
> I remember having trouble writing an uncomment function for my enhanced
> version of Visual Euphoria (Joe aka spent_memory). I'll dig up the code
> and post it. It is definitely longer, but does the job. This is because
> not only do you have to take care of (multiple) string(s) inside any given
> line, but locating where they start/end is made trickier if escaped double
> quotes are there as well.
>
> There are basically two options:
> 1/ locate and mask the strings which are not inside a comment, then use
> match(COMMENT,line) and truncate the original line;
> 2/ work as scanner.e does, using a flag to detect whether it is inside a
> string or not, plowing along the line and truncating it when COMMENT is
> found outside a string.
> I never investigated which approach is faster.
>
> Next exercise: also auto uncomment /*...*/ embedded C comments
>
> CChris

I added the parity backslash twist, so more tests may be needed:
<eucode>
function evenBackslashes(sequence s,integer q)
-- returns 1 if the number of contiguous backslash chars right before the double quote is even
-- assert s[q]=34
    q-=1
    for i=q to 1 by -1 do
        if s[i]!='\\' then
            return not and_bits(xor_bits(q,i),1)
        end if
    end for
    return not and_bits(q,1)
end function

function findMatchingDQ(sequence s,integer q)
-- returns position in s of the double quote matching s[q-1]
    integer p
    while 1 do
        p=find('"',s[q..length(s)])
        if not p then return 0 end if -- none found
        q+=p
        if evenBackslashes(s,q-1) then return q-1 end if
        s[q-1]=0 -- mask and try again
    end while
end function

global function removeComment(sequence s)
-- determines whether s has a comment mark, and strips the tail, starting at the mark.
    integer p,q
    sequence sofar,s0
    sofar=""
    s0=s
    puts(1,s&'\n')
    while 1 do
        p=match("--",s)
        if not p then return sofar & s end if
        while 1 do
            q=find('"',s)
            if not q or q>p then return sofar & s[1..p-1] end if
            if evenBackslashes(s,q) then -- here starts a string
                q=findMatchingDQ(s,q+1)
                if not q then return s0 end if -- unclosed string
                sofar &= s[1..q] -- accumulate valid part of s
                s=s[q+1..length(s)]
                if q>p then -- the comment mark is inside a string, start over from end of string
                    exit
                else
                    p-=q -- located mark may still be good
                end if
            else -- this is an escaped double quote, keep looking past it
                sofar&=s[1..q] -- accumulate valid part of s
                s=s[q+1..length(s)]
                p-=q -- located mark may still be good
            end if
        end while
    end while
end function

-- stress test
-- assume a file contains the line as displayed:
-- puts(1,"this -- is \" inside a string"&"perhaps"&"\\\\\\"&'\n')
-- puts(1,removeComment("puts(1,\"this -- is \\\" inside a string\"&\"perhaps\"&\"\\\\\\\\\\\\\" ) -- not really"))
-- ?machine_func(26,0)
</eucode>
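For comparison, option 2 from the quoted post (a flag that tracks whether the scan is inside a string) could look roughly like the sketch below. It is only an illustration, not the Visual Euphoria code and not how scanner.e is actually written: stripCommentScan is a made-up name, and the sketch assumes that only double-quoted strings matter and that a backslash inside a string always escapes the next character. Character literals such as '"' are not handled, and a line with an unclosed string is returned unchanged.

<eucode>
-- Sketch of option 2 (hypothetical helper, not part of the code above):
-- walk the line once, keeping a flag that says whether we are inside
-- a double-quoted string, and cut at the first "--" found outside one.
function stripCommentScan(sequence line)
    integer inString, i, c
    inString = 0
    i = 1
    while i <= length(line) do
        c = line[i]
        if inString then
            if c = '\\' then
                i += 1              -- skip the escaped character, whatever it is
            elsif c = '"' then
                inString = 0        -- closing quote
            end if
        elsif c = '"' then
            inString = 1            -- opening quote
        elsif c = '-' and i < length(line) and line[i+1] = '-' then
            return line[1..i-1]     -- comment mark outside any string
        end if
        i += 1
    end while
    return line                     -- no comment mark outside a string
end function

-- usage: the "--" inside the string should survive, the trailing comment should not
puts(1, stripCommentScan("s=\"this line has -- inside it\" -- real comment") & '\n')
</eucode>

Skipping the character after each backslash replaces the parity count: inside a string, a quote can only be reached by the scan if it is not escaped. I have not timed this against the match()-based version, so the question of which is faster is still open.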
Input and bug reports appreciated.

CChris