Re: comment removal.


[snipped]

> > 
> > Originally I had to remove "//" comments from hundreds and thousands of
> > lines of source code, and that piece of code did the trick.
> > 
> > I thought about adding command line and prompt support and checking for
> > various other cases, but there was no need to do that at the time.
> > 
> > The code snippet did its job and then got filed away, probably never to
> > be used again by me.
> > 
> > Anyone is welcome to modify or complete the code for their needs. If you do,
> > I ask that you please share it with others.
> 
> Ok...
> After removing the quotes around INPUT and OUTPUT, since they cause a bad
> file number (-1) to be reported, I fed a file containing
> 
> <eucode>
> sequence s
> s="this line has -- inside it"
> </eucode>
> 
> to the code in the original post.
> 
> The output file had:
> 
> <eucode>
> sequence s
> s="this line has 
> </eucode>
> 
> which I expected from reading the code.
> 
> File 1 compiles correctly; file 2 bombs out. This is not expected when
> changing/adding/removing comments.
> 
> I remember having trouble writing an uncomment function for my enhanced
> version of Visual Euphoria (Joe aka spent_memory). I'll dig up the code
> and post it. It is definitely longer, but does the job. This is because
> not only do you have to take care of (multiple) strings inside any given
> line, but locating where they start and end is made trickier if escaped
> double quotes are there as well.
> 
> There are basically two options:
> 1/ locate and mask the strings which are not inside a comment, then use
>    match(COMMENT,line) and truncate the original line;
> 2/ work as scanner.e does, using a flag to detect whether it is inside a
>    string or not, plowing along the line and truncating it when COMMENT is
>    found outside a string.
> I never investigated which approach is faster.
> 
> Next exercise: also auto uncomment /*...*/ embedded C comments :)
> 
> CChris
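
Option 2 in the quoted list (a single pass that tracks whether it is inside a string) could look roughly like the sketch below. It is not the scanner.e code and not the Visual Euphoria routine; stripCommentScan is a name invented here for illustration. It assumes the line is a flat character string, that "--" is the only comment mark, and that backslash is the only escape inside strings. Character literals such as '"' are not handled.

```euphoria
-- Sketch only: one left-to-right pass with an "inside a string" flag.
-- stripCommentScan is a hypothetical name, not a library routine.
function stripCommentScan(sequence line)
    integer inString, i
    inString = 0
    i = 1
    while i <= length(line) do
        if inString then
            if line[i] = '\\' then
                i += 1              -- skip the escaped character, whatever it is
            elsif line[i] = '"' then
                inString = 0        -- closing quote
            end if
        elsif line[i] = '"' then
            inString = 1            -- opening quote
        elsif i < length(line) and line[i] = '-' and line[i+1] = '-' then
            return line[1..i-1]     -- comment mark outside any string
        end if
        i += 1
    end while
    return line                     -- no comment mark outside a string
end function

puts(1, stripCommentScan("s=\"this line has -- inside it\" -- real comment") & '\n')
```

Because a backslash inside a string always consumes the next character, this version needs no backward parity check; an unclosed string simply leaves the line unchanged.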

I added the parity backslash twist, so more tests may be needed:

function evenBackslashes(sequence s,integer q)
-- returns 1 if the number of contiguous backslash chars right before the
-- double quote is even
-- assert s[q]=34
    q-=1
    for i=q to 1 by -1 do
        if s[i]!='\\' then
            return not and_bits(xor_bits(q,i),1)
        end if
    end for
    return not and_bits(q,1)
end function

function findMatchingDQ(sequence s,integer q)
-- returns position in s of the double quote matching s[q-1]
    integer p

    while 1 do
        p=find('"',s[q..length(s)])
        if not p then return 0 end if  -- none found
        q+=p
        if evenBackslashes(s,q-1) then return q-1 end if
        s[q-1]=0  -- mask and try again
    end while
end function

global function removeComment(sequence s)
-- determines whether s has a comment mark, and strips the tail, starting at
-- the mark.
    integer p,q
    sequence sofar,s0

    sofar=""
    s0=s
    puts(1,s&'\n')  -- debug trace: echo the input line
    while 1 do
        p=match("--",s)
        if not p then return sofar & s end if
        while 1 do
            q=find('"',s)
            if not q or q>p then return sofar & s[1..p-1] end if
            if evenBackslashes(s,q) then  -- here starts a string
                q=findMatchingDQ(s,q+1)
                if not q then return s0 end if  -- unclosed string
                sofar &= s[1..q] -- accumulate valid part of s
                s=s[q+1..length(s)]
                if q>p then -- the comment mark is inside a string, start over
                            -- from end of string
                    exit
                else
                    p-=q     -- located mark may still be good
                end if
            else  -- this is an escaped double quote, keep looking past it
                sofar&=s[1..q]  -- accumulate valid part of s
                s=s[q+1..length(s)]
                p-=q         -- located mark may still be good
            end if
        end while
    end while
end function

-- stress test
-- assume a file contains the line as displayed:
-- puts(1,"this -- is \" inside a string"&"perhaps"&"\\\\\\"&'\n')
-- puts(1,removeComment("puts(1,\"this -- is \\\" inside a string\"&\"perhaps\"&\"\\\\\\\\\\\\\" ) -- not really"))
-- ?machine_func(26,0)
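
Since more tests may be needed, one way to cross-check the parity shortcut in evenBackslashes is to compare it against a version that counts the backslashes explicitly. A minimal sketch follows; quoteIsUnescaped is a hypothetical name, and it assumes s is a flat string with a double quote at s[q].

```euphoria
-- Hypothetical helper: explicit-count equivalent of the parity test.
-- Returns 1 when an even number of backslashes (possibly none) sits
-- immediately before position q, i.e. the quote at s[q] is not escaped.
function quoteIsUnescaped(sequence s, integer q)
    integer n
    n = 0
    for i = q-1 to 1 by -1 do
        if s[i] != '\\' then
            exit                -- run of backslashes ends here
        end if
        n += 1
    end for
    return remainder(n, 2) = 0
end function

? quoteIsUnescaped("ab\"", 3)       -- no backslash before the quote: prints 1
? quoteIsUnescaped("a\\\"", 3)      -- one backslash, quote is escaped: prints 0
? quoteIsUnescaped("\\\\\"", 3)     -- two backslashes, quote is real: prints 1
```

Running both functions over the same set of lines and comparing the results should expose any case where the xor_bits shortcut and the plain count disagree.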


Input and bug reports appreciated.

CChris
