1. Parsing

I should know how to do this, but I don't. Could someone show me an
example of parsing stuff read in from a file? For example's sake, take an
HTML file and read any <P>, <BR>, or <B> tags. Thanks.


~~>Joseph Martin
~~>Personal: joe at cyber-wizard.com
~~>Web:  jam.net at poboxes.com
~~>URL: http://users.exis.net/~jam/


2. Parsing

Joe wrote:

> I should know how to do this, but I don't. Could some one show me an
> example of parsing stuff read in from a file. For example sake an
> HTML file, and reading any <P>, <BR>, or <B> tags. Thanks.

Here's a program that will read an HTML file and build a sequence of tokens
in the form:

   { { tokenType, tokenValue } ... }

 -- CODE BEGINS HERE


 -- html.ex
 -- simple HTML parser

integer tag
sequence buffer, parse, text


 -- token types
constant    STRING  = 1,    -- string of text
            TAG     = 2     -- html tag


procedure if_err( object test, sequence errMessage )

    -- generic error handler
    -- if test is true, abort with message

    if test then
        puts( 1, errMessage & '\n' )
        abort( 1 )  -- non-zero exit status signals the failure
    end if

end procedure


function read_file( sequence fName )

    -- see help file under 'gets()'
    -- read file fName, return as sequence

    atom handle
    sequence buffer
    object line


    -- open file
        handle = open( fName, "r" )
        if_err( handle = -1, "Unable to open file " & fName & "." )

    -- clear buffer
        buffer = {}

    -- read until end of file
    while 1 do
        line = gets(handle)
        if atom(line) then
            exit   -- end of file
        else
            buffer = append(buffer, line)
        end if
    end while
    close( handle )

    return buffer

end function


 -- read the file
    buffer = read_file( "test.htm" )

 -- parse the file
    parse = ""
    for line = 1 to length( buffer ) do
        -- clear tag
        tag = 0

        -- clear accumulated text
        text = ""

        for char = 1 to length( buffer[line] ) do

            -- start of html tag
            if buffer[line][char] = '<' then

                -- save accumulated text
                if length( text ) > 0 then
                    parse = append( parse, { STRING, text } )
                end if

                -- inside of tag already?
                if_err( tag != 0, "Error - unexpected '<' in tag." )

                -- start of tag
                tag = 1
                text = ""

            -- end of html tag
            elsif buffer[line][char] = '>' then

                -- was a tag started?
                if_err( tag = 0, "Error - unexpected '>'." )

                -- write tag
                parse = append( parse, { TAG, text } )
                text = ""

                -- clear flag
                tag = 0

            -- end of line
            elsif buffer[line][char] = '\n' then

                -- was tag started?
                if_err( tag, "Error - unexpected end of line in tag." )

                -- text accumulated?
                if length( text ) > 0 then
                    parse = append( parse, { STRING, text } )
                end if

                -- clear text
                text = ""

            -- normal character
            else

                -- accumulate text
                text = text & buffer[line][char]

            end if
        end for
    end for


 -- show results of parse

    -- each token
    for i = 1 to length( parse ) do

        -- show results, based on type of token
        if parse[i][1] = STRING then
            printf( 1, "STRING: %s\n", {parse[i][2]} )

        elsif parse[i][1] = TAG then
            printf( 1, "TAG   : %s\n", {parse[i][2]} )

        else
            if_err( 1, "Unknown token." )

        end if
    end for

 -- END OF CODE

Here's a test file:

 -- TEST FILE BEGINS HERE
<B>This is bold</B>
<I>This is italic</I>
<P>This is a new paragraph.
<BR>This is a line break.
<B>This is bold <I>and italic</I></B>
 -- TEST FILE ENDS HERE
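
Once the token list exists, picking out the tags Joe asked about is a short loop. What follows is only a sketch, assuming the { tokenType, tokenValue } layout described above. The sample token list and the names WANTED and count_wanted are made up for illustration and are not part of html.ex.

```euphoria
-- token types, matching the parser above
constant STRING = 1,
         TAG    = 2

-- hypothetical sample: the tokens expected for the line "<B>bold</B><BR>"
constant tokens = { {TAG, "B"}, {STRING, "bold"}, {TAG, "/B"}, {TAG, "BR"} }

-- the tags the original question asked about (exact case, no attributes)
constant WANTED = { "P", "BR", "B" }

function count_wanted(sequence toks)
    -- count the TAG tokens whose text is one of WANTED
    integer n
    n = 0
    for i = 1 to length(toks) do
        if toks[i][1] = TAG then
            if find(toks[i][2], WANTED) then
                n = n + 1
            end if
        end if
    end for
    return n
end function

printf(1, "%d matching tags\n", {count_wanted(tokens)})
```

Closing tags such as "/B" and tags with attributes are not matched here; a real program would probably upper-case the tag text and cut it at the first space before comparing.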

Hope this helps.

 -- David Cuny


3. Re: Parsing

I lost the original message and a couple other messages too.

Someone asked about parsing.
They even mentioned <HTML>.

Here is a thrown-together HTML parser.
It strips out MOST <tags>,
replaces linefeeds with spaces, and
replaces <BR>, <HR>, </H????>, and </TITLE> with a linefeed.


----------Parses and displays file.htm---------
-----Few comments involved
include wildcard.e  -- used for changing some text to upper case.

sequence buffer
object line
integer handle
integer l, g
--l is lessthan
--g is greaterthan
integer lf
--lf is linefeed
lf = 10


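handle = open("file.htm", "r")
if handle = -1 then
  puts(1, "Unable to open file.htm\n")
  abort(1)
end if

buffer = {}
while 1 do
  line = gets(handle)
  if atom(line) then
    exit
  end if
  -- turn the trailing linefeed into a space (the last line may lack one)
  if line[length(line)] = lf then
    line[length(line)] = 32
  end if
  buffer = buffer & line
end while
close(handle)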

l = find('<', buffer)
while l do
  -- look for the closing '>' only after the '<'
  g = find('>', buffer[l..length(buffer)])
  if g = 0 then
    exit  -- unterminated tag: leave the rest as text
  end if
  g = g + l - 1
  buffer[l..g] = upper(buffer[l..g])
  if compare(buffer[l..g], "<BR>") = 0 then
    buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
  elsif compare(buffer[l..g], "<HR>") = 0 then
    buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
  elsif compare(buffer[l..g], "</TITLE>") = 0 then
    buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
  elsif g - l >= 2 and compare(buffer[l..l + 2], "</H") = 0 then
    buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
  else
    buffer = buffer[1..l - 1] & buffer[g + 1..length(buffer)]
  end if
  l = find('<', buffer)
end while
puts(1, buffer)
------------------End file------------

--Lucius Lamar Hilley III
--  E-mail at luciuslhilleyiii at juno.com
--  I support transferring of files less than 60K.
--  I can Decode both UU and Base64 format.


4. Re: Parsing

At 05:33 PM 4/7/97 PST, you wrote:

>Joe wrote:
>
>> I should know how to do this, but I don't. Could some one show me an
>> example of parsing stuff read in from a file. For example sake an
>> HTML file, and reading any <P>, <BR>, or <B> tags. Thanks.
>
>Here's a program that will read an HTML file, and return a sequence of token
>in the form:

<code>

Unfortunately, both pieces of code supplied forget that the "<" and ">"
symbols can be used without starting a tag, and that an end-of-line CAN
occur inside a tag. You'd have to write a program that checks for the
keywords as well as the symbols.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The Reaper  (J. Lays)   http://www.geocities.com/TimesSquare/Alley/4444/
reaper at auracom.com      Check out my Euphoria Games page at:
            -= http://www.geocities.com/TimesSquare/Alley/4444/eugames.html
      ........................
     . .. -||..........__......  "There is a silence before a storm,
      . /  ||......../-- \\.::::  A calm that is spent in fear;
   . ..|   ||...... /    | |.:::  But if that time was spent running,
     .|  _-||.......||   / /.:::: There may be nothing to be afraid of."
    ..| |..||...... -\_- \ |\-.:::
     .| |.[< \ .../            \.::
      .||.|||\|\ |  -      - .  \.::::
     ...|.\|| |  \  |        |   |.:::.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


5. Re: Parsing

On Tue, 8 Apr 1997 16:59:46 -0400 The Reaper <reaper at LOKI.ATCON.COM>
writes:

>At 05:33 PM 4/7/97 PST, you wrote:
>
>>Joe wrote:
>>
>>> I should know how to do this, but I don't. Could some one show me an
>>> example of parsing stuff read in from a file. For example sake an
>>> HTML file, and reading any <P>, <BR>, or <B> tags. Thanks.
>>
>>Here's a program that will read an HTML file, and return a sequence of
>>token in the form:
>
><code>
>
>Unfortunatly, with both of the code supplied, you are forgeting the fact
>that the "<" and ">" symbols can be used without starting a tag and that
>an end-of-line CAN be inside them. You'd have to make a program that
>checks for the key words as well as the symbols.
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>The Reaper  (J. Lays)

My code did take an <end-of-line> inside a tag into account,
and I don't plan to spend the time to create an HTML reader.
A browser will show < and > as plain text if the character following
the < is not a letter or an exclamation point (!).
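
The rule in the last sentence can be turned into a tag stripper that leaves a stray < alone. This is only a sketch and not the code from the earlier message. The names is_tag_start and strip_tags are invented here, and treating '/' as a valid tag opener (for closing tags) is an added assumption. The whole file is expected in one string, so a linefeed inside a tag is skipped like any other character.

```euphoria
function is_tag_start(sequence s, integer i)
    -- true if s[i] is '<' followed by a letter, '/' or '!'
    integer c
    if i >= length(s) then
        return 0
    end if
    if s[i] != '<' then
        return 0
    end if
    c = s[i + 1]
    return (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z')
        or c = '/' or c = '!'
end function

function strip_tags(sequence s)
    -- remove <...> runs that look like tags; keep every other character
    sequence out
    integer i, g
    out = ""
    i = 1
    while i <= length(s) do
        if is_tag_start(s, i) then
            g = find('>', s[i..length(s)])
            if g = 0 then
                -- no closing '>': keep the rest as text
                out = out & s[i..length(s)]
                exit
            end if
            i = i + g       -- continue just past the '>'
        else
            out = out & s[i]
            i = i + 1
        end if
    end while
    return out
end function

puts(1, strip_tags("if a <= b then <B>bold</B>\n"))
```

Building the result one character at a time is slow for big files; it keeps the sketch short, nothing more.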

The proper way to show < is with &lt; and > with &gt;, meaning
LessThan and GreaterThan. You are supposed to follow &lt with
the semicolon. To show &, you are supposed to use &amp;, which
means Ampersand.
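
Turning those three entities back into characters after the tags are stripped could look like the sketch below. The helper names replace_all and decode_entities are invented for this example, and only &lt; &gt; and &amp; are handled.

```euphoria
function replace_all(sequence s, sequence old_text, sequence new_text)
    -- replace every occurrence of old_text in s with new_text
    sequence out
    integer j
    out = ""
    j = match(old_text, s)
    while j do
        out = out & s[1..j - 1] & new_text
        s = s[j + length(old_text)..length(s)]
        j = match(old_text, s)
    end while
    return out & s
end function

function decode_entities(sequence s)
    s = replace_all(s, "&lt;", "<")
    s = replace_all(s, "&gt;", ">")
    -- &amp; goes last, so that "&amp;lt;" ends up as "&lt;" and not "<"
    s = replace_all(s, "&amp;", "&")
    return s
end function

puts(1, decode_entities("if a &lt; b &amp; c &gt; d\n"))
```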

<dummy> and <I have created a dummy >
<! This is a HTML comment>

This is a Euphoria list; it should not have turned into an HTML
lesson.

My parsing accomplice and I were trying to show how parsing
would be handled.  We had no intention of creating a full-blown
parser for HTML files.  (Well, at least I didn't :] )

But of course Reaper is right.  We should have mentioned that
we had not handled all the details of HTML.

My apologies for not making that clear.

--Lucius Lamar Hilley III
--  E-mail at luciuslhilleyiii at juno.com
--  I support transferring of files less than 60K.
--  I can Decode both UU and Base64 format.


6. Re: Parsing

re: parsing code does not handle all the exceptions.

Are you just complaining because I groused about your web page's color
scheme..? ;)

Lucius was correct. Like him, I was aware that the code did not handle the
exceptions that you mentioned. The intent was to provide an example of
parsing, not a complete HTML parser. I think the example was sufficiently
complicated without adding exception processing. I figured I could either
spend my time looking up HTML, or leave the rest as an exercise to the
reader.

However, to satisfy Mr. Reaper, I'll post a complete list of what my code
does *not* do. (I can hear you snicker already: Yeah, Cuny... It doesn't
*work*...)

 -- David Cuny


7. Re: Parsing

-- parser.ex
-- HTML parser

function parse(sequence line)
integer l, g
sequence output
    output = {}
    while 1 do
        l = find('<', line)
        g = find('>', line[l+1..length(line)]) + l
        if l = 0 or g = 0 then  -- no tags found
            output = output & line
            if compare(output, {10}) = 0 then
                output = {}
            end if
            return output
        end if
    output = output & line[1..l-1]
    line = line[g+1..length(line)]
    end while
end function    -- parse

procedure if_err(object test, sequence err_message)
-- generic error handler
-- if test is true, abort with message
    if test then
        puts(1, err_message & '\n')
        abort(1)    -- non-zero exit status signals the failure
    end if
end procedure   -- if_err

function read_file(sequence filename)
-- see help file under 'gets()'
-- read file filename, return as sequence
integer handle
object line
sequence buffer
    -- open file
    handle = open(filename, "r")
    if_err(handle = -1, "Can't open file " & filename)
    -- clear buffer
    buffer = {}
    -- read until end of file
    while 1 do
        line = gets(handle)
        if atom(line) then
            exit    -- end of file
        else
            buffer = append(buffer, line)
        end if
    end while
    close(handle)
    return buffer
end function    -- read_file

integer handle
sequence buffer, output

constant filename = "output.txt"

-- read the file
buffer = read_file("file.htm")

-- open a file to write
handle = open(filename, "w")
if_err(handle = -1, "Can't open file " & filename)

output = repeat("", length(buffer))
for line = 1 to length(buffer) do
    output[line] = parse(buffer[line])
    puts(1, output[line])
    puts(handle, output[line])
end for
close(handle)


8. Re: Parsing

I tried to make a small program to parse an HTML file and strip the tags
from it.
It is attached to this message.
I tried it on the .htm from the Official Euphoria Page. The only thing that
is still giving a problem is a line like: "if c >= 'a' and c <= 'z' then
...".
This comes out as "if c >= 'a' and c = 'z'". I can see why, but cannot
think of a simple solution.
I strip out extra newline characters, and write the result to a file.
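
One possible fix, sketched against the parse() routine in the parser.ex listing in this thread: when find() reports no '>' after the '<', the '<' is plain text and the rest of the line is kept as it is. In that listing the 0 returned by find() has l added to it straight away, so the "no closing bracket" case is never seen and the lone '<' is dropped. This is an untested suggestion, not the author's code.

```euphoria
function parse(sequence line)
    -- strip <...> tags from one line; a '<' with no '>' after it is kept
    integer l, g
    sequence output
    output = {}
    while 1 do
        l = find('<', line)
        g = 0
        if l then
            g = find('>', line[l + 1..length(line)])
        end if
        if l = 0 or g = 0 then  -- no complete tag left on this line
            output = output & line
            if compare(output, {10}) = 0 then
                output = {}
            end if
            return output
        end if
        g = g + l               -- position of '>' within line
        output = output & line[1..l - 1]
        line = line[g + 1..length(line)]
    end while
end function

puts(1, parse("if c >= 'a' and c <= 'z' then\n"))
```

A line such as "a < b and c > d" would still lose its middle part, because the text between the two symbols looks like a tag. Checking the character after the '<' as well would be needed to catch that case.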

C what U can do with it!

=======================================
Ad Rienks       AdRienks at compuserve.com


9. Parsing


Hi All

Could anyone point me in the direction of a www resource related to
parsing techniques (nothing too heavy - I'm not clever!)

Thanks

Mark


10. Re: Parsing


Mark,

You might look at this, http://www.cs.vu.nl/~dick/PTAPG.html
it appears to be a downloadable version of a book, "Parsing Techniques-A
Practical Guide".
I'm not sure if it's as simple as you want, and it is about 2meg Acrobat
Reader (pdf) download.
I haven't looked at it, & probably wouldn't understand it if I did, but
it might help you.

Dan Moyer

Mark wrote:

Hi All

Could anyone point me in the direction of a www resource related to
parsing techniques (nothing too heavy - I'm not clever!)

Thanks

Mark



11. Re: Parsing


Another on-line free book and software for C++, Pascal (TP 5.5 and above) and
Modula-2 is "Compilers and Compiler Generators an introduction with C++"
located at:
http://scifac.ru.ac.za/compilers/

There is a pdf of the book (about 1 meg zip'd) and source files for  C++,
Pascal (TP 5.5 and above) and Modula-2


At 11:02 PM 3/23/00 -0800, you wrote:
>
> Mark,
>
> You might look at this, http://www.cs.vu.nl/~dick/PTAPG.html
> it appears to be a downloadable version of a book, "Parsing Techniques-A
> Practical Guide".
> I'm not sure if it's as simple as you want, and it is about 2meg Acrobat
> Reader (pdf) download.
> I haven't looked at it, & probably wouldn't understand it if I did, but it
> might help you.
>
> Dan Moyer
>
> Mark wrote:
>
> Hi All
>
> Could anyone point me in the direction of a www resource related to parsing
> techniques (nothing too heavy - I'm not clever!)
>
> Thanks
>
> Mark
>>
>>
>



Joel

"When the code works perfectly, the program is obsolete."
  -- "The Gospel According to St. Murphy"

12. Re: Parsing

At 11:02 PM 3/23/00 -0800, you wrote:
>Mark,
>
>You might look at this, http://www.cs.vu.nl/~dick/PTAPG.html
>it appears to be a downloadable version of a book, "Parsing Techniques-A
Practical Guide".
>I'm not sure if it's as simple as you want, and it is about 2meg Acrobat
Reader (pdf) download.
>I haven't looked at it, & probably wouldn't understand it if I did, but it
might help you.
>
>Dan Moyer
>
>Mark wrote:
>
>Hi All
>
>Could anyone point me in the direction of a www resource related to
parsing techniques (nothing too heavy - I'm not clever!)
>
>Thanks
>
>Mark
>
>
>Attachment Converted: "C:\EUDORA\ATTACH\ReParsin.htm"

   Have you seen the file PARR1.txt, posted to this server some time ago?


13. Parsing

Hello there,

I'm looking for a simple way to read from a text file, replacing certain
strings with others.

For example:
Hello there, my name is [Name].

[Name] would be replaced with "Greg".

The emphasis is on accuracy and simplicity.  There *must* be a
simpler way than how I'm doing it.

Thanks,
Greg Phillips


14. Re: Parsing


Greg Phillips writes:
> I'm looking for a simple way to read from a text file,
> replacing certain strings with others.

I've attached a little utility that Junko wrote a long time ago.
It will go through a bunch of files, replacing certain strings
by other strings. It even makes a backup of the
original files to avoid disasters. It doesn't have any
user interface - you have to edit the strings and
filenames into it.

Regards,
     Rob Craig
     Rapid Deployment Software
     http://www.RapidEuphoria.com


[Attachment: Replace.ex]

15. Re: Parsing

On linux?
sed 's/nametoreplace/newname/g' oldfile > newfile
...sorry :) too tempting...
Riwal Raude
rauder at thmulti.com

> -----Original Message-----
> From: Greg Phillips [SMTP:i.shoot at REDNECKS.COM]
> Sent: Wednesday, November 24, 1999 5:15 AM
> To:   EUPHORIA at LISTSERV.MUOHIO.EDU
> Subject:      Parsing
>
> Hello there,
>
> I'm looking for a simple way to read from a text file, replacing certain
> strings with others.
>
> For example:
> Hello there, my name is [Name].
>
> [Name] would be replaced with "Greg".
>
> There is an emphsis an accuracy, and simplicity.  There *must* be a
> simpler way than how I'm doing it.
>
> Thanks,
> Greg Phillips


16. Re: Parsing

Greg Phillips wrote:

>I'm looking for a simple way to read from a text file, replacing
>certain strings with others.
>
>For example: Hello there, my name is [Name].

>[Name] would be replaced with "Greg".
>
>There is an emphsis an accuracy, and simplicity.  There *must* be a
>simpler way than how I'm doing it.


Greg,

The attached command line utility is not much different from Junko's
tool; sort of a subset, you might say. It has just one advantage that I
can think of: it allows you to search for and replace even strings
containing new lines, etc., because it handles input/output files as
binaries.

It may be faster, because it handles the whole file in a single buffer
as against so many lines, but on the other hand, it may be slower,
because bigger slices may take longer to manipulate. I just do not
know, I have not conducted any speed tests.

Enjoy. jiri


-- <snip> ------------------------------------------------------------

--  file    : replace.ex
--  author  : jiri babor
--  email   : jbabor at paradise.net.nz
--  project : search & replace
--  tool    : euphoria 2.1
--  date    : 99-11-25
--  version : 1.00

----------------------------------------------------------------------
--  Usage: ex  replace  old_text  new_text  file1  file2  file3 ... --
----------------------------------------------------------------------
--  Replace all occurrences of old_text string with new_text string --
--  in all specified files.                                         --
--  Strings containing spaces must be enclosed in quotation marks!  --
----------------------------------------------------------------------
--  **************  Play safe! Back up your files!  **************  --
----------------------------------------------------------------------

include file.e

sequence buffer, cl, files, new_text, old_text

procedure help(sequence error_message)
    puts(1,"Error :  " & error_message & "\n")
    puts(1,"Syntax:  ex  replace  old_text  new_text  file1  file2 ...\n")
    puts(1,"Note  :  enclose text containing spaces in quotation marks.\n")
    abort(1)
end procedure

procedure read_file(sequence filename)
    integer f, len, n

    f=open(filename,"rb")
    if f=-1 then
        help("Couldn't open " & filename & " !\n")
    end if
    n=seek(f, -1)               -- go to end of input file
    len=where(f)                -- get length of input file in bytes
    n=seek(f, 0)                -- go back to beginning of input file
    buffer = repeat(0, len)     -- init buffer
    for i=1 to len do
        buffer[i] = getc(f)     -- read file into buffer
    end for
    close(f)
end procedure -- read_file

procedure replace()             -- basically same as Junko's
    integer i, j, lo, ln

    lo = length(old_text)
    ln = length(new_text)

    i = 0
    j = match(old_text, buffer)
    while j do
        j += i
        buffer = buffer[1..j-1] & new_text & buffer[j+lo..length(buffer)]
        i = j+ln-1
        j = match(old_text, buffer[i+1..length(buffer)])
    end while
end procedure

procedure write_file(sequence filename)
    integer f

    f=open(filename,"wb")       -- open output file
    if f=-1 then
        help("Couldn't open " & filename & " for writing!")
    end if
    puts(f, buffer)             -- write out
    close(f)                    -- close output file
end procedure -- write_file

-- main ------------------------------------------------------------------------

cl=command_line()
if length(cl) < 5 then
    help("Insufficient number of arguments!")
end if

old_text = cl[3]
new_text = cl[4]
files = cl[5..length(cl)]

for i=1 to length(files) do
    read_file(files[i])
    replace()
    write_file(files[i])
end for

puts(1, "Done!\n")


17. Re: Parsing

Thanks to both Robert and Jiri!

They're exactly what I needed.  I had one of those programming stumps, where I
knew there was an easier way, but I just couldn't for the life of me figure it
out. The way I was doing it, I read in each character.  If the character = '[',
I then checked where the next ']' was, and then checked the data in between to
see if it was the same as any of the words I was looking for.  It worked, but
it was slow and painful, not to mention ugly.
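
For comparison, the bracket-scanning approach described above can be written fairly compactly with find() and slices instead of a character-by-character loop. This is only a sketch of that idea and not the code from the program under discussion. The names KEYS, VALUES and expand are made up, and the single entry in the table comes from the [Name] example earlier in the thread.

```euphoria
-- placeholder names and their replacements, in matching order
constant KEYS   = { "Name" },
         VALUES = { "Greg" }

function expand(sequence s)
    -- replace each [Key] found in KEYS by its value; leave the rest alone
    sequence out
    integer open_pos, close_pos, k
    out = ""
    while 1 do
        open_pos = find('[', s)
        if open_pos = 0 then
            return out & s
        end if
        close_pos = find(']', s[open_pos..length(s)])
        if close_pos = 0 then
            return out & s          -- '[' with no ']': nothing to replace
        end if
        close_pos = close_pos + open_pos - 1
        k = find(s[open_pos + 1..close_pos - 1], KEYS)
        if k then
            out = out & s[1..open_pos - 1] & VALUES[k]
        else
            out = out & s[1..close_pos]     -- unknown name: copy unchanged
        end if
        s = s[close_pos + 1..length(s)]
    end while
end function

puts(1, expand("Hello there, my name is [Name].\n"))
```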

Jiri, yours is slightly faster when working with files the size I'm using
(quite small, 3k maximum), and Junko's blows yours out of the water on larger
files (over 500k).  Relatively speaking, of course.  The actual time it takes
is almost negligible.

I ended up using chunks of Junko's code, but once the program I'm developing
reaches a certain point, I think I'll have to haul Jiri's solution out and take
a look at it.

Thanks again!

Greg


18. Re: Parsing

Greg Phillips wrote:

>>>
Jiri, your's is slightly faster, when working with files the size I'm
using (quite small, 3k maximum), and Junko's blows yours out of the
water on larger files (over 500k). Relatively speaking of course.  The
actual time it takes is almost negligible
<<<

Hi, Greg (and anybody else also interested),

I just had a second look at my search-and-replace routine. I
eliminated all sequence concatenations from it (which can be veeery
slow for larger strings), and now it appears to be significantly
faster (about 30%) than Junko's, even for bigger files (over 500
kbytes).
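
The new version itself is not shown in this thread, so the following is only a guess at what "no concatenations" could look like: find all the matches first, allocate the result once with repeat(), then fill it by slice assignment. The name replace_prealloc and every detail of it are assumptions, and no timings were taken.

```euphoria
function replace_prealloc(sequence buffer, sequence old_text, sequence new_text)
    sequence hits, out
    integer lo, ln, i, src, dst, k

    lo = length(old_text)
    ln = length(new_text)
    if lo = 0 then
        return buffer
    end if

    -- pass 1: record where each non-overlapping match starts
    hits = {}
    i = 1
    while i <= length(buffer) - lo + 1 do
        if buffer[i] = old_text[1]
        and compare(buffer[i..i + lo - 1], old_text) = 0 then
            hits = append(hits, i)
            i = i + lo
        else
            i = i + 1
        end if
    end while

    -- pass 2: allocate the result once, then copy the pieces into place
    out = repeat(0, length(buffer) + length(hits) * (ln - lo))
    src = 1
    dst = 1
    for h = 1 to length(hits) do
        k = hits[h] - src           -- unchanged text before this match
        if k then
            out[dst..dst + k - 1] = buffer[src..src + k - 1]
            dst = dst + k
        end if
        if ln then
            out[dst..dst + ln - 1] = new_text
            dst = dst + ln
        end if
        src = hits[h] + lo
    end for
    if src <= length(buffer) then
        out[dst..length(out)] = buffer[src..length(buffer)]
    end if
    return out
end function

puts(1, replace_prealloc("Hello, [Name]!\n", "[Name]", "Greg"))
```

The buffer is assumed to be a flat sequence of bytes, as read by read_file() in replace.ex.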

The new version can be fetched from my Euphoria page, text section.

jiri

homepages.paradise.net.nz/~jbabor/euphoria.html


19. Parsing

When you get data from the command_line, Euphoria parses
it nicely, disposing of leading and trailing spaces, etc.
Why can we not call parse() ourselves, passing it a string,
and getting a sequence of words back?

Irv


20. Re: Parsing

Irv Mullins writes:
> When you get data from the command_line, Euphoria parses
> it nicely, disposing of leading and trailing spaces, etc.
> Why can we not call parse() ourselves, passing it a string,
> and getting a sequence of words back?

Here's a routine lifted from euphoria\bin\search.ex

function blank_delim(sequence s)
-- break up a blank-delimited, \n-terminated string,
-- to return a sequence of words
    sequence list, segment
    integer i

    list = {}
    i = 1
    while i < length(s) do
        while find(s[i], " \t") do
            i = i + 1
        end while
        if s[i] = '\n' then
            exit
        end if
        segment = ""
        while not find(s[i], " \t\n") do
            segment = segment & s[i]
            i = i + 1
        end while
        list = append(list, segment)
    end while
    return list
end function
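
blank_delim() expects its argument to end in '\n', as a line from gets() does. For strings that may not, a variation along these lines should work. It is a sketch written for this reply and not something taken from search.ex, and the name split_words is made up.

```euphoria
function split_words(sequence s)
    -- break a string into words at blanks, tabs and newlines;
    -- no trailing '\n' is required
    sequence list, word
    list = {}
    word = ""
    for i = 1 to length(s) do
        if find(s[i], " \t\n") then
            if length(word) then
                list = append(list, word)
                word = ""
            end if
        else
            word = word & s[i]
        end if
    end for
    if length(word) then
        list = append(list, word)
    end if
    return list
end function

sequence words
words = split_words("  ex  replace old_text new_text ")
for i = 1 to length(words) do
    puts(1, words[i] & '\n')
end for
```

Unlike command_line(), neither routine treats text in quotation marks as a single word.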

Regards,
     Rob Craig
     Rapid Deployment Software
     http://members.aol.com/FilesEu/
