OpenEuphoria: Forum: Reading comma delimited .csv data files

1. Reading comma delimited .csv data files

Posted by JAYBEEDEE <daviesjb at liv?ac.u?> Jan 18, 2008
643 views

I'm having difficulty getting Eu to read multiple-column data tables created
in Excel and saved as comma-delimited text files.

I'm confused by the get(), gets(), getc() and value() commands.

Using gets() I have no problem reading data in a single column, but if there
is more than one column, gets() reads each row (line) as a single element.

For example, the csv file might be of the form:

Month,,
6,,
8,,
1988,,


2,6,8
5,12,25.7
. 
etc

Note 2 blank lines after 1988,, and blank cells in the grid as ,,

I would like to write the data into a sequence emulating 3 horizontal elements
and n rows like

data{{}{}{}}  so that I can extract values using a statement like
cell_value=data[row][col]

So far I'm defeated!  Any suggestions?


2. Re: Reading comma delimited .csv data files

Posted by c.k.lester <euphoric at ckleste?.co?> Jan 18, 2008
634 views

JAYBEEDEE wrote:

> For example, the csv file might be of the form:
> 
> Month,,
> 6,,
> 8,,
> 1988,,
> 
> 
> 2,6,8
> 5,12,25.7
> . 
> etc

So should that data be like

{
   { "Month", "" }
  ,{ 6, 0 }
  ,{ 8, 0 }
  ,{ 1988, 0 }
}

Why is 1988 in the month column?!

Anyway, this is very easy to do. Basically:

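integer fn
object line          -- gets() returns a sequence, or the atom -1 at end of file
sequence cols, grid

grid = {}
fn = open("myExcelOutputFile.csv","r")
line = gets( fn )
while sequence(line) do
 cols = parse(line,",") --<-- not built in; search Euphoria archive for this functionality
 grid = append(grid,cols)
 line = gets( fn )
end while
close(fn)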


Now you can get values with grid[row][col].

parse() takes a line of text and separates it using the separator you provide.
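
Since parse() is not a built-in routine, here is a minimal sketch of what such a routine might look like. It is only an illustration, not the archive routine referred to above: the name and signature are assumed from the call parse(line,","), and the separator is assumed to be a non-empty string, which is why it uses match() rather than find().

```euphoria
-- Hypothetical parse(): split text wherever the separator string occurs.
-- Assumes sep is a non-empty sequence such as ",".
function parse(sequence text, sequence sep)
    sequence fields
    integer pos

    fields = {}
    pos = match(sep, text)
    while pos do
        -- everything before the separator is one field (possibly empty)
        fields = append(fields, text[1..pos-1])
        -- keep only what follows the separator and search again
        text = text[pos+length(sep)..length(text)]
        pos = match(sep, text)
    end while
    -- whatever is left after the last separator is the final field
    return append(fields, text)
end function

sequence cols
cols = parse("8,,", ",")   -- cols is now {"8","",""}
```

Note that a line returned by gets() still ends with a newline character, so it would end up in the last field unless it is removed first.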


3. Re: Reading comma delimited .csv data files

Posted by CChris <christian.cuvier at agric?lt?re.gouv.fr> Jan 18, 2008
648 views

Euphoria doesn't have standard routines to read formatted input, contrary to
most other languages. Use the strtok library by Kat (in the archive) to split a
text string using commas as delimiters.

gets(your_file) will return say "8,," (a full line as a text string), and
applying the right function in strtok to this will split it to {"8","",""} (a
sequence of substrings some of which may be empty).

Now, if any of these substrings is known to represent a number, you can call
value() to perform the conversion.
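
As a rough sketch of that conversion step, assuming Euphoria 3.x where value() and GET_SUCCESS come from get.e (the helper name to_cell is made up for this example):

```euphoria
include get.e   -- value() and GET_SUCCESS

-- Hypothetical helper: return the number a text cell represents,
-- or the original text if value() cannot read a number from it.
function to_cell(sequence text)
    sequence r
    r = value(text)            -- r is {status, value}
    if r[1] = GET_SUCCESS then
        return r[2]
    end if
    return text                -- empty cells and headings stay as text
end function

object a, b, c
a = to_cell("25.7")    -- a is now the number 25.7
b = to_cell("Month")   -- b is still the text "Month"
c = to_cell("")        -- c is still the empty text ""
```

This keeps non-numeric cells as strings instead of forcing them to 0, so the caller can tell a blank cell from a real zero.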

FYI, win32lib has routines (w32split(), w32TextToNumber() and others) to do the
job.

A simple splitting function could be coded like this:

function split(sequence s)
  integer pos,prev_pos
  sequence result

  pos=find(',',s)
  if not pos then 
    return {s} -- no splitting took place
  end if 
  result={}
  prev_pos=0
  while pos do
    -- another substring, from previous delimiter+1 to next delimiter-1;
    -- substrings may well be empty
    result=append(result,s[prev_pos+1..pos-1])
    -- find next delimiter
    prev_pos=pos
    pos=find_from(',',s,prev_pos+1)
  end while
  -- get tail substring and return the whole array
  return append(result,s[prev_pos+1..$])
end function
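
To show how a splitter like this fits into reading the whole file, here is a self-contained sketch under some assumptions: the routine names (chomp, split_commas, read_csv) are invented for the example, completely empty lines are skipped, and no attempt is made to handle quoted fields that contain commas. The splitter is repeated in a compact form only so that the block runs on its own.

```euphoria
-- Remove the newline (and any carriage return) that gets() leaves on a line.
function chomp(sequence line)
    while length(line) and find(line[length(line)], "\r\n") do
        line = line[1..length(line)-1]
    end while
    return line
end function

-- Split a line of text at every comma; fields may be empty.
function split_commas(sequence s)
    sequence result
    integer start
    result = {}
    start = 1
    for i = 1 to length(s) do
        if s[i] = ',' then
            result = append(result, s[start..i-1])
            start = i + 1
        end if
    end for
    return append(result, s[start..length(s)])
end function

-- Read a whole file into a sequence of rows, each row a sequence of text cells.
-- Returns -1 if the file cannot be opened.
function read_csv(sequence filename)
    integer fn
    object line
    sequence grid

    grid = {}
    fn = open(filename, "r")
    if fn = -1 then
        return -1
    end if
    line = gets(fn)
    while sequence(line) do        -- gets() returns the atom -1 at end of file
        line = chomp(line)
        if length(line) then       -- skip completely empty lines
            grid = append(grid, split_commas(line))
        end if
        line = gets(fn)
    end while
    close(fn)
    return grid
end function

object data
data = read_csv("myExcelOutputFile.csv")
if sequence(data) and length(data) then
    puts(1, data[1][1] & "\n")     -- first cell of the first row, as text
end if
```

Every cell is still text at this point, so data[row][col] gives a string such as "25.7"; value() can then be applied to the cells that are known to be numeric.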

CChris


4. Re: Reading comma delimited .csv data files

Posted by JAYBEEDEE <daviesjb at liv??c.uk> Jan 19, 2008
663 views

Thanks, Chris

I had tried using find and find_from to slice up the strings but always got a
zero result. I note that you put the comma in single quotes ',' whereas I used
double quotes ",".  Was this where I went wrong? There doesn't seem to be
anything in the Euphoria Manual about this distinction.

Your code looks as if it should do the job, but I haven't tried it yet.

Incidentally, is there an index or database of "include" files and the
procedures they contain? Searching the Archives comes up with a lot of chat,
and files with unhelpful titles such as "Routines I wish had been included with
Euphoria", but no indication as to what they contain.
Makes life hard for us newbies.


5. Re: Reading comma delimited .csv data files

Posted by ChrisBurch2 <crylex at freeuk.c?.?k> Jan 19, 2008
673 views

JAYBEEDEE wrote:
> Thanks, Chris
> 
> I had tried using find and find_from to slice up the strings but always got
> a zero result. 
> I note that you included the comma in single quotes ',' whereas I used double
> quotes ",".  Was this where I went wrong? There doesn't seem to be anything
> in the Euphoria Manual
> about this distinction.

Yes, I had a great deal of difficulty with that distinction too when I first
started with Eu. "," represents a string, or list of things, and ',' represents
one thing. So "," is a list containing only one comma, whereas ',' is just a
comma. To take it a small step further, ",," is valid, but ',,' is invalid.
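
A small sketch of why the double-quoted version gave a zero result with find(): find() looks for one element of the sequence, while match() looks for a run of elements. The variable name is just for illustration.

```euphoria
sequence line
line = "2,6,8"

? find(',', line)    -- prints 2: the character ',' is the 2nd element
? find(",", line)    -- prints 0: no single element equals the sequence ","
? match(",", line)   -- prints 2: the one-element string "," starts at position 2
```

So with a quoted string as the separator, match() is the routine to reach for; with a single character, find() works.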


> 
> Your code looks as if it should do the job, but I haven't tried it yet.
> 
> Incidentally - is there an index or database of "include" files and the
> procedures
> they contain?

What a fantastic idea!

> Searching the Archives comes up with a lot of chat, and files with unhelpful
> titles
> such as "Routines I wish had been included with Euphoria", but no indication
> as to what they contain.
> Makes life hard for us newbies.

Persevere - the rewards are great.

Chris


6. Re: Reading comma delimited .csv data files

Posted by CChris <christian.cuvier at ag?iculture.g?uv.fr> Jan 19, 2008
622 views

JAYBEEDEE wrote:
> Thanks, Chris
> 
> I had tried using find and find_from to slice up the strings but always got
> a zero result. 
> I note that you included the comma in single quotes ',' whereas I used double
> quotes ",".  Was this where I went wrong? There doesn't seem to be anything
> in the Euphoria Manual
> about this distinction.
> 

It obviously didn't help.
',' is a quoted character, which is a plain number, 44, just written in such a
way that you don't need to look up an ASCII table or write Chr$(44) or whatever
it is in Basic.
"," is a sequence, and is the same as {44}.
Since the line you retrieve from gets() is made of byte-sized integers, you
should represent the comma by a byte-sized integer, and accordingly write it
either ',' or 44.
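
The same point can be checked directly at the top level of a program; this is only a quick illustration:

```euphoria
? ','                   -- prints 44
? ","                   -- prints {44}
? equal(",", {44})      -- prints 1: the string is a one-element sequence
? (',' = 44)            -- prints 1: the quoted character is just the number
? find(44, "8,,")       -- prints 2: same result as find(',', "8,,")
```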

You'll find this topic covered in section 2.1.2, "Character strings and
individual characters", in the reference manual.

> Your code looks as if it should do the job, but I haven't tried it yet.
> 
> Incidentally - is there an index or database of "include" files and the
> procedures
> they contain?
> Searching the Archives comes up with a lot of chat, and files with unhelpful
> titles
> such as "Routines I wish had been included with Euphoria", but no indication
> as to what they contain.
> Makes life hard for us newbies.

This issue has been raised from time to time, but nothing has been done so far.
With like 2,000 entries currently in the Archive, this has to be a pretty big
community job - with the support of RDS -, and requires a very orderly follow-up
as updates or new files come in every other day. But that project would have its
uses indeed.

CChris


7. Re: Reading comma delimited .csv data files

Posted by Robert Craig <rds at RapidEu?hori?.com> Jan 19, 2008
676 views

CChris wrote:
> 
> JAYBEEDEE wrote:
> > Incidentally - is there an index or database of "include" files and the
> > procedures
> > they contain?
> > Searching the Archives comes up with a lot of chat, and files with unhelpful
> > titles
> > such as "Routines I wish had been included with Euphoria", but no indication
> > as to what they contain.
> > Makes life hard for us newbies.
> 
> This issue has been raised from time to time, but nothing has been done so
> far.

Actually I did something about it a couple of years ago,
but the information got buried in the "Extra Stuff from RDS" section.
unzip.txt gets automatically updated at the end of each month.
I've now copied the link to the main page 
in the Search area on the right side.
See the new link:
    "Search 1,700 contributed programs
    (files contained in .zip/.tar)"

    http://www.rapideuphoria.com/unzip.txt

This is just file names though, not routine names.
The .zips/.tgz's are in alphabetical order.
You can also see the file dates and sizes, so you can
try to find the latest version of an include file.

> With like 2,000 entries currently in the Archive, this has to be a pretty big
> community job - with the support of RDS -, and requires a very orderly
> follow-up
> as updates or new files come in every other day. But that project would have
> its uses indeed.

In addition to the normal *file description* search,
    http://www.rapideuphoria.com/archive.htm
there is also Aku's source file *content* search for the
whole Archive, (also on the main Euphoria page).
It seems to be about one year out of date:

http://www.kejut.com/prog/eusearch.php?aksi=cari&carian=parse&Submit=EuSearch&filterJenisE=1

Maybe Aku (are you out there?) can tell us how to maintain this.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


8. Re: Reading comma delimited .csv data files

Posted by Aku <akusaya at ?mx?net> Jan 20, 2008
630 views

Robert Craig wrote:
> In addition to the normal *file description* search,
>     http://www.rapideuphoria.com/archive.htm
> there is also Aku's source file *content* search for the
> whole Archive, (also on the main Euphoria page).
> It seems to be about one year out of date:
> 
> http://www.kejut.com/prog/eusearch.php?aksi=cari&carian=parse&Submit=EuSearch&filterJenisE=1
> 
> Maybe Aku (are you out there?) can tell us how to maintain this.

Hi!

Wow, time flies so fast, I didn't realize it has been more than one year 
since the last update. I thought it was just several months ago. 

So I have just updated it from the archive :)
I also added a new feature, which is grouping of duplicate files.
Therefore, the same file in different archives (contributions) will only be
shown once, but the file names in which the keywords appear will be shown
and can be opened. 

Actually what I did was:
1. mirror www.rapideuphoria.com using wget
2. extract all zip, tgz, tar, rar files
3. put all file contents into a MySQL database
4. put a fulltext index on the file contents
Is it possible for someone to maintain this?


9. Re: Reading comma delimited .csv data files

Posted by Robert Craig <rds at Rap?dEuphoria.co?> Jan 20, 2008
649 views

Aku wrote:
> Wow, time flies so fast, I didn't realize it has been more than one year 
> since the last update. I thought it was just several months ago. 
> 
> So I have just updated it from the archive :)

Thanks.

> I also added a new feature, which is grouping of duplicate files.
> Therefore, the same file in different archives (contributions) will only be
> shown once, but the file names in which the keywords appear will be shown
> and can be opened. 
> 
> Actually what I did was:
> 1. mirror www.rapideuphoria.com using wget
> 2. extract all zip, tgz, tar, rar files
> 3. put all file contents into a MySQL database
> 4. put a fulltext index on the file contents

Great.

> Is it possible for someone to maintain this?

I hope so, but if no one steps forward in the next month or so,
maybe I can develop yet another search facility for the site,
perhaps using a Euphoria database instead of SQL,
and maybe adapting the EUforum message search or the 
contributed programs search.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


10. Re: Reading comma delimited .csv data files

Posted by Kat <kat12 at co?sahs.n?t> Jan 21, 2008
671 views
Last edited Jan 22, 2008

Strtok was made for this task. It can retrieve, insert, find, match, and sort 
such records.

Kat
