1. Problems with GET.E
Is it me, or is the get() function in GET.E faulty? I seem
to recall discussion on the list regarding it a while back,
but I can't recall...
Basically, all I do is print() to a file, close it, reopen
it, then get() from the file. Correct me if I'm wrong, but
aren't several get() calls in a row supposed to return all
of the objects that were printed to the file? (And I'm
using the latest version of GET.E released with Eu 2.1)
If it's not, then might I humbly (but fervently) suggest
that it be made so? From what I can see, one way to do it
would be to have print() put a space after every object
it outputs.
If it IS supposed to work that way, then, um, someone will
have to tell me and I'll just post my code to the list for
examination...
Rod Jackson
2. Re: Problems with GET.E
Roderick Jackson writes:
> Correct me if I'm wrong, but aren't several get() calls
> in a row supposed to return all of the objects that were
> printed to the file? ...
get() requires that there be at least one character of
whitespace (blank, tab or new-line) separating the
top-level objects in the file.
You can easily observe that this will likely be
necessary between two atoms,
e.g.
111 222
but it shouldn't be necessary between
two sequences, or an atom and a sequence e.g.:
145.999{1,2,3}
or
{1,2,3}{4,5,6}
because the braces should be enough.
So why does get() require whitespace in *all* cases?
It was much easier to implement that way (it's a long story),
and I didn't think anyone would really mind.
> From what I can see, one way to do it
> would be to have print() put a space after every object
> it outputs.
What about the users of print() who don't want the
extra space? Is it that hard to add:
puts(fn, ' ')
to your code?
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
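To illustrate the separator requirement described above, here is a minimal sketch of writing two top-level objects with print() and reading them back with two get() calls. The file name "demo.dat" and the variable names are made up for this example; only print(), puts(), get() and the GET_SUCCESS constant come from the thread.

```euphoria
-- Sketch only: "demo.dat", first and second are hypothetical names.
include get.e

integer fn
sequence first, second

fn = open("demo.dat", "w")
print(fn, 145.999)
puts(fn, ' ')           -- whitespace separating the top-level objects
print(fn, {1,2,3})
puts(fn, ' ')           -- trailing blank, so every object is followed by one
close(fn)

fn = open("demo.dat", "r")
first = get(fn)         -- each result is {error_status, value}
second = get(fn)
close(fn)

if first[1] = GET_SUCCESS and second[1] = GET_SUCCESS then
    print(1, first[2])
    puts(1, '\n')
    print(1, second[2])
    puts(1, '\n')
end if
```

Leaving out the puts() calls gives the situation Rod describes, where the second get() does not return the second object.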
3. Re: Problems with GET.E
Robert Craig wrote (in response to Rod Jackson):
>>Correct me if I'm wrong, but aren't several get() calls
>>in a row supposed to return all of the objects that were
>>printed to the file? ...
>
>get() requires that there be at least one character of
>whitespace (blank, tab or new-line) separating the
>top-level objects in the file.
>
>[...]
>
>>From what I can see, one way to do it
>>would be to have print() put a space after every object
>>it outputs.
>
>What about the users of print() who don't want the
>extra space? Is it that hard to add:
> puts(fn, ' ')
>to your code?
I don't think it's so much a case of how easy or hard it is to add "puts(fn,
' ')" to a given section of code. For me, at least, it's a case of paired
functions having a not-quite symmetrical relationship. You can call get() as
many times in a row as you like, but for print() you have to throw in a
puts() for a space. The space between top-level objects may be necessary,
and certainly makes sense when you think about it, but it *is* somewhat
inconsistent with Euphoria's otherwise sensible and easy-to-understand
language design, IMO.
This is not limited to the get()/print() pair of functions, either -- Jason
Gade recently pointed out (11/12/1999; Subject: Euphoria features) that
position()/get_position() have a similarly non-symmetrical relationship.
Another (possibly imperfect) example which springs to mind would be the
video_config()/text_rows()/graphics_mode() family of routines.
In any event, for the immediate future, this is probably the best solution
for the problem in question:
global procedure put(integer fn, object data)
    print(fn, data)
    puts(fn, ' ') -- or '\n', either one will work
end procedure
This way, we have the symmetrically-paired routines get() and put(). This
enables those who wish to perform multiple get() calls against a file to
have a self-contained procedure to write their data out to that file,
without forcing the users of print() to use up an extra byte.
Hep yadda,
Gabriel Boehme
----------
There are few things as convincing as death to remind us of the quality with
which we live our lives.
Robert Fripp
----------
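As a rough illustration of how the put() wrapper above pairs with get(), the sketch below writes a few objects and then reads the whole file back in a loop. The helper get_all() and the file name "objects.dat" are hypothetical and exist only for this example; put() is repeated so the block runs on its own.

```euphoria
-- Sketch only: get_all() and "objects.dat" are hypothetical.
include get.e

global procedure put(integer fn, object data)
    print(fn, data)
    puts(fn, ' ')       -- the separator that get() expects
end procedure

function get_all(sequence name)
-- return every object stored in the named file, in order
    integer fn
    sequence r, items
    items = {}
    fn = open(name, "r")
    if fn = -1 then
        return items    -- file could not be opened
    end if
    while 1 do
        r = get(fn)
        if r[1] != GET_SUCCESS then
            exit        -- GET_EOF at end of file, GET_FAIL on bad input
        end if
        items = append(items, r[2])
    end while
    close(fn)
    return items
end function

integer out
sequence items

out = open("objects.dat", "w")
put(out, 111)
put(out, 222)
put(out, {1, {2, 3}})
close(out)

items = get_all("objects.dat")
print(1, items)
puts(1, '\n')
```

Keeping the separator inside put() means callers never have to remember it, while print() itself stays unchanged.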
4. Re: Problems with GET.E
Robert Craig wrote:
>> Correct me if I'm wrong, but aren't several get() calls
>> in a row supposed to return all of the objects that were
>> printed to the file? ...
>
>get() requires that there be at least one character of
>whitespace (blank, tab or new-line) separating the
>top-level objects in the file.
>
>You can easily observe that this will likely be
>necessary between two atoms,
>e.g.
> 111 222
>
>but it shouldn't be necessary between
>two sequences, or an atom and a sequence e.g.:
>
> 145.999{1,2,3}
>or
> {1,2,3}{4,5,6}
>
>because the braces should be enough.
>So why does get() require whitespace in *all* cases?
>
>It was much easier to implement that way (it's a long story),
>and I didn't think anyone would really mind.
>
>> From what I can see, one way to do it
>> would be to have print() put a space after every object
>> it outputs.
>
>What about the users of print() who don't want the
>extra space? Is it that hard to add:
> puts(fn, ' ')
>to your code?
No... if you really *meant* it the way it is, I'm not
asking for a change; I can see some difficulty with
affixing a space or newline to the end of everything
print()ed. I just thought that your original intent
was to have people be able to use:
print (fn, obj1)
print (fn, obj2)
...
ext_obj1 = get (fn)
ext_obj2 = get (fn)
as naturally as they would puts() and getc(). As things
currently stand, it means I may have to leave out parallel
functions for print() and get() from my next project--
you'll see why soon. But, if that's the way they were
meant to operate anyway, ah well...
... meanwhile, Gabriel Boehme wrote:
>In any event, for the immediate future, this is probably the best solution
>for the problem in question:
>
>global procedure put(integer fn, object data)
> print(fn, data)
> puts(fn, ' ') -- or '\n', either one will work
>end procedure
I'll be certain to do something like this whenever I start
working with large datafiles. For now though, the situation
isn't causing any real problems--I just thought I'd clear it
up before it did.
(Side note: I know I'm going to be called on this, but
having get() and print() operate the way they currently do
DOES seem... ill-paired. There's really no need to change
get()--when writing two integers to a file, you're going
to HAVE to put something between them anyway. So maybe we
could try to find out how many folks use print() in a
manner that would cause problems if print() were changed
to attach a newline? I suspect it's not that many,
if any at all...)
Rod Jackson
5. Re: Problems with GET.E
- Posted by Everett Williams <rett at GVTC.COM>, Nov 15, 1999 (last edited Nov 16, 1999)
Roderick Jackson wrote:
>(Side note: I know I'm going to be called on this, but
>having get() and print() operate the way they currently do
>DOES seem... ill-paired. There's really no need to change
>get()--when writing two integers to a file, you're going
>to HAVE to put something between them anyway. So maybe we
>could try to find out how many folks use print() in a
>manner that would cause problems if print() were changed
>to attach a newline? I suspect it's not that many,
>if any at all...)
>
>Rod Jackson
I like symmetry, too, but I'd like it without the newline attached to
every print(). The ability to write more than once to a line has
so many uses in report writing that it would be hard to
enumerate them. Especially with the lack of string facilities in
this language, it can simplify code quite a bit. A carriage return
without a line feed would be less problematic, but still
inconvenient.
Everett L.(Rett) Williams
rett at gvtc.com
6. Re: Problems with GET.E
- Posted by Roderick Jackson <rjackson at CSIWEB.COM>, Nov 15, 1999 (last edited Nov 16, 1999)
Everett Williams wrote:
>Roderick Jackson wrote:
>
>>(Side note: I know I'm going to be called on this, but
>>having get() and print() operate the way they currently do
>>DOES seem... ill-paired. There's really no need to change
>>get()--when writing two integers to a file, you're going
>>to HAVE to put something between them anyway. So maybe we
>>could try to find out how many folks use print() in a
>>manner that would cause problems if print() were changed
>>to attach a newline? I suspect it's not that many,
>>if any at all...)
>>
>>Rod Jackson
>
>I like symmetry, too, but I'd like it without the newline attached to
>every print(). The ability to write more than once to a line has
>so many uses in report writing that it would be hard to
>enumerate them.
I agree, but how many people need to use print() to
keep stuff on the same line? print(), unlike printf()
and puts(), is used for generic objects, not strings
or formatted output:
print (fn, "ABCD") --> "{65,66,67,68}" is the literal
                   --> string stored in your file,
                   --> not 4 characters.
If someone uses it extensively in their datafiles, I can
see them having a problem with the extra bytes eating up
memory; but I'd really like to know if anyone is doing
this (or is using the routine in a way that the format
would be messed up by the extra byte.) I suspect, because
of its incompatibility with get(), that practically no
one ever uses print().
Of course, I could be wrong... I'm just curious.
Rod Jackson
7. Re: Problems with GET.E
- Posted by "Lucius L. Hilley III" <lhilley at CDC.NET>, Nov 15, 1999 (last edited Nov 16, 1999)
I guess the easy fix is to override the built-in print command.
I.e.:
procedure old_print(integer fn, object x)
    print(fn, x) -- still the built-in print() at this point
end procedure
without warning
procedure print(integer fn, object x)
    old_print(fn, x) -- calling print() here would recurse forever
    puts(fn, 10)
    -- you can replace 10 with '\n' or {10} or "\n"
end procedure
with warning
Lucius L. Hilley III
lhilley at cdc.net
+----------+--------------+--------------+
| Hollow | ICQ: 9638898 | AIM: LLHIII |
| Horse +--------------+--------------+
| Software | http://www.cdc.net/~lhilley |
+----------+-----------------------------+
8. Re: Problems with GET.E
Gals & Guys,
Since I felt at least partially responsible for the get.e fiasco, I
thought I should try to fix it. Attached is my first go at it, a
little kludgy, but it seems to work. I have done only a couple of
quick tests, so be careful. I altered only the routines for get() and
value() functions, I left the rest of the clutter untouched.
Test results, comments & other brutalities will be very much
appreciated.
If it rains tonight (no tennis!), I'll try to improve it. Then you will
find a copy on my Euphoria page. jiri
-- snip --------------------------------------------------------------
-- file : get.e
-- author : jiri babor
-- email : jbabor at paradise.net.nz
-- project : get.e replacement
-- tool : euphoria 2.1
-- date : 99-11-17
------------------------------------
-- Input and Conversion Routines: --
-- get() --
-- value() --
-- wait_key() --
------------------------------------
-- error status values returned from get() and value():
global constant
    GET_SUCCESS = 0,
    GET_EOF = -1,
    GET_FAIL = 1
constant M_WAIT_KEY = 26
constant
    TRUE = 1,
    FALSE = 0,
    DIGITS = "0123456789",
    HEX_DIGITS = DIGITS & "ABCDEF",
    START_NUMERIC = DIGITS & "-+.#"
type natural(integer x)
    return x >= 0
end type
type char(integer x)
    return x >= -1 and x <= 255
end type
object input_string -- string to be read from
integer error_flag -- error flag
natural input_file -- file to be read from
natural string_next
char ch -- the current character
global function wait_key()
-- Get the next key pressed by the user.
-- Wait until a key is pressed.
    return machine_func(M_WAIT_KEY, 0)
end function
procedure get_ch()
-- set ch to the next character in the input stream (either string or file)
    if sequence(input_string) then
        if string_next <= length(input_string) then
            ch = input_string[string_next]
            string_next += 1
        else
            ch = GET_EOF
        end if
    else
        ch = getc(input_file)
    end if
end procedure
procedure skip_blanks()
-- skip white space
-- ch is "live" at entry and exit
    while find(ch, " \t\n") do
        get_ch()
    end while
end procedure
constant
    ESCAPE_CHARS = "nt'\"\\r",
    ESCAPED_CHARS = "\n\t'\"\\\r"
function escape_char(char c)
-- return escape character
    natural i
    i = find(c, ESCAPE_CHARS)
    if i = 0 then
        return GET_FAIL
    else
        return ESCAPED_CHARS[i]
    end if
end function
function get_qchar()
-- get a single-quoted character
-- ch is "live" at exit
    char c
    get_ch()
    c = ch
    if ch = '\\' then
        get_ch()
        c = escape_char(ch)
        if c = GET_FAIL then
            error_flag = GET_FAIL
            return 0
        end if
    elsif ch = '\'' then
        error_flag = GET_FAIL
        return 0
    end if
    get_ch()
    if ch != '\'' then
        error_flag = GET_FAIL
        return 0
    else
        get_ch()
        return c
    end if
end function -- get_qchar
function get_string()
-- get a double-quoted character string
-- ch is "live" at exit
    sequence text
    text = ""
    while TRUE do
        get_ch()
        if ch = GET_EOF or ch = '\n' then
            error_flag = GET_FAIL
            return 0
        elsif ch = '"' then
            get_ch()
            return text
        elsif ch = '\\' then
            get_ch()
            ch = escape_char(ch)
            if ch = GET_FAIL then
                error_flag = GET_FAIL
                return 0
            end if
        end if
        text &= ch
    end while
end function
type plus_or_minus(integer x)
    return x = -1 or x = +1
end type
function get_number()
-- read a number
-- ch is "live" at entry and exit
    plus_or_minus sign, e_sign
    natural ndigits
    integer hex_digit
    atom mantissa, dec, e_mag
    sign = +1
    mantissa = 0
    ndigits = 0
    -- process sign
    if ch = '-' then
        sign = -1
        get_ch()
    elsif ch = '+' then
        get_ch()
    end if
    -- get mantissa
    if ch = '#' then -- process hex integer and return
        get_ch()
        while TRUE do
            hex_digit = find(ch, HEX_DIGITS)-1
            if hex_digit >= 0 then
                ndigits += 1
                mantissa = mantissa * 16 + hex_digit
                get_ch()
            else
                if ndigits > 0 then
                    return sign * mantissa
                else
                    error_flag = GET_FAIL
                    return 0
                end if
            end if
        end while
    end if
    -- decimal integer or floating point
    while ch >= '0' and ch <= '9' do
        ndigits += 1
        mantissa = mantissa * 10 + (ch - '0')
        get_ch()
    end while
    if ch = '.' then -- get fraction
        get_ch()
        dec = 10
        while ch >= '0' and ch <= '9' do
            ndigits += 1
            mantissa += (ch - '0') / dec
            dec *= 10
            get_ch()
        end while
    end if
    if ndigits = 0 then
        error_flag = GET_FAIL
        return 0
    end if
    mantissa = sign * mantissa
    if ch = 'e' or ch = 'E' then -- get exponent sign
        e_sign = +1
        e_mag = 0
        get_ch()
        if ch = '-' then
            e_sign = -1
            get_ch()
        elsif ch = '+' then
            get_ch()
        end if
        -- get exponent magnitude
        if ch >= '0' and ch <= '9' then
            e_mag = ch - '0'
            get_ch()
            while ch >= '0' and ch <= '9' do
                e_mag = e_mag * 10 + ch - '0'
                get_ch()
            end while
        else -- no exponent
            error_flag = GET_FAIL
            return 0
        end if
        e_mag *= e_sign
        if e_mag > 308 then
            -- rare case: avoid power() overflow
            mantissa *= power(10, 308)
            if e_mag > 1000 then
                e_mag = 1000
            end if
            for i = 1 to e_mag - 308 do
                mantissa *= 10
            end for
        else
            mantissa *= power(10, e_mag)
        end if
    end if
    return mantissa
end function
function get_sequence()
-- read a sequence; ch is '{' at entry and the closing '}' at exit
    sequence s
    integer comma, first -- flags
    get_ch()
    skip_blanks()
    s = {}
    comma = FALSE
    first = TRUE
    while ch != '}' do
        if comma or first then
            if find(ch, START_NUMERIC) then
                s = append(s, get_number())
            elsif ch = '{' then
                s = append(s, get_sequence())
                get_ch()
                skip_blanks()
            elsif ch = '\"' then
                s = append(s, get_string())
            elsif ch = '\'' then
                s = append(s, get_qchar())
            elsif ch = -1 then
                error_flag = GET_EOF
                return 0
            else
                error_flag = GET_FAIL
                return 0
            end if
            comma = FALSE
            first = FALSE
        elsif ch = ',' then
            comma = TRUE
            get_ch()
            skip_blanks()
        else
            error_flag = GET_FAIL
            return 0
        end if
    end while
    if comma then
        error_flag = GET_FAIL
        return 0
    end if
    return s
end function -- get_sequence
function Get()
-- read a Euphoria data object as a string of characters
-- set error_flag and return value
    skip_blanks()
    if find(ch, START_NUMERIC) then
        return get_number()
    elsif ch = '{' then
        return get_sequence()
    elsif ch = '\"' then
        return get_string()
    elsif ch = '\'' then
        return get_qchar()
    elsif ch = -1 then
        error_flag = GET_EOF
        return 0
    else
        error_flag = GET_FAIL
        return 0
    end if
end function -- Get()
global function get(integer file)
-- Read the string representation of a Euphoria object
-- from a file. Convert to the value of the object.
-- Return {error_status, value}.
    object result
    input_file = file
    input_string = 0
    error_flag = GET_SUCCESS
    get_ch()
    -- call Get() first, so error_flag is read only after Get() has set it
    result = Get()
    return {error_flag, result}
end function
global function value(sequence string)
-- Read the representation of a Euphoria object
-- from a sequence of characters. Convert to the value of the object.
-- Return {error_status, value}.
    object result
    input_string = string
    string_next = 1
    error_flag = GET_SUCCESS
    get_ch()
    -- call Get() first, so error_flag is read only after Get() has set it
    result = Get()
    return {error_flag, result}
end function
global function prompt_number(sequence prompt, sequence range)
-- Prompt the user to enter a number.
-- A range of allowed values may be specified.
    object answer
    while 1 do
        puts(1, prompt)
        answer = gets(0) -- make sure whole line is read
        puts(1, '\n')
        answer = value(answer)
        if answer[1] != GET_SUCCESS or sequence(answer[2]) then
            puts(1, "A number is expected - try again\n")
        else
            if length(range) = 2 then
                if range[1] <= answer[2] and answer[2] <= range[2] then
                    return answer[2]
                else
                    printf(1,
                           "A number from %g to %g is expected here - try again\n",
                           range)
                end if
            else
                return answer[2]
            end if
        end if
    end while
end function
global function prompt_string(sequence prompt)
-- Prompt the user to enter a string
    object answer
    puts(1, prompt)
    answer = gets(0)
    puts(1, '\n')
    if sequence(answer) and length(answer) > 0 then
        return answer[1..length(answer)-1] -- trim the \n
    else
        return ""
    end if
end function
global function get_bytes(integer fn, integer n)
-- Return a sequence of n bytes (maximum) from an open file.
-- If n > 0 and fewer than n bytes are returned,
-- you've reached the end of file.
    sequence s
    integer c
    if n = 0 then
        return {}
    end if
    c = getc(fn)
    if c = -1 then
        return {}
    end if
    s = repeat(c, n)
    for i = 2 to n do
        s[i] = getc(fn)
    end for
    while s[n] = -1 do
        n -= 1
    end while
    return s[1..n]
end function
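Since test results were asked for, one low-effort way to exercise the parsing routines without touching a file is to run a handful of strings through value(). The sketch below is only a suggestion: the test strings are arbitrary, and it assumes the replacement file is saved as get.e where the include statement will find it ahead of the stock copy. No expected results are listed, because they depend on which get.e is picked up.

```euphoria
-- Sketch only: the test strings are arbitrary examples.
include get.e

constant tests = {"123", "-1.5e3", "#FF", "{1, {2, 3}, \"ab\"}", "'x'", "{1,2"}

sequence r

for i = 1 to length(tests) do
    r = value(tests[i])         -- r is {error_status, value}
    puts(1, tests[i] & " -> status ")
    print(1, r[1])
    puts(1, ", value ")
    print(1, r[2])
    puts(1, '\n')
end for
```

The last string is deliberately malformed, so its status should be something other than GET_SUCCESS.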
9. Re: Problems with GET.E
Thanks Jiri for attempting to improve get().
Your earlier efforts made the current
released get() much faster.
I haven't looked at your code yet, because
I suspect that you haven't taken into account
the thing that really made me require whitespace
between top-level objects. It's not just a problem
of correctly parsing an input stream from a file.
That can be done, and maybe you've achieved it.
I think it was even done correctly in a much earlier release of
Euphoria.
Consider the problem of parsing:
123{99, 88, 77}
It will be necessary to read '1', '2' and '3' from the input file,
and then read '{' to know that you've reached the end of the
123 atom. get() must then return the atom's value.
The next call to get() must somehow see '{' again. That can
be achieved by retaining an "ungotten" character or in
some other way that requires get.e to retain some information
between calls in some variable. This can work when
you make consecutive calls to get() on the same input file.
But what are you going to do if the user makes a call to
get() *on a different file*, or if he performs a seek() to move
to a different part of the same file? Your ungotten or
saved character is now wrong.
You could try to fix this by instead seeking back one position
at the end of get(), so the next get() will see the same
character again. I don't want to get into this because
I don't want to assume that seek() will work on all types
of files/devices on all platforms. I could also attempt to
implement an "unget" facility for all Euphoria file I/O,
but as I said, I don't consider the whitespace
separation requirement to be so terrible that it requires
any major effort to improve it. In any case you will
always need whitespace between top-level atoms.
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
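The "ungotten character" idea discussed above can be sketched roughly as follows. This is a hypothetical illustration, not code from get.e: next_ch(), unget_ch(), read_digits() and the pushed_* variables are all invented names. It remembers one character together with the file it came from, which handles the "different file" case but still cannot notice a seek() on the same file.

```euphoria
-- Hypothetical one-character pushback; none of these names exist in get.e.
integer pushed_fn, pushed_ch
pushed_fn = -1      -- -1 means nothing is currently pushed back
pushed_ch = -1

function next_ch(integer fn)
-- return the pushed-back character if it belongs to fn, else read from fn
    integer c
    if pushed_fn = fn then
        c = pushed_ch
        pushed_fn = -1
        return c
    end if
    return getc(fn)
end function

procedure unget_ch(integer fn, integer c)
-- remember c so the next next_ch(fn) call sees it again
    pushed_fn = fn
    pushed_ch = c
end procedure

function read_digits(integer fn)
-- read an unsigned decimal integer; the character that ended it is pushed back
    integer c
    atom n
    n = 0
    c = next_ch(fn)
    while c >= '0' and c <= '9' do
        n = n * 10 + (c - '0')
        c = next_ch(fn)
    end while
    unget_ch(fn, c)
    return n
end function
```

With input such as 123{99, 88, 77}, read_digits() would return 123 and leave '{' for the next reader. If the caller seeks elsewhere in the same file between calls, the saved '{' is stale, and with a single slot a pushback on a second file silently discards the first one. Those are the failure modes the post describes.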
10. Re: Problems with GET.E
Rob,
Somehow, wrongly, I assumed print() inserts a space after each top
level *number* (betraying my Forth past, I suppose). This is not,
strictly speaking, necessary with the other elements, because the
*closing* curly bracket, as well as the quote and double quote
characters themselves can be treated as separators.
You also wrote:
>You could try to fix this by instead seeking back one position at the
>end of get(), so the next get() will see the same character again. I
>don't want to get into this because I don't want to assume that seek()
>will work on all types of files/devices on all platforms. I could also
>attempt to implement an "unget" facility for all Euphoria file I/O,
>but as I said, I don't consider the whitespace separation requirement
>to be so terrible that it requires any major effort to improve it. In
>any case you will always need whitespace between top-level atoms.
The real remedy should be quite clear by now. Change the print command
to output the extra space after each number, or better still, after
each top level object. It will improve readability, so close to so
many hearts in this forum, and it will not break any existing code or
a database. And we will be able to use the 'print/get' pair as
suggested in your documentation *without* any additional hassles.
Truly formatted output is adequately covered by different routines
anyway.
jiri
11. Re: Problems with GET.E
- Posted by Robert Craig <rds at ATTCANADA.NET>, Nov 17, 1999 (last edited Nov 18, 1999)
Jiri Babor writes:
> The real remedy should be quite clear by now. Change
> the print command to output the extra space after each number,
> or better still, after each top level object.
Roderick Jackson already suggested that, and I am still
opposed to that idea. Adding your own extra space with puts()
is easy. Removing an unwanted extra space could
be difficult. Also, some existing code would probably break.
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
12. Re: Problems with GET.E
Rob Craig wrote:
>Roderick Jackson already suggested that, and I am still
>opposed to that idea. Adding your own extra space with puts()
>is easy. Removing an unwanted extra space could
>be difficult. Also, some existing code would probably break.
No matter how hard I try I cannot imagine why on earth I would
want to remove the extra space. Without it the data is unusable
anyway.
jiri
13. Re: Problems with GET.E
Jiri Babor writes:
> No matter how hard I try I cannot imagine why on earth
> I would want to remove the extra space. Without it the
> data is unusable anyway.
I was mainly thinking of other situations where print() might be
used. print() can be used quite independently of get(), just
as get() can be (and usually is) used independently of print().
e.g.
puts(1, "there are ")
print(1, x)
puts(1, " days until the year 2000\n")
A knowledgeable user could also do this with printf(),
since in this case x is an atom, not a sequence.
I don't think it's a big deal if print() and get() do not
match perfectly. Anyway, there are cases where
using print() to create the input for get() will cause you
problems, such as when you have floating-point data,
and you need to preserve more than the 10 significant digits
that print() gives you. In those cases you will need to use
printf() (or for perfect results you should forget about
print/get and use atom_to_float64()).
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
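For the floating-point case mentioned above, a minimal sketch of the atom_to_float64() route might look like this. The file name "demo.bin" and the variable names are made up; atom_to_float64() and float64_to_atom() are assumed to come from machine.e and get_bytes() from get.e. The file is opened in binary mode because the eight bytes can take any value.

```euphoria
-- Sketch only: "demo.bin", x and y are hypothetical names.
include machine.e   -- atom_to_float64(), float64_to_atom()
include get.e       -- get_bytes()

integer fn
atom x, y
sequence bytes

x = 1/3

fn = open("demo.bin", "wb")
puts(fn, atom_to_float64(x))    -- writes the 8 bytes of the IEEE double
close(fn)

fn = open("demo.bin", "rb")
bytes = get_bytes(fn, 8)
close(fn)

if length(bytes) = 8 then
    y = float64_to_atom(bytes)
    if x = y then
        puts(1, "value read back exactly\n")
    end if
end if
```

The trade-off is that the file is no longer human-readable and cannot be read with get(), so this only suits data that the program itself writes and reads.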
14. Re: Problems with GET.E
- Posted by jiri babor <jbabor at PARADISE.NET.NZ>, Nov 18, 1999 (last edited Nov 19, 1999)
Thanks, Rob, for your frankness, a rarely used currency these days. It's
a pity I have not convinced you. print() is a comparatively
specialized routine and all your arguments for the status quo are pretty
feeble. But I don't despair, I am sure you *will* change your mind.
jiri
15. Re: Problems with GET.E
Jiri Babor wrote:
>Rob Craig wrote:
>
>>Roderick Jackson already suggested that, and I am still
>>opposed to that idea. Adding your own extra space with puts()
>>is easy. Removing an unwanted extra space could
>>be difficult. Also, some existing code would probably break.
>
>No matter how hard I try I cannot imagine why on earth I would
>want to remove the extra space. Without it the data is unusable
>anyway.
Rob,
Thanks for making a statement on this one way or the
other. I'm not trying to "twist your arm" or anything
to change the language. But, I think you might want
to spend at least a little bit of time reconsidering:
I doubt there is more than a minute fraction of code
out there that uses print() in such a way that a new
format would break it. (I'm not counting the
instances where it's used in between puts(), since
that's generally an unformatted usage anyway.) If I
may suggest, a sampling of user contributions would
likely support this.
True, print() can be used separately from get(). But I
would think that the instances where get() is used
without print() are smaller in number than the
instances where code would be broken by a new format.
get() practically demands a parallel routine that
writes data in a form get() can use. It's almost certain
that anyone using get() would NOT want the space removed.
If you still have concerns about changing print(),
perhaps a solution previously mentioned, a new routine
(e.g., put()) could solve most ills from this. It wouldn't
have to be a built-in; in fact, it would make sense to
locate it in the GET.E file, along with get(). It could
consist of nothing more than a call to print() and a call
to puts() for the space. I only suggest making it a
provided routine because (1) it then becomes a standard,
and (2) it avoids the 'remove' problem of having several
identical local routines (since practically anyone using
get() will use it.)
Rod Jackson