OpenEuphoria: Wiki Diff

Wiki Diff Values, revision #2 to tip

== Values

integer
atom
boolean
character
string

| | ##integer## | ##atom## | ##sequence## | ##object## | user defined |

| **integer** | | programming tip |
| range| ##-1_073_741_824## to ##+1_073_741_823## | |
| use for | speed\\reduced memory | |
| not for | 32-bit machine addresses | (use atom instead) |
| caution | additions and multiplications can exceed the size limit of an integer | (assign these values to an atom) |
| ---- | ---- | ---- |
| **atom** | | programming tip |
| range | integers\\ -##power(2,53)## to +##power(2,53)## | ##power(2,53)## is slightly above 9×10^^15^^ |
| | floating-point\\-##power(2,1024)+1## to +##power(2,1024)-1## | ##power(2,1024)## is in the 10^^308^^ range |
| | large integers\\ | (up to about 15 digits)|
| use for | general calculations | |
| | integer values, when overflow may occur |(assign result of large integer additions and multiplications to atom)|
| | 32-bit machine addresses |(**never** use integer type for this!) |
| | large valued integers | (rather than overflowing, large integers are converted to floating-point)|
| caution | floating-point values are limited by accuracy of computer hardware |(see: Floating-Point Arithmetic ) |

==== integer

An Euphoria ##integer## is a mathematical integer restricted to the range
##-1,073,741,824## to ##+1,073,741,823##.
As a result, a variable of the integer type, while allowing computations as fast
as possible, cannot hold 32-bit machine addresses, even though the latter are
mathematical integers. You must use the [[:atom]] type for this purpose. Also,
even though the product of two integers is a mathematical integer, it may not
fit into an ##integer##, and should be assigned to an ##atom## instead.
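
The overflow rule can be sketched as follows. This is a minimal illustration that assumes the integer range given above; the variable names are made up for the example.

<eucode>
integer big = 1_073_741_823   -- the largest value in the integer range given above
atom total = big + 1          -- one past the limit, so the result is kept in an atom
? integer(big)                -- 1: the value still fits in an integer
? integer(total)              -- 0: the value no longer fits (under the assumed range)
</eucode>

Declaring ##total## as an ##integer## instead would stop the program with a type-check failure at the assignment.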

==== atom

An ##atom## can hold three kinds of data:
* Mathematical integers in the range ##-power(2,53)## to +##power(2,53)##
* Floating point numbers, in the range ##-power(2,1024)+1## to ##+power(2,1024)-1##
* Large mathematical integers in the same range, but with a "fuzz" that grows
with the magnitude of the integer.

##power(2,53)## is slightly above 9×10^^15^^; ##power(2,1024)## is in the
10^^308^^ range.
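
The "fuzz" first shows up at the edge of the exact integer range. The following is a small sketch, assuming atoms are stored as 64-bit IEEE 754 doubles as described in this section; ##limit## is an illustrative name.

<eucode>
atom limit = power(2, 53)
-- limit - 1 is still exactly representable, so it differs from limit
? (limit - 1 = limit)   -- 0
-- limit + 1 cannot be represented exactly and rounds back to limit
? (limit + 1 = limit)   -- 1
</eucode>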

=== Floating-Point Math

Euphoria follows the IEEE 754 standard for floating-point
arithmetic. Every computer has an intrinsic limit to the accuracy of its
calculations.

Because of these constraints, which arise in part from common hardware
limitations, some care is needed for specific purposes:
* The sum or product of two integers is an ##atom##, but may not be an
##integer##.
* Memory addresses, or handles acquired from anything outside Euphoria, including
the operating system, **must** be stored as an ##atom##.
* For large numbers, usual operations may yield strange results.
* Calculations with numbers that differ by a tiny amount can yield strange
results.
* Calculations requiring many computations may suffer from rounding and
truncation errors.

This example, using algebra, has the exact result of one:

{{{
n*n - (n+1)*(n-1)
= n*n - ( n*n - n + n - 1 )
= n*n - n*n + 1
= 1
}}}

A computer can also be used to perform this calculation. For "most" values
of ##n## the correct answer is displayed. But ##n## has a size limit; if ##n## is too
big, the capability of the computer hardware is exceeded:

<eucode>
integer n = power(2, 27) -- ok
integer n_plus = n + 1
integer n_minus = n - 1 -- ok
atom a = n * n -- ok
atom a1 = n_plus * n_minus -- still ok
? a - a1 -- prints 0, should be 1 mathematically
</eucode>

//This is not an Euphoria bug//. The IEEE 754 standard for floating point
numbers provides for 53 bits of precision for any real number, and an accurate
computation of ##a-a1## would require 54 of them. Intel FPU chips do have 64 bit
precision registers, but the low order 16 bits are only
used internally, and Intel recommends against using them for high precision
arithmetic. Their SIMD machine instruction set only uses the IEEE 754 defined
format.
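
Because of such rounding, computed atoms are often compared using a tolerance rather than exact equality. The sketch below shows one way to do this. ##nearly_equal()## is a hypothetical helper written for this example, not a library routine, and a suitable tolerance depends on the calculation.

<eucode>
-- nearly_equal() is a hypothetical helper, not part of the standard library
function nearly_equal(atom a, atom b, atom tolerance)
    atom diff = a - b
    if diff < 0 then
        diff = -diff        -- absolute value of the difference
    end if
    return diff <= tolerance
end function

? nearly_equal(0.1 + 0.2, 0.3, 1e-9)   -- 1: equal within the tolerance
? nearly_equal(1, 1.5, 1e-9)           -- 0: clearly different
</eucode>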

==== Sequence

A ##sequence## is a type that is a //container//. A sequence has //elements//, which
can be accessed through their //index//, as in ##my_sequence[3]##.
Sequences are generic enough to store all sorts of data
structures: strings, trees, lists, anything. Accesses to sequences are always
bounds-checked, so you can never read or write an element that does not exist.
A large number of extraction and shape-changing operations on
sequences are available, both as built-in operations and as library routines. The
elements of a sequence can have any type.

Euphoria sequences are implemented very efficiently. Programmers used to
pointers will soon notice that they can get most usual pointer operations done
using sequence indexes. The loss in efficiency is usually hard to notice, and
the gain in code safety and bug prevention far outweighs it.
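
A short sketch of these properties, using only built-in operations (the variable name and values are illustrative):

<eucode>
sequence s = {1, 2.5, "three", {4, {5}}}   -- elements may be atoms or other sequences
? length(s)         -- 4
? s[1]              -- 1
? s[4][2][1]        -- 5: indexes reach into nested sequences
s[2] = "changed"    -- an element may be replaced by a value of any type
s = append(s, 99)   -- the sequence grows as needed
? length(s)         -- 5
? length(s[1..2])   -- 2: a slice is itself a sequence
-- s[6] would stop the program with a subscript error, since s has only 5 elements
</eucode>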

==== Object

This type can hold any data Euphoria can handle, both atoms and sequences.

The ##object##() type returns ##0## if a variable is not initialized, else ##1##.
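
A minimal sketch of an ##object## variable holding different kinds of data. The first result relies on the behavior of ##object()## stated above.

<eucode>
object x
? object(x)     -- 0: x has not been assigned a value yet
x = 42
? atom(x)       -- 1: x currently holds an atom
? sequence(x)   -- 0
x = "text"
? sequence(x)   -- 1: the same variable now holds a sequence
? object(x)     -- 1: x is initialized
</eucode>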

=== Boolean Values ===

Euphoria has no special "boolean data-type."

In Euphoria, a **boolean value** is just the //interpretation// of a number
value that could be assigned to either an atom or integer.
A value is //**false**// if it is zero. A value is //**true**// if it is non-zero.

In some libraries, there may be constants pre-defined as:
<eucode>
constant FALSE = 0
constant TRUE = 1
</eucode>

Flow control statements depend on **atomic values** when a //true// or //false//
result is needed; **never a sequence**. If a sequence
is to be part of a conditional test, it //must// be used in a function that
returns an atomic result.

In this documentation //true// and //false// are used to describe conditional
and flow structures. The proper atomic value must actually be used
for these tests.

A **boolean expression** is any expression that will evaluate to an atomic
value, and is interpreted as //true// or //false//.
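
The rules above can be sketched as follows. The variable names are illustrative; ##equal()## and ##length()## are built-in routines that return an atomic result.

<eucode>
integer count = 3
if count then                       -- any non-zero atom counts as true
    puts(1, "count is non-zero\n")
end if

sequence name = "Euphoria"
-- "if name then" would be an error because name is a sequence;
-- equal() and length() both return an atomic result instead.
if equal(name, "Euphoria") then
    puts(1, "names match\n")
end if
if length(name) then                -- non-zero length, so the sequence is not empty
    puts(1, "name is not empty\n")
end if
</eucode>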

See also: relational operators and control flow.

=== Character and String

Euphoria is easy to use for programming with text based data.

All of the usual Euphoria routines and operations work on text just like they
work on numbers. There are also libraries of routines designed to make
working with text easy: string centric routines and regular expressions.

|= Character |= String |
| <eucode>
'd' '3' '-' '&' 'f' 'G'
</eucode> | <eucode>
"Hello world"
"this is a string"
</eucode> |

|= Input |= Output |
| ##gets()## | ##puts()## |

=== Character

A **character** is one individual symbol such as a letter, digit, punctuation,
dingbat, ..., that we use for written communications.

An individual character may be written using single quote ##**'**##
delimiters:

<eucode>
'a' 'A' '[' '#'
</eucode>

They may be assigned to either an ##integer## or ##atom##:

<eucode>
atom char = 'a'
integer pound = '#'
</eucode>

There is no special
"character data-type" in Euphoria.
The standard ASCII chart assigns a number to each character. These number
values are used in Euphoria to represent characters.

Euphoria converts all character values to their numeric equivalents; only
number values are stored:

<eucode>
? 'a'
-- 97 appears, not 'a'
? 'A'
-- 65 appears, not 'A'
</eucode>

It is easy to display character values using ##puts()##:

<eucode>
puts(1, 'a' )
-- a <-- appears
puts(1, 'A' )
-- A <-- appears
</eucode>

There is no //automatic// way to distinguish the value ##97## intended to
be the number 'ninety-seven' and the value ##'a'## intended to be the
character ##a##. All values are numbers.

<eucode>
include std/console.e
display( 'a' )
display( 97 )
-- output for both examples is:
-- 97
-- 97
</eucode>

Therefore ##'B'## is just a notation that is equivalent to typing ##66##. There
are no "characters" in Euphoria, just numbers (atoms).

Values representing characters may be manipulated and operated on just like
any other numerical value~--they are numerical values.
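
For instance, ordinary arithmetic works on character values. This is a small sketch assuming ASCII codes, as used throughout this section:

<eucode>
integer ch = 'a'
ch += 1                 -- 'a' is 97, so ch is now 98
puts(1, ch)             -- b <-- appears
puts(1, '\n')
? 'a' - 'A'             -- 32, the distance between lower and upper case in ASCII
</eucode>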

Character atoms combine to make string sequences. Both examples represent the
same Euphoria sequence:

<eucode>
{ 'H','e','l','l','o',' ','W','o','r','l','d' }
"Hello World"
</eucode>

!! must describe unicode characters

==== Escaped Characters ====

Special characters may be entered using a back-slash:

|= Code |= Meaning |
| \n | newline |
| \r | carriage return |
| \t | tab |
| {{{\\}}} | backslash |
| \" | double quote |
| \' | single quote |
| \0 | null |
| \e | escape |
| \E | escape |
| \b/d..d/ | A binary coded value. The ##\b## is followed by 1 or more binary digits. \\ Inside strings, use the space character to delimit the end of a binary value. |
| \x/hh/ | A 2-hex-digit value: "\x5F" ==> {95} |
| \u/hhhh/ | A 4-hex-digit value: "\u2A7C" ==> {10876} |
| \U/hhhhhhhh/ | An 8-hex-digit value: "\U8123FEDC" ==> {2166619868} |

For example, ##"Hello, World!\n"##, or ##'~\~\'##. The Euphoria editor displays
character strings in green.

Sometimes the special characters are described as "non-printing characters"
because, while they control the layout of a display, nothing appears when
they are output.
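
As a brief sketch, each escape code in the table is stored as a single numeric value. The ##\x## form is used here as described in the table above.

<eucode>
sequence s = "A\tB\n"
? length(s)             -- 4: each escape code is stored as a single character
? (s[2] = 9)            -- 1: '\t' is the tab character, ASCII 9
? (s[4] = 10)           -- 1: '\n' is the newline character, ASCII 10
? equal("\x41", "A")    -- 1: hexadecimal 41 is 65, the code for 'A'
</eucode>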

Note that you can use the underscore character ##'_'## inside the
##\b##, ##\x##, ##\u##, and ##\U##
values to aid readability:

<eucode>
\U8123_FEDC -- as written using spacer _
{2166619868} -- value as stored
</eucode>

=== String

A **string** is a sequence of character values. There is no special
"string data-type" in Euphoria.

A string sequence may be written using double-quote ##**"**## delimiters:

<eucode>
"ABCDEFG"
</eucode>

A string is just like any other sequence in Euphoria. For each element of
a string, the character values are all converted to their numerical value.
Strings may be manipulated and operated on the same way as all other sequences
in Euphoria.

The string "ABCDEFG" is entirely equivalent to the sequence:

<eucode>
{65, 66, 67, 68, 69, 70, 71}
</eucode>

<eucode>
puts(1, "ABCDEFG" )
-- ABCDEFG <-- appears on output
print(1, "ABCDEFG" )
-- {65, 66, 67, 68, 69, 70, 71} <-- appears on output
</eucode>

A quoted string
is really just a convenient notation that saves you from having to type in all
the ASCII codes.
@[emptyseq|]
It follows that "" is equivalent to {}. Both represent
the sequence of zero length, also known as the **empty sequence**. As
a matter of programming style, it is natural to use "" to suggest a zero length
sequence of characters, and {} to suggest some other kind of sequence.

An **individual character** is an **atom**. It must be entered using single
quotes. There is a difference between an individual character (which is an
atom), and a character string of length one (which is a sequence):

<eucode>
'B' -- equivalent to the atom 66 -- the ASCII code for B
"B" -- equivalent to the sequence {66}
</eucode>

Keep in mind that an atom is //not// equivalent to a one-element sequence
containing the same value, although there are a few built-in routines that
choose to treat them similarly.
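
The distinction can be checked with the built-in type functions and ##equal()##, as in this short sketch:

<eucode>
? atom('B')             -- 1: a single-quoted character is an atom
? sequence("B")         -- 1: a double-quoted string is a sequence
? equal('B', "B")       -- 0: an atom is not a one-element sequence
? equal("B", {66})      -- 1: the two notations describe the same sequence
? equal("", {})         -- 1: both are the empty sequence
</eucode>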

Some routines are able to //make an intelligent guess// whether a sequence is
intended to be a string as opposed to a numerical sequence:

<eucode>
include std/console.e

? "Hello World"
-- {72,101,108,108,111,32,87,111,114,108,100} <-- appears

-- recognize that all sequences are numeric

display( "Hello World" )
-- Hello World <-- appears

-- string appears as expected

display( {72,101,108,108,111,32,87,111,114,108,100} )
-- Hello World <-- appears

-- appears as string since all element values are
-- character values

display( {72,101,108,108,111,32,87,111,114,108,100.1 } )
-- {72,101,108,108,111,32,87,111,114,108,100.1 } <-- appears

-- last element in the sequence is not a character value
-- sequence is output as a numerical sequence
</eucode>

;Hint
: Escaped characters may be written directly into a string, for characters
not available on the keyboard, and to control the
layout of the string:
<eucode>
puts(1, "This sentence\nis displayed\nover three lines" )
--
-- the '\n' escaped character creates line breaks
--
--This sentence
--is displayed
--over three lines
</eucode>

%% style=floatright
%(
In a real and practical program it is possible to input, create, manipulate,
and then finally output strings, all without ever having to consider their
numerical basis. Euphoria lets you think in terms of 'values' rather than
specialized 'data-types'. String values can be manipulated just like any other
sequence. This generic quality of Euphoria makes programming simple and easy.
)%
\\\\\\

==== Character Strings and Individual Characters

A string in Euphoria is just a sequence of characters. That means that
individual characters may be indexed and manipulated just like any other
sequence.
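
A minimal sketch of indexing and changing individual characters (the variable name is illustrative):

<eucode>
sequence greeting = "Hello World"
puts(1, greeting[1])            -- H <-- appears
puts(1, '\n')
greeting[1] = 'J'               -- replace one character
puts(1, greeting[1..5] & '\n')  -- Jello <-- appears
? length(greeting)              -- 11
</eucode>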

To make working with strings easy, there are a variety of ways to enter
string values into a sequence:

|= Delimiter | |= Notation |= Example |
| **Left** | **Right** | | |
| ##"## | ##"## | double-quotes | <eucode>
"ABCDEFG"
</eucode> |
| ##**`**## | ##**`**## | back-quotes | <eucode>
`ABCDEFG`
</eucode> |
| ##**"""**## | ##**"""**## | three double-quotes | <eucode>
"""ABCDEFG"""
</eucode> |
| ##**b"**## | ##**"**## | binary byte strings | <eucode>
b"1001 00110110 0110_0111 1_0101_1010"
-- ==> {#9,#36,#67,#15A}
</eucode> |
| ##**x"**## | ##**"**## | hexadecimal byte strings | <eucode>
x"65 66 67 AE"
-- ==> {#65,#66,#67,#AE}
</eucode> |

==== Double-Quote Strings ====

# They begin and end with a double-quote ##**"**## character
# They cannot contain a double-quote
# They must be only on a single line
# They cannot contain the TAB character
# If they contain the back-slash '\' character, that character must immediately
be followed by one of the special //escape// codes. The back-slash and escape
code will be replaced by the appropriate single character equivalent.

If you need to include double-quote, end-of-line, back-slash, or TAB characters
inside a double-quoted string, you must enter them using their escape codes.

Examples:

<eucode>
"Bill said\n\t\"This is a back-slash \\ character\".\n"
</eucode>
Which, when displayed should look like ...
{{{
Bill said
        "This is a back-slash \ character".
}}}

==== Raw Strings ====

# Enclose with three double-quotes {{{"""..."""}}} or back-quotes {{{`...`}}}.
# The resulting string will //never// have any carriage-return characters in it.
# If the resulting string begins with a new-line, the initial new-line is
removed and any trailing new-line is also removed.
# A special form is used to automatically remove leading whitespace from the
source code text. You might code this form to align the source text for ease of
reading. If the first line after the raw string start token begins
with one or more underscore characters, the number of consecutive underscores
signifies the maximum number of whitespace characters that will be removed from
each line of the raw string text. The underscores represent an assumed left
margin width. **Note**, these leading underscores do not form part of the raw
string text.

Examples:

<eucode>
-- No leading underscores and no leading whitespace
`
Bill said
"This is a back-slash \ character".
`
</eucode>
Which, when displayed should look like ...
{{{
Bill said
"This is a back-slash \ character".
}}}

<eucode>
-- No leading underscores but leading whitespace
`
    Bill said
        "This is a back-slash \ character".
`
</eucode>
Which, when displayed should look like ...
{{{
    Bill said
        "This is a back-slash \ character".
}}}
<eucode>
-- Leading underscores and leading whitespace
`
_____Bill said
         "This is a back-slash \ character".
`
</eucode>
Which, when displayed should look like ...
{{{
Bill said
    "This is a back-slash \ character".
}}}

Extended string literals are useful when the string contains new-lines, tabs,
or back-slash characters because they do not have to be entered
in the special manner. The back-quote form can be used when the string literal
contains a set of three double-quote characters, and the triple quote form can
be used when the text literal contains back-quote characters. If a literal
contains both a back quote and a set of three double-quotes, you will need to
concatenate two literals.

<eucode>
object TQ, BQ, QQ
TQ = `This text contains """ for some reason.`
BQ = """This text contains a back quote ` for some reason."""
QQ = """This text contains a back quote ` """ & `and """ for some reason.`
</eucode>
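
The difference from double-quote strings can be sketched as follows: a raw string keeps each back-slash as an ordinary character, so no escape codes are processed. The path used here is only an illustration.

<eucode>
? equal("C:\\temp\\new", `C:\temp\new`)   -- 1: both hold the same characters
? length("\n")                            -- 1: one newline character
? length(`\n`)                            -- 2: a back-slash followed by the letter n
</eucode>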

==== Binary Strings ====

# They begin with the pair ##**b"**## and end
with a double-quote ##**"**## character
# They can only contain binary digits (0-1), and space, underscore,
tab, newline, carriage-return. Anything else is invalid.
# An underscore is simply ignored, as if it was never there. It is used to aid
readability.
# Each set of contiguous binary digits represents a single sequence element
# They can span multiple lines
# The non-digits are treated as punctuation and used to delimit individual
values.

Examples:

<eucode>
b"1 10 11_0100 01010110_01111000" == {0x01, 0x02, 0x34, 0x5678}
</eucode>

==== Hexadecimal Strings ====

# They begin with the pair ##**x"**## and end with a double-quote ##**"**## character
# They can only contain hexadecimal digits (0-9 A-F a-f), and space, underscore,
tab, newline, carriage-return. Anything else is invalid.
# An underscore is simply ignored, as if it was never there. It is used to aid
readability.
# Each pair of contiguous hex digits represents a single sequence element with a
value from 0 to 255
# They can span multiple lines
# The non-digits are treated as punctuation and used to delimit individual
values.

Examples:

<eucode>
x"1 2 34 5678_AbC" == {0x01, 0x02, 0x34, 0x56, 0x78, 0xAB, 0x0C}
</eucode>

When you put too many hex characters together they are split up appropriately
for you:

<eucode>
x"656667AE" -- 8-bit ==> {#65,#66,#67,#AE}
</eucode>
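
Since the result is an ordinary sequence, a byte string can be compared with a text string. This is a short sketch relying on the ##x"##...##"## and ##b"##...##"## forms as described above:

<eucode>
? equal(x"48 65 6C 6C 6F", "Hello")   -- 1: #48 is 'H', #65 is 'e', and so on
? equal(x"48656C6C6F", "Hello")       -- 1: the digits are split into pairs
? equal(b"100_0001 100_0010", "AB")   -- 1: binary 1000001 is 65, 1000010 is 66
</eucode>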
