Historical Values, Revision 1

Data Objects

Predefined Types

Euphoria is a unique programming language because of the way it handles data. The Euphoria way is both powerful and easy to use.

The Euphoria data viewpoint:

  • All data is an object in the form of atoms or sequences.
  • An atom is a single numeric value.
  • A sequence is a collection of objects, either atoms or sequences themselves.

An atom (which is also an object) is any one value:

  integer          0, 100, -4
  character        'x', '$', '\n'
  boolean          0, 1
  float            98.6, -1
  with 'spacer'    23_100_000
  exponential      -1e6
  other bases      0b101, #A000

A sequence (which is also an object) is any collection of objects:

  empty            { }
  single           { 3.14 }
  with atoms       {2, 3, 5, 7, 11, 13, 17, 19}
  as a string      "Euphoria"
  mixed content    {{"jon", "smith"}, 52389, 97.25}
  pairs            { {"blue", 5}, {"red", 3} }
  nested           { 1, 2, {3, 3, 3}, 4, { 5, {6} } }

Different data-types
An empty sequence { } is not the same as the atom 0. A sequence of one element { 3 } is not the same as the atom 3.
Predictable behavior
Sometimes it helps to view a sequence as a list of "elements." Then, for some operations, an atom behaves like a one element sequence:

? { 2, 4 } & 5      -- one atom (one element) concatenated
? { 2, 4 } & {5}    -- one sequence (of one element) concatenated
-- the result is { 2, 4, 5 } for both examples
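
The distinction between atoms and sequences, and the way concatenation treats an atom like a one-element sequence, can be checked directly. This is a minimal sketch using only the built-in routines atom(), sequence(), length() and equal(); the variable names a and b are chosen for illustration.

```euphoria
-- atom() and sequence() report which kind of object a value is
? atom(3)                     -- 1: 3 is an atom
? sequence({3})               -- 1: {3} is a sequence of one element
? sequence({})                -- 1: the empty sequence is still a sequence
? length({})                  -- 0: it has no elements
? equal({3}, 3)               -- 0: a one-element sequence is not the atom 3

-- concatenation treats an atom like a one-element sequence
sequence a = {2, 4} & 5
sequence b = {2, 4} & {5}
? equal(a, b)                 -- 1: both are {2,4,5}
```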

Euphoria is designed to "make sense"; there are no surprising behaviors.

Hint
When thinking about data just consider: atom or sequence, one or many, individual or collection. Remembering which Euphoria routine to use, or how to use a particular operation, all becomes simpler when you realize that there may be only two choices to consider.

Because of objects
Programming in Euphoria is simpler than programming with a conventional language that requires many data-types.

Data-Type

A data-type is an abstract description of values. "Abstract" means we can talk about the members of a data-type without having to describe each and every member. The "description" consists of an algorithm that returns true or false when any given value is tested for being a member of the data-type.
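
The idea of a data-type as a true-or-false membership test can be sketched with a user-defined type. The type name hour and the variable meeting are hypothetical, chosen only for illustration.

```euphoria
-- A data-type is a test: it returns true (1) or false (0) for a given value
type hour(integer h)
    return h >= 0 and h <= 23
end type

hour meeting = 14             -- accepted: 14 passes the test
? hour(9)                     -- 1: 9 is a member of the type
? hour(25)                    -- 0: 25 is not a member
```

A type can also be called like a function, as in the last two lines, to test a value without assigning it.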

The way data-types are used in Euphoria is simple:

  1. Every variable (really just a name for a value) has to be declared to be a specific data-type.
  2. You are free to create and manipulate values at will in your programming. Euphoria is not going to inhibit your "creative" programming.
  3. You cannot assign a value to a variable of the wrong data-type. Euphoria is safe.

For example:

? 3.14 * 10     -- first value is an atom, second value is an integer 
                -- you are free to mix types 
                -- calculations produce the "expected" value                   
--> 31.4        -- result is an atom value 
 
integer x       -- an integer variable is declared 
x = 3.14 * 10   -- cannot assign a fractional atom value to an integer variable 
--> error        
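
One way to avoid the error above is to hold the result in an atom and test it with the built-in integer() function before assigning. This is only a sketch; rounding down with floor() is an assumption about what the program wants, not the only reasonable conversion.

```euphoria
atom result = 3.14 * 10       -- an atom variable accepts any numeric value
integer x = 0

if integer(result) then
    x = result                -- safe: the value is a whole number in range
else
    x = floor(result)         -- pick a conversion explicitly; here, round down
end if
? x                           -- 31
```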

Atom

An atom can be any integer or double-precision floating point value. Its value can range from approximately -1e300 (minus one times 10 to the power 300) to +1e300 with 15 decimal digits of accuracy.

To test if a value is an atom use the atom() function:

? atom( 1 )      -- 1 or 'true' 
? atom( -4e-10 ) -- 1 or 'true' 
? atom( "cat" )  -- 0 or 'false' 

Why do we call them atoms? Why not just "numbers"? Well, an atom is just a number, but we wanted to have a distinctive term that emphasizes that they are indivisible (that's what "atom" means in Greek). In the world of physics you can 'split' an atom into smaller parts, but you no longer have an atom--only particles. You can 'split' a number into smaller parts, but you no longer have a number--only digits.


Atom Programming Tip

  range     integers          -power(2,53) to +power(2,53)
                              (power(2,53) is slightly above 9e15)
            large integers    about 15 digits
                              (automatic conversion to floating-point when size is exceeded)
            floating-point    -power(2,1024)+1 to +power(2,1024)-1
                              (power(2,1024) is in the 1e308 range)
  use for   general calculations
            integers, when overflow may occur (assign result of large integer additions and multiplications to atom)
            32-bit machine addresses (never use integer type for this!)
            large valued integers (rather than overflowing, large integers are converted to floating-point)
  caution   floating-point values are limited by accuracy of computer hardware (see notes on floats)
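
The advice about machine addresses can be illustrated with a short sketch. It assumes Euphoria 4, where allocate() and free() come from std/machine.e; the variable names addr and stored are illustrative.

```euphoria
include std/machine.e

atom addr = allocate(4)       -- address of a 4-byte block; it may not fit in an integer
poke4(addr, #FFFFFFFF)        -- store a 32-bit value at that address
atom stored = peek4u(addr)    -- read it back as an unsigned value
? stored                      -- 4294967295
free(addr)                    -- release the block
```

Both the address and the unsigned 32-bit value are declared as atom, because either one can exceed the range of the integer type.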

By default, number literals use base 10. You can write integer literals in other bases, namely binary (base 2), octal (base 8), and hexadecimal (base 16). To do this, the number is prefixed by a 2-character code that specifies the base:

  Prefix Code    Base    Name
  0b             2       Binary
  0t             8       Octal
  0d             10      Decimal
  0x             16      Hexadecimal

Examples of values written in various bases:

 
 0b101 --> decimal 5 
 0t101 --> decimal 65 
 0d101 --> decimal 101 
 0x101 --> decimal 257 
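
A base prefix only changes how a literal is written, not the value it denotes. The following sketch compares literals in different bases and prints one value in decimal and in hexadecimal with printf().

```euphoria
? 0b101 = 5                   -- 1: binary 101 is decimal 5
? 0t101 = 65                  -- 1: octal 101 is decimal 65
? 0d101 = 101                 -- 1: the decimal prefix changes nothing
? 0x101 = 257                 -- 1: hexadecimal 101 is decimal 257

printf(1, "%d %x\n", {255, 255})   -- 255 FF
```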

Hexadecimal integers can also be written by prefixing the number with the pound sign (#).

Examples of hexadecimal integers:

#FE             -- 254 
#A000           -- 40960 
#FFFF00008      -- 68718428168 
-#10            -- -16 

Only digits and the letters A, B, C, D, E, F, in either uppercase or lowercase, are allowed in hexadecimal numbers. Hexadecimal numbers are always positive, unless written like -# with a minus sign in front of the pound sign. For instance #FFFFFFFF is a huge positive number (4294967295), not -1, as some machine-language programmers might expect.

You can embed the underscore _ character when writing a numeric literal. This simulates the commas (or periods) sometimes used to "group" digits into sets of three. The underscore character can be located anywhere within a number, and used with any base:

atom big = 32_873_787   -- Set 'big' to the value 32873787 
 
atom salary = 56_110.66 -- Set salary to the value 56110.66 
 
integer defflags = #0323_F3CD 
 
object phone = 61_3_5536_7733 
 
integer bits = 0b11_00010_1 

Integer

The integer is a subset of the atom data-type. It too represents a single numerical value.

Computer hardware works best with integer values; calculations with integers are the fastest and require the least amount of memory. Integer values can be exactly represented by the base 2 binary system. For this reason Euphoria has the built-in integer data-type. When in range, specify integer instead of atom for improved performance.

Integer Programming Tip

  range     -1_073_741_824 to +1_073_741_823
  use for   speed
            reduced memory
  not for   32-bit machine addresses (use atom instead)
  caution   additions and multiplications can exceed the size limit of an integer
            (assign these values to an atom)

Calculations in Euphoria always produce the "correct" answer. Regardless of the data-types in the calculation, the result is always computed to the maximum accuracy available to the computer hardware. The integer data-type requires that values assigned to an integer variable be integers. Euphoria does not restrict how you calculate with these values.

A division between two integer values may produce a fractional value. Rather than truncate the fractional part (as integer division does in many languages), the 'correct' fractional value is computed. Assigning a fractional result to an integer produces an error, but the value may be assigned to an atom. Similarly, when two integers are added or multiplied together the 'correct' value is calculated. The result of a multiplication or addition may be too large to be assigned to an integer, but it can be assigned to an atom.

Euphoria allocates 31 bits for an integer, so an integer cannot hold a 32-bit machine address. An atom must be used for this purpose.
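
The behavior of division and of large products can be sketched as follows. The variable names are illustrative, and the final result assumes the 31-bit integer range described above.

```euphoria
integer a = 7
integer b = 2

atom q = a / b                -- division keeps the fraction
integer whole = floor(a / b)  -- round down explicitly when a whole number is wanted
integer r = remainder(a, b)   -- the remainder of the division
? q                           -- 3.5
? whole                       -- 3
? r                           -- 1

integer big = 1_000_000
atom product = big * big      -- the multiplication is computed correctly; store it in an atom
? integer(product)            -- 0 when integers are limited to 31 bits
```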

Sequence

A sequence is represented by a list of objects within braces { }, separated by commas.

A sequence can contain any mixture of atom and sequence values; a sequence does not have to contain all the same data type. Because the objects contained in a sequence can be an arbitrary mix of atoms or sequences, it is an extremely versatile data structure, capable of representing any sort of data.

The other important feature of a sequence is that it is dynamic and limitless. A sequence may be dynamically changed as elements are added, removed, altered, or just shifted about. A single sequence could be as large as your computer memory allows.

Atoms are the basic building blocks of all the data that a Euphoria program can manipulate. With this analogy, sequences might be thought of as "molecules", made from atoms and other molecules. A better analogy would be that sequences are like directories, and atoms are like files. Just as a directory on your computer can contain both files and other directories, a sequence can contain both atoms and other sequences (and those sequences can contain atoms and sequences and so on).

The "Hierarchical Objects" part of the Euphoria acronym comes from the hierarchical nature of nested sequences. This should not be confused with the class hierarchies of certain object-oriented languages.

Sequences are implemented very efficiently, and automatic garbage collection takes care of memory allocations. As a result sequences are a fast and simple way of working with data. A sequence can represent any arrangement of data. That means that one, easy to use, data-type is all you need to learn to represent all forms of data structures--big or small, simple or complex.

In a conventional language expect to see a variety of data-types, each one different, and each one following its own special rules. Conventional languages force you to learn a lot.

Sequences can be nested to any depth: you can have sequences within sequences within sequences, limited only by available memory.

Braces are used to construct sequences out of a list of expressions. These expressions can be constant or evaluated at run-time:

{ x+6, 9, y*w+2, sin(0.5) } 
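
The dynamic and nested nature of sequences can be sketched with a few built-in operations: subscripting, assignment to an element, append() and slicing. The variable name person and its contents are illustrative, reusing the mixed-content example from earlier on this page.

```euphoria
sequence person = {{"jon", "smith"}, 52389, 97.25}

? length(person)                       -- 3 top-level elements
puts(1, person[1][2] & "\n")           -- smith (element 2 of nested element 1)

person[3] = 98.5                       -- replace an element in place
person = append(person, {"blue", 5})   -- grow by one element, itself a sequence
? length(person)                       -- 4

person = person[2..3]                  -- a slice is a new, shorter sequence
? person                               -- {52389,98.5}
```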

Object

A variable that is declared to be of the object type may be either an atom or sequence, and may switch between the two dynamically during program execution.

This provides versatility for those situations when it is not possible to anticipate the data-type being assigned to a variable. For example, when the contents of a file are read into a variable, the result could be a line of text, which is a sequence. But the file could be empty, in which case the read operation returns an integer flag, which is an atom. A variable declared as the object type can store either.
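
A variable of type object can be tested at run time to find out what it currently holds. The sketch below uses a hypothetical helper, kind(), and stands in for a file read by assigning the two sorts of value that the built-in gets() can return: the atom -1 at end of file, or a sequence holding a line of text.

```euphoria
-- Hypothetical helper: names the kind of value an object currently holds
function kind(object x)
    if atom(x) then
        return "atom"
    end if
    return "sequence"
end function

object item = -1              -- e.g. the end-of-file flag returned by gets()
puts(1, kind(item) & "\n")    -- atom

item = "a line of text\n"     -- e.g. a line returned by gets()
puts(1, kind(item) & "\n")    -- sequence
```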

As you will soon discover, sequences make Euphoria very simple and very powerful. Understanding atoms and sequences is the key to understanding Euphoria.

Note
Here the term object follows the fundamental definition as used by designers of interpreters and compilers. It is not the same as "object" used in a specialized way in "OOP" languages.
