8.21 Text Manipulation

8.21.1 Routines

8.21.1.1 sprintf

<built-in> function sprintf(sequence format, object values)

This is exactly the same as printf(), except that the output is returned as a sequence of characters, rather than being sent to a file or device.

Parameters:
  1. format : a sequence, the text to print. This text may contain format specifiers.
  2. values : usually, a sequence of values. It should have as many elements as there are format specifiers in format, as these values will be substituted for the specifiers.
Returns:

A sequence, of printable characters, representing format with the values in values spliced in.

Comments:

printf(fn, st, x) is equivalent to puts(fn, sprintf(st, x)).

Some typical uses of sprintf() are:

  1. Converting numbers to strings.
  2. Creating strings to pass to system().
  3. Creating formatted error messages that can be passed to a common error message handler.
Example 1:
s = sprintf("%08d", 12345)
-- s is "00012345"
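When more than one value is substituted, values is a sequence with one element per format specifier. The following minimal sketch shows this, along with the equivalence between printf() and puts() plus sprintf() noted in the comments. The variable names are made up for illustration. A string substituted for a single %s is wrapped in braces here, because a bare string is itself a sequence of values.

```euphoria
-- Hypothetical names; a minimal sketch, not part of the library.
sequence name = "cart"
integer count = 3

sequence msg = sprintf("%s has %d items", {name, count})
-- msg is "cart has 3 items"

puts(1, msg & "\n")
printf(1, "%s has %d items\n", {name, count})  -- prints the same line

-- Wrap a lone string in braces so it is treated as one value.
sequence single = sprintf("[%s]", {name})
-- single is "[cart]"
```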
See Also:

printf, sprint, format

8.21.1.2 sprint

include std/text.e
namespace text
public function sprint(object x)

Returns the representation of any Euphoria object as a string of characters.

Parameters:
  1. x : Any Euphoria object.
Returns:

A sequence, a string representation of x.

Comments:

This is exactly the same as print(fn, x), except that the output is returned as a sequence of characters, rather than being sent to a file or device. x can be any Euphoria object.

The atoms contained within x will be displayed to a maximum of 10 significant digits, just as with print().

Example 1:
s = sprint(12345)
-- s is "12345"
Example 2:
s = sprint({10,20,30}+5)
-- s is "{15,25,35}"
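Because sprint() produces the same text as print(), its result can usually be read back with value() from std/get.e. The sketch below assumes that round trip is the goal; the variable names are illustrative. Strings come back as sequences of character codes, and atoms that need more than 10 significant digits may not survive the round trip exactly.

```euphoria
include std/text.e
include std/get.e   -- for value() and GET_SUCCESS

sequence original = {1, {2, 3}, "hi"}
sequence repr = sprint(original)
-- repr is "{1,{2,3},{104,105}}" (strings are shown as character codes)

sequence parsed = value(repr)
if parsed[1] = GET_SUCCESS and equal(parsed[2], original) then
    puts(1, "round trip succeeded\n")
end if
```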
See Also:

sprintf, printf

8.21.1.3 trim_head

include std/text.e
namespace text
public function trim_head(sequence source, object what = " \t\r\n", integer ret_index = 0)

Trim all items in the supplied set from the leftmost (start or head) of a sequence.

Parameters:
  1. source : the sequence to trim.
  2. what : the set of items to trim from source (defaults to " \t\r\n").
  3. ret_index : If zero (the default) returns the trimmed sequence, otherwise it returns the index of the leftmost item not in what.
Returns:

A sequence, if ret_index is zero, which is the trimmed version of source.
An integer, if ret_index is not zero, which is the index of the leftmost element in source that is not in what.

Example 1:
object s
s = trim_head("\r\nSentence read from a file\r\n", "\r\n")
-- s is "Sentence read from a file\r\n"
s = trim_head("\r\nSentence read from a file\r\n", "\r\n", TRUE)
-- s is 3
See Also:

trim_tail, trim, pad_head

8.21.1.4 trim_tail

include std/text.e
namespace text
public function trim_tail(sequence source, object what = " \t\r\n", integer ret_index = 0)

Trim all items in the supplied set from the rightmost (end or tail) of a sequence.

Parameters:
  1. source : the sequence to trim.
  2. what : the set of items to trim from source (defaults to " \t\r\n").
  3. ret_index : If zero (the default) returns the trimmed sequence, otherwise it returns the index of the rightmost item not in what.
Returns:

A sequence, if ret_index is zero, which is the trimmed version of source.
An integer, if ret_index is not zero, which is the index of the rightmost element in source that is not in what.

Example 1:
object s
s = trim_tail("\r\nSentence read from a file\r\n", "\r\n")
-- s is "\r\nSentence read from a file"
s = trim_tail("\r\nSentence read from a file\r\n", "\r\n", TRUE)
-- s is 27
See Also:

trim_head, trim, pad_tail

8.21.1.5 trim

include std/text.e
namespace text
public function trim(sequence source, object what = " \t\r\n", integer ret_index = 0)

Trim all items in the supplied set from both the left end (head/start) and right end (tail/end) of a sequence.

Parameters:
  1. source : the sequence to trim.
  2. what : the set of items to trim from source (defaults to " \t\r\n").
  3. ret_index : If zero (the default) returns the trimmed sequence, otherwise it returns a 2-element sequence containing the index of the leftmost item and rightmost item not in what.
Returns:

A sequence, if ret_index is zero, which is the trimmed version of source
A 2-element sequence, if ret_index is not zero, in the form {left_index, right_index}.

Example 1:
object s
s = trim("\r\nSentence read from a file\r\n", "\r\n")
-- s is "Sentence read from a file"
s = trim("\r\nSentence read from a file\r\n", "\r\n", TRUE)
-- s is {3,27}
s = trim(" This is a sentence.\n")  -- Default is to trim off all " \t\r\n"
-- s is "This is a sentence."
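The index form is useful when the caller wants to know where the content sits inside the original sequence without making a copy first. A minimal sketch, with illustrative variable names, showing that slicing with the returned indexes gives the same text as a plain trim():

```euphoria
include std/text.e

sequence raw = "  \tname = value \r\n"

-- Ask for the indexes instead of the trimmed copy.
sequence bounds = trim(raw, " \t\r\n", 1)
-- bounds is {4,15}

-- Slicing with the returned indexes matches trim(raw).
sequence core = raw[bounds[1]..bounds[2]]
-- core is "name = value"
```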
See Also:

trim_head, trim_tail

8.21.1.6 set_encoding_properties

include std/text.e
namespace text
public procedure set_encoding_properties(sequence en = "", sequence lc = "", sequence uc = "")

Sets the table of lowercase and uppercase characters that is used by lower and upper.

Parameters:
  1. en : The name of the encoding represented by these character sets
  2. lc : The set of lowercase characters
  3. uc : The set of uppercase characters
Comments:
  • lc and uc must be the same length.
  • If no parameters are given, the default ASCII table is set.
Example 1:
set_encoding_properties( "Elvish", "aeiouy", "AEIOUY")
Example 2:
set_encoding_properties( "1251") -- Loads a predefined code page.
See Also:

lower, upper, get_encoding_properties

8.21.1.7 get_encoding_properties

include std/text.e
namespace text
public function get_encoding_properties()

Gets the table of lowercase and uppercase characters that is used by lower and upper.

Parameters:

none

Returns:

A sequence, containing three items.
{Encoding_Name, LowerCase_Set, UpperCase_Set}

Example 1:
encode_sets = get_encoding_properties()
See Also:

lower, upper, set_encoding_properties

8.21.1.8 lower

include std/text.e
namespace text
public function lower(object x)

Convert an atom or sequence to lower case.

Parameters:
  1. x : Any Euphoria object.
Returns:

An object, the lowercase version of x. As the example shows, an atom argument gives an atom result.

Comments:
  • For Windows systems, this uses the current code page for conversion.
  • For non-Windows systems, this only works on ASCII characters. It alters characters in the 'A'..'Z' range. If you need to do case conversion with other encodings, call set_encoding_properties first.
  • x may be a sequence of any shape, all atoms of which will be acted upon.

WARNING: When using ASCII encoding, this can also affect floating point numbers in the range 65 to 90.

Example 1:
s = lower("Euphoria")
-- s is "euphoria"

a = lower('B')
-- a is 'b'

s = lower({"Euphoria", "Programming"})
-- s is {"euphoria", "programming"}
See Also:

upper, proper, set_encoding_properties, get_encoding_properties

8.21.1.9 upper

include std/text.e
namespace text
public function upper(object x)

Convert an atom or sequence to upper case.

Parameters:
  1. x : Any Euphoria object.
Returns:

An object, the uppercase version of x. As the example shows, an atom argument gives an atom result.

Comments:
  • For Windows systems, this uses the current code page for conversion.
  • For non-Windows systems, this only works on ASCII characters. It alters characters in the 'a'..'z' range. If you need to do case conversion with other encodings, call set_encoding_properties first.
  • x may be a sequence of any shape, all atoms of which will be acted upon.

WARNING: When using ASCII encoding, this can also affect floating point numbers in the range 97 to 122.

Example 1:
s = upper("Euphoria")
-- s is "EUPHORIA"

a = upper('b')
-- a is 'B'

s = upper({"Euphoria", "Programming"})
-- s is {"EUPHORIA", "PROGRAMMING"}
See Also:

lower, proper, set_encoding_properties, get_encoding_properties

8.21.1.10 proper

include std/text.e
namespace text
public function proper(sequence x)

Convert a text sequence to capitalized words.

Parameters:
  1. x : A text sequence.
Returns:

A sequence, the Capitalized Version of x

Comments:

A text sequence is one in which all elements are either characters or text sequences. This means that if a non-character is found in the input, it is not converted. However this rule only applies to elements on the same level, meaning that sub-sequences could be converted if they are actually text sequences.

Example 1:
s = proper("euphoria programming language")
-- s is "Euphoria Programming Language"
s = proper("EUPHORIA PROGRAMMING LANGUAGE")
-- s is "Euphoria Programming Language"
s = proper({"EUPHORIA PROGRAMMING", "language", "rapid dEPLOYMENT", "sOfTwArE"})
-- s is {"Euphoria Programming", "Language", "Rapid Deployment", "Software"}
s = proper({'a', 'b', 'c'})
-- s is {'A', 'b', 'c'} -- "Abc"
s = proper({'a', 'b', 'c', 3.1472})
-- s is {'a', 'b', 'c', 3.1472} -- Unchanged because it contains a non-character.
s = proper({"abc", 3.1472})
-- s is {"Abc", 3.1472} -- The embedded text sequence is converted.
See Also:

lower, upper

8.21.1.11 keyvalues

include std/text.e
namespace text
public function keyvalues(sequence source, object pair_delim = ";,", object kv_delim = ":=",
        object quotes = "\"'`", object whitespace = " \t\n\r", integer haskeys = 1)

Converts a string containing Key/Value pairs into a set of sequences, one per K/V pair.

Parameters:
  1. source : a text sequence, containing the representation of the key/values.
  2. pair_delim : an object containing a list of elements that delimit one key/value pair from the next. The defaults are semi-colon (;) and comma (,).
  3. kv_delim : an object containing a list of elements that delimit the key from its value. The defaults are colon (:) and equal (=).
  4. quotes : an object containing a list of elements that can be used to enclose either keys or values that contain delimiters or whitespace. The defaults are double-quote ("), single-quote (') and back-quote (`)
  5. whitespace : an object containing a list of elements that are regarded as whitespace characters. The defaults are space, tab, new-line, and carriage-return.
  6. haskeys : an integer containing true or false. The default is true. When true, the kv_delim values are used to separate keys from values, but when false it is assumed that each 'pair' is actually just a value.
Returns:

A sequence, of pairs. Each pair is in the form {key, value}.

Comments:

String representations of atoms are not converted, either in the key or value part, but returned as any regular string instead.

If haskeys is true, but a substring only holds what appears to be a value, the key is synthesized as p[n], where n is the number of the pair. See example #2.

By default, pairs can be delimited by either a comma or semi-colon ",;" and a key is delimited from its value by either an equal or a colon "=:". Whitespace between pairs, and between delimiters is ignored.

If you need to have one of the delimiters in the value data, enclose it in quotation marks. You can use any of single, double and back quotes, which also means you can quote quotation marks themselves. See example #3.

It is possible that the value data itself is a nested set of pairs. To do this, enclose the value in parentheses. Nested sets can be nested to any level. See example #4.

If a sub-list has only data values and not keys, enclose it in either braces or square brackets. See example #5. If you need to have a bracket as the first character in a data value, prefix it with a tilde. Actually a leading tilde will always just be stripped off regardless of what it prefixes. See example #6.

Example 1:
s = keyvalues("foo=bar, qwe=1234, asdf='contains space, comma, and equal(=)'")
-- s is 
-- {
--   {"foo", "bar"}, 
--   {"qwe", "1234"}, 
--   {"asdf", "contains space, comma, and equal(=)"}
--  }
Example 2:
s = keyvalues("abc fgh=ijk def")
-- s is { {"p[1]", "abc"}, {"fgh", "ijk"}, {"p[3]", "def"} }
Example 3:
s = keyvalues("abc=`'quoted'`")
-- s is { {"abc", "'quoted'"} }
Example 4:
s = keyvalues("colors=(a=black, b=blue, c=red)")
-- s is { {"colors", {{"a", "black"}, {"b", "blue"},{"c", "red"}}  } }
s = keyvalues("colors=(black=[0,0,0], blue=[0,0,FF], red=[FF,0,0])")
-- s is 
-- { {"colors", 
--   {{"black",{"0", "0", "0"}}, 
--   {"blue",{"0", "0", "FF"}},
--   {"red", {"FF","0","0"}}}} }
Example 5:
s = keyvalues("colors=[black, blue, red]")
-- s is { {"colors", { "black", "blue", "red"}  } }
Example 6:
s = keyvalues("colors=~[black, blue, red]")
-- s is { {"colors", "[black, blue, red]"} }
-- The following is another way to do the same.
s = keyvalues("colors=`[black, blue, red]`")
-- s is { {"colors", "[black, blue, red]"} }
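Since string representations of atoms are returned as ordinary strings, a caller that needs numbers has to convert them itself. One possible approach, sketched here with made-up keys, is to pass each value through value() from std/get.e and use the result only when the conversion succeeds:

```euphoria
include std/text.e
include std/get.e   -- for value() and GET_SUCCESS

sequence pairs = keyvalues("width=640, height=480")

for i = 1 to length(pairs) do
    sequence key = pairs[i][1]
    -- value() returns {status, converted_object}
    sequence result = value(pairs[i][2])
    if result[1] = GET_SUCCESS then
        printf(1, "%s -> %d\n", {key, result[2]})
    end if
end for
-- prints:
-- width -> 640
-- height -> 480
```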

8.21.1.12 escape

include std/text.e
namespace text
public function escape(sequence s, sequence what = "\"")

Escape special characters in a string.

Parameters:
  1. s: string to escape
  2. what: sequence of characters to escape. Defaults to escaping a double quote.
Returns:

An escaped sequence representing s.

Example 1:
sequence s = escape("John \"Mc\" Doe")
puts(1, s)
-- output is: John \"Mc\" Doe
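The what parameter can list several characters at once. The sketch below assumes, in line with the double-quote example, that each listed character is prefixed with a backslash; the input string is made up for illustration.

```euphoria
include std/text.e

-- Escape both commas and semicolons in a field.
sequence field = escape("a,b;c", ",;")
puts(1, field & "\n")
-- expected output under the assumption above: a\,b\;c
```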
See Also:

quote

8.21.1.13 quote

include std/text.e
namespace text
public function quote(sequence text_in, object quote_pair = {"\"", "\""}, integer esc = -1,
        t_text sp = "")

Return a quoted version of the first argument.

Parameters:
  1. text_in : The string or set of strings to quote.
  2. quote_pair : A sequence of two strings. The first string is the opening quote to use, and the second string is the closing quote to use. The default is {"\"", "\""} which means that the output will be enclosed by double-quotation marks.
  3. esc : A single escape character. If this is not negative (the default is -1, which disables escaping), it is used to 'escape' any embedded quote characters and 'esc' characters already in the text_in string.
  4. sp : A list of zero or more special characters. The text_in is only quoted if it contains any of the special characters. The default is "" which means that the text_in is always quoted.
Returns:

A sequence, the quoted version of text_in.

Example 1:
-- Using the defaults. Output enclosed in double-quotes, no escapes and no specials.
s = quote("The small man")
-- 's' now contains '"The small man"' including the double-quote characters.
Example 2:
s = quote("The small man", {"(", ")"} )
-- 's' now contains '(The small man)'
Example 3:
s = quote("The (small) man", {"(", ")"}, '~' )
-- 's' now contains '(The ~(small~) man)'
Example 4:
s = quote("The (small) man", {"(", ")"}, '~', "#" )
-- 's' now contains 'The (small) man' (unchanged)
-- because the input did not contain a '#' character.
Example 5:
s = quote("The #1 (small) man", {"(", ")"}, '~', "#" )
-- 's' now contains '(The #1 ~(small~) man)'
-- because the input did contain a '#' character.
Example 6:
-- input is a set of strings...
s = quote({"a b c", "def", "g hi"})
-- 's' now contains three quoted strings: '"a b c"', '"def"', and '"g hi"'
See Also:

escape

8.21.1.14 dequote

include std/text.e
namespace text
public function dequote(sequence text_in, object quote_pairs = {{"\"", "\""}},
        integer esc = -1)

Removes 'quotation' text from the argument.

Parameters:
  1. text_in : The string or set of strings to de-quote.
  2. quote_pairs : A set of one or more sub-sequences of two strings, or an atom representing a single character to be used as both the open and close quotes. The first string in each sub-sequence is the opening quote to look for, and the second string is the closing quote. The default is {{"\"", "\""}}, which means that text_in is treated as quoted if it is enclosed by double-quotation marks.
  3. esc : A single escape character. If this is not negative (the default is -1, which disables escape handling), it is treated as the character used to 'escape' embedded occurrences of the quote characters, and the escape character itself is also removed from the result.
Returns:

A sequence, the original text but with 'quote' strings stripped of quotes.

Example 1:
-- Using the defaults.
s = dequote("\"The small man\"")
-- 's' now contains "The small man"
Example 2:
-- Using a custom quote pair and an escape character.
s = dequote("(The small ?(?) man)", {{"(",")"}}, '?')
-- 's' now contains "The small () man"

8.21.1.15 format

include std/text.e
namespace text
public function format(sequence format_pattern, object arg_list = {})

Formats a set of arguments into a string based on a supplied pattern.

Parameters:
  1. format_pattern : A sequence: the pattern string that contains zero or more tokens.
  2. arg_list : An object: Zero or more arguments used in token replacement.
Returns:

A string sequence, the original format_pattern but with tokens replaced by corresponding arguments.

Comments:

The format_pattern string contains text and argument tokens. The resulting string is the same as the format string except that each token is replaced by an item from the argument list.

A token has the form [<Q>], where <Q> is a set of optional qualifier codes.

The qualifier <Q> is a set of zero or more codes that modify the default way that the argument is used to replace the token. The default replacement method is to convert the argument to its shortest string representation and use that to replace the token. This may be modified by the following codes, which can occur in any order.

Qualifier   Usage
N           ('N' is an integer) The index of the argument to use.
{id}        Uses the argument that begins with "id=" where "id" is an identifier name.
%envvar%    Uses the Environment Symbol 'envvar' as an argument.
w           For string arguments, it capitalizes the first letter in each word.
u           For string arguments, it converts it to upper case.
l           For string arguments, it converts it to lower case.
<           For numeric arguments, it left justifies it.
>           For string arguments, it right justifies it.
c           Centers the argument.
z           For numbers, it zero fills the left side.
:S          ('S' is an integer) The maximum size of the resulting field. Also, if 'S' begins with '0' the field will be zero-filled if the argument is an integer.
.N          ('N' is an integer) The number of digits after the decimal point.
+           For positive numbers, show a leading plus sign.
(           For negative numbers, enclose them in parentheses.
b           For numbers, causes zero to be all blanks.
s           If the resulting field would otherwise be zero length, this ensures that at least one space occurs between this token's field
t           After token replacement, the resulting string up to this point is trimmed.
X           Outputs integer arguments using hexadecimal digits.
B           Outputs integer arguments using binary digits.
?           The corresponding argument is a set of two strings. This uses the first string if the previous token's argument is not the value 1 or a zero-length string, otherwise it uses the second string.
[           Does not use any argument. Outputs a left-square-bracket symbol.
,X          Insert thousands separators. The <X> is the character to use. If this is a dot "." then the decimal point is rendered using a comma. Does not apply to zero-filled fields. N.B. if hex or binary output was specified, the separators are every 4 digits, otherwise they are every three digits.
T           If the argument is a number it is output as a text character, otherwise it is output as a text string.

Clearly, certain combinations of these qualifier codes do not make sense and in those situations, the rightmost clashing code is used and the others are ignored.

Any tokens in the format that have no corresponding argument are simply removed from the result. Any arguments that are not used in the result are ignored.

Any sequence argument that is not a string will be converted to its pretty format before being used in token replacement.

If a token is going to be replaced by a zero-length argument, all white space following the token until the next non-whitespace character is not copied to the result string.

Examples:
format("Cannot open file '[]' - code []", {"/usr/temp/work.dat", 32})
-- "Cannot open file '/usr/temp/work.dat' - code 32"

format("Err-[2], Cannot open file '[1]'", {"/usr/temp/work.dat", 32})
-- "Err-32, Cannot open file '/usr/temp/work.dat'"

format("[4w] [3z:2] [6] [5l] [2z:2], [1:4]", {2009,4,21,"DAY","MONTH","of"})
-- "Day 21 of month 04, 2009"

format("The answer is [:6.2]%", {35.22341})
-- "The answer is  35.22%"

format("The answer is [.6]", {1.2345})
-- "The answer is 1.234500"

format("The answer is [,,.2]", {1234.56})
-- "The answer is 1,234.56"

format("The answer is [,..2]", {1234.56})
-- "The answer is 1.234,56"

format("The answer is [,:.2]", {1234.56})
-- "The answer is 1:234.56"

format("[] [?]", {5, {"cats", "cat"}})
-- "5 cats"

format("[] [?]", {1, {"cats", "cat"}})
-- "1 cat"

format("[<:4]", {"abcdef"})
-- "abcd"

format("[>:4]", {"abcdef"})
-- "cdef"

format("[>:8]", {"abcdef"})
-- "  abcdef"

format("seq is []", {{1.2, 5, "abcdef", {3}}})
-- `seq is {1.2,5,"abcdef",{3}}`

format("Today is [{day}], the [{date}]", {"date=10/Oct/2012", "day=Wednesday"})
-- "Today is Wednesday, the 10/Oct/2012"

format("'A' is [T]", 65)
-- `'A' is A`
See Also:

sprintf

8.21.1.16 wrap

include std/text.e
namespace text
public function wrap(sequence content, integer width = 78, sequence wrap_with = "\n",
        sequence wrap_at = " \t")

Wrap text

Parameters:
  • content - sequence content to wrap
  • width - width to wrap at, defaults to 78
  • wrap_with - sequence to wrap with, defaults to "\n"
  • wrap_at - sequence of characters to wrap at, defaults to space and tab
Returns:

Sequence containing wrapped text

Example 1:
sequence result = wrap("Hello, World")
-- result = "Hello, World"
Example 2:
sequence msg = "Hello, World. Today we are going to learn about apples."
sequence result = wrap(msg, 40)
-- result =
--   "Hello, World. Today we are going to\n"
--   "learn about apples."
Example 3:
sequence msg = "Hello, World. Today we are going to learn about apples."
sequence result = wrap(msg, 40, "\n    ")
-- result =
--   "Hello, World. Today we are going to\n"
--   "    learn about apples."
Example 4:
sequence msg = "Hello, World. This, Is, A, Dummy, Sentence, Ok, World?"
sequence result = wrap(msg, 30, "\n", ",")
-- result = 
--   "Hello, World. This, Is, A,"
--   "Dummy, Sentence, Ok, World?"