8.27 Statistics

8.27.1 Routines

8.27.1.1 small

include std/stats.e
namespace stats
public function small(sequence data_set, integer ordinal_idx)

Determines the k-th smallest value from the supplied set of numbers.

Parameters:
  1. data_set : The list of values from which the smallest value is chosen.
  2. ordinal_idx : The relative index of the desired smallest value.
Returns:

A sequence, {The k-th smallest value, its index in the set}

Comments:

small() is used to return a value based on its size relative to all the other elements in the sequence. When ordinal_idx is 1, the smallest value is returned. Use ordinal_idx = length(data_set) to return the highest.

If ordinal_idx is less than one, or greater than the length of data_set, an empty sequence is returned.

The set of values does not have to be in any particular order. The values may be any Euphoria object.

Example 1:
small( {4,5,6,8,5,4,3,"text"}, 3 )
--> Ans: {4,1} (The 3rd smallest value)
small( {4,5,6,8,5,4,3,"text"}, 1 ) 
--> Ans: {3,7} (The 1st smallest value)
small( {4,5,6,8,5,4,3,"text"}, 7 ) 
--> Ans: {8,4} (The 7th smallest value)
small( {"def", "qwe", "abc", "try"}, 2 ) 
--> Ans: {"def", 1} (The 2nd smallest value)
small( {1,2,3,4}, -1) 
--> Ans: {} -- no-value
small( {1,2,3,4}, 10) 
--> Ans: {} -- no-value
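
The lookup described above can be mimicked outside Euphoria. The following Python sketch is not the library's implementation. It assumes the value is found by sorting a copy of the data and the index is that of the first occurrence, which is consistent with the example answers. It handles only mutually comparable values, because Python cannot order numbers against strings the way Euphoria does.

```python
def small(data_set, ordinal_idx):
    """Return [k-th smallest value, 1-based index of its first occurrence]."""
    # Out-of-range ordinals yield an empty list, mirroring the documented {}.
    if ordinal_idx < 1 or ordinal_idx > len(data_set):
        return []
    value = sorted(data_set)[ordinal_idx - 1]
    # Euphoria indexes from 1, so shift Python's 0-based position.
    return [value, data_set.index(value) + 1]


if __name__ == "__main__":
    print(small([4, 5, 6, 8, 5, 4, 3], 3))  # prints [4, 1]
```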

8.27.1.2 largest

include std/stats.e
namespace stats
public function largest(object data_set)

Returns the largest of the data points that are atoms.

Parameters:
  1. data_set : a list of 1 or more numbers among which you want the largest.
Returns:

An object, either of:

  • an atom (the largest value) if there is at least one atom item in the set
  • {} if there is no largest value.
Comments:

Any data_set element which is not an atom is ignored.

Example 1:
largest( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,"text"} ) -- Ans: 8
largest( {"just","text"} ) -- Ans: {}
See also:

range

8.27.1.3 smallest

include std/stats.e
namespace stats
public function smallest(object data_set)

Returns the smallest of the data points.

Parameters:
  1. data_set : A list of 1 or more numbers for which you want the smallest. Note: only atom elements are included and any sub-sequence elements are ignored.
Returns:

An object, either of:

  • an atom (the smallest value) if there is at least one atom item in the set
  • {} if there is no smallest value.
Comments:

Any data_set element which is not an atom is ignored.

Example 1:
? smallest( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,"text"} ) -- Ans: 1
? smallest( {"just","text"} ) -- Ans: {}
See also:

range

8.27.1.4 range

include std/stats.e
namespace stats
public function range(object data_set)

Determines a number of range statistics for the data set.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want the range data.
Returns:

A sequence, empty if no atoms were found, else like {Lowest, Highest, Range, Mid-range}

Comments:

Any sequence element in data_set is ignored.

Example 1:
? range( {7,2,8,5,6,6,4,8,6,16,3,3,4,1,8,"text"} ) -- Ans: {1, 16, 15, 8.5}
See also:

smallest, largest
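
The four returned statistics can be sketched in Python as follows. This is an illustration only, not the library source. The name stat_range is hypothetical (chosen to avoid shadowing Python's built-in range), and it assumes mid-range means the midpoint of the lowest and highest values, which matches the example answer.

```python
from numbers import Real


def stat_range(data_set):
    """Return [lowest, highest, range, mid-range], or [] if no numbers exist."""
    # Keep only numeric items; strings and nested lists are skipped.
    atoms = [x for x in data_set if isinstance(x, Real)]
    if not atoms:
        return []
    lowest, highest = min(atoms), max(atoms)
    return [lowest, highest, highest - lowest, (lowest + highest) / 2]


if __name__ == "__main__":
    print(stat_range([7, 2, 8, 5, 6, 6, 4, 8, 6, 16, 3, 3, 4, 1, 8, "text"]))
```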

Enums used to influence the results of some of these functions.

8.27.1.5 ST_FULLPOP

include std/stats.e
namespace stats
public enum ST_FULLPOP

The supplied data is the entire population.

8.27.1.6 ST_SAMPLE

include std/stats.e
namespace stats
public enum ST_SAMPLE

The supplied data is only a random sample of the population.

8.27.1.7 ST_ALLNUM

include std/stats.e
namespace stats
public enum ST_ALLNUM

The supplied data consists of only atoms.

8.27.1.8 ST_IGNSTR

include std/stats.e
namespace stats
public enum ST_IGNSTR

Any sub-sequences (e.g. strings) in the supplied data are ignored.

8.27.1.9 ST_ZEROSTR

include std/stats.e
namespace stats
public enum ST_ZEROSTR

Any sub-sequences (e.g. strings) in the supplied data are assumed to have the value zero.

8.27.1.10 stdev

include std/stats.e
namespace stats
public function stdev(sequence data_set, object subseq_opt = ST_ALLNUM,
        integer population_type = ST_SAMPLE)

Returns the standard deviation of the data set, treated either as a sample or as the entire population.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want the estimated standard deviation.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
  3. population_type : an integer. ST_SAMPLE (the default) assumes that data_set is a random sample of the total population. ST_FULLPOP means that data_set is the entire population.
Returns:

An atom, the estimated standard deviation. An empty sequence means that there is no meaningful data to calculate from.

Comments:

stdev() is a measure of how much the values differ from the average.

The numbers in data_set can either be the entire population of values or just a random subset. You indicate which in the population_type parameter. By default data_set represents a sample and not the entire population. When using this function with sample data, the result is an estimated standard deviation.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

The equation for standard deviation, where D is N for ST_FULLPOP and N - 1 for ST_SAMPLE (the default), is:

stdev(X) ==> SQRT(SUM(SQ(X{1..N} - MEAN)) / D)
Example 1:
? stdev( {4,5,6,7,5,4,3,7} )                             -- Ans: 1.457737974
? stdev( {4,5,6,7,5,4,3,7} ,, ST_FULLPOP)                -- Ans: 1.363589014
? stdev( {4,5,6,7,5,4,3,"text"} , ST_IGNSTR)             -- Ans: 1.345185418
? stdev( {4,5,6,7,5,4,3,"text"}, ST_IGNSTR, ST_FULLPOP ) -- Ans: 1.245399698
? stdev( {4,5,6,7,5,4,3,"text"} , ST_ZEROSTR)            -- Ans: 2.121320344
? stdev( {4,5,6,7,5,4,3,"text"}, ST_ZEROSTR, ST_FULLPOP ) -- Ans: 1.984313483
See also:

average, avedev
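
The example answers are consistent with dividing the summed squared deviations by N - 1 for a sample and by N for a full population. The Python sketch below illustrates that reading; it is not the library source. The full_population flag is a hypothetical stand-in for population_type, and the input is assumed to hold at least two numbers and no sub-sequences.

```python
import math


def stdev(data_set, full_population=False):
    """Standard deviation with an N (population) or N - 1 (sample) divisor."""
    n = len(data_set)
    mean = sum(data_set) / n
    squared_deviations = sum((x - mean) ** 2 for x in data_set)
    # Assumption: sample data uses N - 1, full-population data uses N.
    divisor = n if full_population else n - 1
    return math.sqrt(squared_deviations / divisor)


if __name__ == "__main__":
    data = [4, 5, 6, 7, 5, 4, 3, 7]
    print(stdev(data))                        # sample estimate
    print(stdev(data, full_population=True))  # whole population
```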

8.27.1.11 avedev

include std/stats.e
namespace stats
public function avedev(sequence data_set, object subseq_opt = ST_ALLNUM,
        integer population_type = ST_SAMPLE)

Returns the average of the absolute deviations of data points from their mean.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want the mean of the absolute deviations.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
  3. population_type : an integer. ST_SAMPLE (the default) assumes that data_set is a random sample of the total population. ST_FULLPOP means that data_set is the entire population.
Returns:

An atom, the deviation from the mean.
An empty sequence means that there is no meaningful data to calculate from.

Comments:

avedev() is a measure of the variability in a data set. Its statistical properties are less well behaved than those of the standard deviation, which is why it is used less.

The numbers in data_set can either be the entire population of values or just a random subset. You indicate which in the population_type parameter. By default data_set represents a sample and not the entire population. When using this function with sample data, the result is an estimated deviation.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

The equation for average absolute deviation, where D is N for ST_FULLPOP and N - 1 for ST_SAMPLE (the default), is:

avedev(X) ==> SUM( ABS(X{1..N} - MEAN(X)) ) / D
Example 1:
? avedev( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,7} )
   --> Ans: 1.966666667
? avedev( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,7},, ST_FULLPOP ) 
   --> Ans: 1.84375
? avedev( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,"text"}, ST_IGNSTR  ) 
   --> Ans: 1.99047619
? avedev( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,"text"}, ST_IGNSTR,ST_FULLPOP ) 
   --> Ans: 1.857777778
? avedev( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,"text"}, ST_ZEROSTR ) 
   --> Ans: 2.225
? avedev( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,"text"}, ST_ZEROSTR, ST_FULLPOP ) 
   --> Ans: 2.0859375
See also:

average, stdev
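
The example answers match a sum of absolute deviations divided by N - 1 for a sample and by N for a full population. The Python sketch below shows that calculation; it is an illustration rather than the library source. The full_population flag is a hypothetical stand-in for population_type, and the input is assumed to hold at least two numbers and no sub-sequences.

```python
def avedev(data_set, full_population=False):
    """Average absolute deviation from the mean."""
    n = len(data_set)
    mean = sum(data_set) / n
    total_deviation = sum(abs(x - mean) for x in data_set)
    # Assumption: sample data uses N - 1, full-population data uses N.
    divisor = n if full_population else n - 1
    return total_deviation / divisor


if __name__ == "__main__":
    data = [7, 2, 8, 5, 6, 6, 4, 8, 6, 6, 3, 3, 4, 1, 8, 7]
    print(avedev(data))
    print(avedev(data, full_population=True))
```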

8.27.1.12 sum

include std/stats.e
namespace stats
public function sum(object data_set, object subseq_opt = ST_ALLNUM)

Returns the sum of all the atoms in an object.

Parameters:
  1. data_set : Either an atom or a list of numbers to sum.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An atom, the sum of the set.

Comments:

sum() is used as a measure of the magnitude of a sequence of positive values.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

The equation is:

sum(X) ==> SUM( X{1..N} )
Example 1:
? sum( {7,2,8.5,6,6,-4.8,6,6,3.341,-8,"text"}, ST_ZEROSTR ) -- Ans: 32.041
See also:

average

8.27.1.13 count

include std/stats.e
namespace stats
public function count(object data_set, object subseq_opt = ST_ALLNUM)

Returns the count of all the atoms in an object.

Parameters:
  1. data_set : either an atom or a list.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Comments:

This returns the number of numbers in data_set.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Returns:

An integer, the number of atoms in the set. When data_set is an atom, 1 is returned.

Example 1:
? count( {7,2,8.5,6,6,-4.8,6,6,3.341,-8,"text"}, ST_IGNSTR ) -- Ans: 10
? count( {"cat", "dog", "lamb", "cow", "rabbit"}, ST_IGNSTR ) -- Ans: 0 (no atoms)
? count( 5 ) -- Ans: 1
See also:

average, sum

8.27.1.14 average

include std/stats.e
namespace stats
public function average(object data_set, object subseq_opt = ST_ALLNUM)

Returns the average (mean) of the data points.

Parameters:
  1. data_set : A list of 1 or more numbers for which you want the mean.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An object,

  • {} (the empty sequence) if there are no atoms in the set.
  • an atom (the mean) if there are one or more atoms in the set.
Comments:

average() is the theoretical probable value of a randomly selected item from the set.

The equation for average is:

average(X) ==> SUM( X{1..N} ) / N

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
? average( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,"text"}, ST_IGNSTR ) -- Ans: 5.13333333
See also:

geomean, harmean, movavg, emovavg

8.27.1.15 geomean

include std/stats.e
namespace stats
public function geomean(object data_set, object subseq_opt = ST_ALLNUM)

Returns the geometric mean of the atoms in a sequence.

Parameters:
  1. data_set : the values to take the geometric mean of.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An atom, the geometric mean of the atoms in data_set. If there is no atom to take the mean of, 1 is returned.

Comments:

The geometric mean of N atoms is the N-th root of their product. Signs are ignored.

This is useful to compute average growth rates.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
? geomean({3, "abc", -2, 6}, ST_IGNSTR) -- prints out power(36,1/3) = 3.30192724889462669
? geomean({1,2,3,4,5,6,7,8,9,10}) -- = 4.528728688
See also:

average
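
The definition in the comments (the N-th root of the product, with signs ignored) can be written out directly. This Python sketch illustrates the definition and is not the library source; it assumes the input has already had any sub-sequences removed.

```python
import math


def geomean(values):
    """N-th root of the product of the absolute values; 1 if there are none."""
    if not values:
        return 1
    # Signs are ignored, so multiply absolute values.
    product = math.prod(abs(v) for v in values)
    return product ** (1 / len(values))


if __name__ == "__main__":
    print(geomean([3, -2, 6]))             # cube root of 36
    print(geomean(list(range(1, 11))))     # tenth root of 10!
```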

8.27.1.16 harmean

include std/stats.e
namespace stats
public function harmean(sequence data_set, object subseq_opt = ST_ALLNUM)

Returns the harmonic mean of the atoms in a sequence.

Parameters:
  1. data_set : the values to take the harmonic mean of.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An atom, the harmonic mean of the atoms in data_set.

Comments:

The harmonic mean of a set of atoms is the inverse of the average of their inverses.

This is useful in engineering to compute equivalent capacities and resistances.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
? harmean({3, "abc", -2, 6}, ST_IGNSTR) -- =  0.
? harmean({2, 3, 4}) -- 3 / (1/2 + 1/3 + 1/4) = 2.769230769
See also:

average
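
The definition in the comments reduces to N divided by the sum of the inverses. The Python sketch below shows only that arithmetic; it is not the library source, and it assumes a non-empty list of non-zero numbers whose inverses do not sum to zero.

```python
def harmean(values):
    """Inverse of the average of the inverses: N / sum(1 / x)."""
    inverse_sum = sum(1 / v for v in values)
    return len(values) / inverse_sum


if __name__ == "__main__":
    print(harmean([2, 3, 4]))  # 3 / (1/2 + 1/3 + 1/4)
```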

8.27.1.17 movavg

include std/stats.e
namespace stats
public function movavg(object data_set, object period_delta)

Returns the average (mean) of the data points for overlapping periods. This can be either a simple or weighted moving average.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want a moving average.
  2. period_delta : an object, either
  • an integer representing the size of the period, or
  • a list of weightings to apply to the respective period positions.
Returns:

A sequence, either the requested averages or {} if data_set is empty or the supplied period is less than one.

If a list of weights was supplied, the result is a weighted average; otherwise, it is a simple average.

Comments:

A moving average is used to smooth out a set of data points over a period.
For example, given a period of 5:

  1. the first returned element is the average of the first five data points [1..5],
  2. the second returned element is the average of the second five data points [2..6],
    and so on
    until the last returned value is the average of the last 5 data points [$-4 .. $].

When period_delta is an atom, it is rounded down to give the width of the averaging period. When it is a sequence, the width is its length. If there are not enough data points, zeroes are inserted.

Note that only atom elements are included and any sub-sequence elements are ignored.

Example 1:
? movavg( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8}, 10 )
 -- Ans: {5.8, 5.4, 5.5, 5.1, 4.7, 4.9}
? movavg( {7,2,8,5,6}, 2 ) 
 -- Ans: {4.5, 5, 6.5, 5.5}
? movavg( {7,2,8,5,6}, {0.5, 1.5} ) 
 -- Ans: {3.25, 6.5, 5.75, 5.75}
See also:

average
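
The sliding-window calculation can be sketched in Python as below. This is an illustration, not the library source. It assumes the weighted sum of each window is divided by the period length, which reproduces the weighted example answer (there the weights also sum to the period length, so the example cannot distinguish the two readings). It does not reproduce the zero padding mentioned for short data sets.

```python
import math


def movavg(data_set, period_delta):
    """Simple or weighted moving average over overlapping windows."""
    if isinstance(period_delta, (int, float)):
        width = math.floor(period_delta)       # a number gives the width
        weights = [1] * max(width, 0)          # simple average: equal weights
    else:
        weights = list(period_delta)           # a list gives the weights
        width = len(weights)
    if not data_set or width < 1:
        return []
    averages = []
    for start in range(len(data_set) - width + 1):
        window = data_set[start:start + width]
        weighted_sum = sum(w * x for w, x in zip(weights, window))
        # Assumption: divide by the period length, not by the sum of weights.
        averages.append(weighted_sum / width)
    return averages


if __name__ == "__main__":
    print(movavg([7, 2, 8, 5, 6], 2))           # prints [4.5, 5.0, 6.5, 5.5]
    print(movavg([7, 2, 8, 5, 6], [0.5, 1.5]))  # prints [3.25, 6.5, 5.75, 5.75]
```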

8.27.1.18 emovavg

include std/stats.e
namespace stats
public function emovavg(object data_set, atom smoothing_factor)

Returns the exponential moving average of a set of data points.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want a moving average.
  2. smoothing_factor : an atom, the smoothing factor, typically between 0 and 1.
Returns:

A sequence, made of the requested averages, or {} if data_set is empty.

Comments:

A moving average is used to smooth out a set of data points over a period.

The formula used is:

Y[i] = Y[i-1] + F * (X[i] - Y[i-1])

Note that only atom elements are included and any sub-sequence elements are ignored.

The smoothing factor controls how data is smoothed. 0 smooths everything to 0, and 1 means no smoothing at all.

Any value for smoothing_factor outside the 0.0..1.0 range causes smoothing_factor to be set to the periodic factor (2/(N+1)).

Example 1:
? emovavg( {7,2,8,5,6}, 0.75 )
 -- Ans: {6.65,3.1625,6.790625,5.44765625,5.861914063}
? emovavg( {7,2,8,5,6}, 0.25 ) 
 -- Ans: {5.95,4.9625,5.721875,5.54140625,5.656054687}
? emovavg( {7,2,8,5,6}, -1 ) 
 -- Ans: {6.066666667,4.711111111,5.807407407,5.538271605,5.69218107}
See also:

average
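
The recurrence above needs a starting value, which the section does not state. The example answers are reproduced when the series is seeded with the plain average of the data set, so the Python sketch below assumes that seed. It is an illustration, not the library source, and it assumes the input contains only numbers.

```python
def emovavg(data_set, smoothing_factor):
    """Exponential moving average: Y[i] = Y[i-1] + F * (X[i] - Y[i-1])."""
    if not data_set:
        return []
    n = len(data_set)
    # Factors outside 0..1 fall back to the periodic factor 2 / (N + 1).
    if smoothing_factor < 0 or smoothing_factor > 1:
        smoothing_factor = 2 / (n + 1)
    # Assumption inferred from the examples: seed with the overall mean.
    previous = sum(data_set) / n
    averages = []
    for x in data_set:
        previous = previous + smoothing_factor * (x - previous)
        averages.append(previous)
    return averages


if __name__ == "__main__":
    print(emovavg([7, 2, 8, 5, 6], 0.75))
```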

8.27.1.19 median

include std/stats.e
namespace stats
public function median(object data_set, object subseq_opt = ST_ALLNUM)

Returns the mid point of the data points.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want the median.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An object, either {} if there are no items in the set, or an atom (the median) otherwise.

Comments:

median() is the item for which half the items are below it and half are above it.

All elements are included; any sequence elements are assumed to have the value zero.

The equation for median is:

median(X) ==> sort(X)[N/2]

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
? median( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,4} ) -- Ans: 5
See also:

average, geomean, harmean, movavg, emovavg

8.27.1.20 raw_frequency

include std/stats.e
namespace stats
public function raw_frequency(object data_set, object subseq_opt = ST_ALLNUM)

Returns the frequency of each unique item in the data set.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want the frequencies.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

A sequence. This will contain zero or more 2-element sub-sequences. The first element is the frequency count and the second element is the data item that was counted. The returned values are in descending order, meaning that the highest frequencies are at the beginning of the returned list.

Comments:

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
? raw_frequency("the cat is the hatter")

This returns

{
{5,116},
{4,32},
{3,104},
{3,101},
{2,97},
{1,115},
{1,114},
{1,105},
{1,99}
}
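
The returned structure can be reproduced with a short Python sketch. This is an illustration, not the library source. It assumes ties in frequency are ordered by descending item value, as the example output suggests, and it converts the string to character codes because a Euphoria string is a sequence of such codes.

```python
from collections import Counter


def raw_frequency(data_set):
    """Return [count, item] pairs, highest counts first."""
    counts = Counter(data_set)
    pairs = [[count, item] for item, count in counts.items()]
    # Descending sort on [count, item]: frequency first, then item value.
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    codes = [ord(ch) for ch in "the cat is the hatter"]
    for pair in raw_frequency(codes):
        print(pair)
```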

8.27.1.21 mode

include std/stats.e
namespace stats
public function mode(sequence data_set, object subseq_opt = ST_ALLNUM)

Returns the most frequent point(s) of the data set.

Parameters:
  1. data_set : a list of 1 or more numbers for which you want the mode.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

A sequence. The list of modal items in the data set.

Comments:

It is possible for mode() to return more than one item when more than one item in the set has the same highest frequency count.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
mode( {7,2,8,5,6,6,4,8,6,6,3,3,4,1,8,4} ) -- Ans: {6}
mode( {8,2,8,5,6,6,4,8,6,6,3,3,4,1,8,4} ) -- Ans: {8,6}
See also:

average, geomean, harmean, movavg, emovavg
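
The tie behaviour can be sketched in Python as below. This is an illustration, not the library source. It assumes tied modal items are listed in descending order, as in the second example answer.

```python
from collections import Counter


def mode(data_set):
    """Return every item that shares the highest frequency count."""
    if not data_set:
        return []
    counts = Counter(data_set)
    highest = max(counts.values())
    modal_items = [item for item, count in counts.items() if count == highest]
    # Assumption: ties are reported largest item first.
    return sorted(modal_items, reverse=True)


if __name__ == "__main__":
    print(mode([7, 2, 8, 5, 6, 6, 4, 8, 6, 6, 3, 3, 4, 1, 8, 4]))  # prints [6]
    print(mode([8, 2, 8, 5, 6, 6, 4, 8, 6, 6, 3, 3, 4, 1, 8, 4]))  # prints [8, 6]
```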

8.27.1.22 central_moment

include std/stats.e
namespace stats
public function central_moment(sequence data_set, object datum, integer order_mag = 1,
        object subseq_opt = ST_ALLNUM)

Returns the distance between a supplied value and the mean, to some supplied order of magnitude. This is used to get a measure of the shape of a data set.

Parameters:
  1. data_set : a list of 1 or more numbers whose mean is used.
  2. datum: either a single value or a list of values for which you require the central moments.
  3. order_mag: An integer. This is the order of magnitude required. Usually a number from 1 to 4, but can be anything.
  4. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An object. The same data type as datum. This is the set of calculated central moments.

Comments:

For each of the items in datum, its central moment is calculated as ...

CM = power( ITEM - AVG, MAGNITUDE)

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
central_moment("the cat is the hatter", "the",1) --> {23.14285714, 11.14285714, 8.142857143}
central_moment("the cat is the hatter", 't',2) -->   535.5918367                          
central_moment("the cat is the hatter", 't',3) -->   12395.12536                          
See also:

average
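
The formula CM = power(ITEM - AVG, MAGNITUDE) can be applied by hand to check the example answers. The Python sketch below is an illustration, not the library source. It converts strings to character codes, because a Euphoria string is a sequence of such codes, and it assumes data_set contains only numbers.

```python
def central_moment(data_set, datum, order_mag=1):
    """Raise (item - mean of data_set) to order_mag for one item or a list."""
    mean = sum(data_set) / len(data_set)
    if isinstance(datum, (list, tuple)):
        # A list of items yields a list of moments, one per item.
        return [(item - mean) ** order_mag for item in datum]
    return (datum - mean) ** order_mag


if __name__ == "__main__":
    codes = [ord(ch) for ch in "the cat is the hatter"]
    print(central_moment(codes, [ord(ch) for ch in "the"], 1))
    print(central_moment(codes, ord("t"), 2))
```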

8.27.1.23 sum_central_moments

include std/stats.e
namespace stats
public function sum_central_moments(object data_set, integer order_mag = 1,
        object subseq_opt = ST_ALLNUM)

Returns sum of the central moments of each item in a data set.

Parameters:
  1. data_set : a list of 1 or more numbers whose mean is used.
  2. order_mag: An integer. This is the order of magnitude required. Usually a number from 1 to 4, but can be anything.
  3. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An atom. The total of the central moments calculated for each of the items in data_set.

Comments:

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
sum_central_moments("the cat is the hatter", 1) --> -8.526512829e-14
sum_central_moments("the cat is the hatter", 2) --> 19220.57143     
sum_central_moments("the cat is the hatter", 3) --> -811341.551     
sum_central_moments("the cat is the hatter", 4) --> 56824083.71
See also:

central_moment, average
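
Summing the per-item central moments can be sketched in Python as below. This is an illustration, not the library source, and it assumes data_set contains only numbers. The first-order sum is zero in exact arithmetic, so the tiny non-zero answer in the example is floating-point rounding.

```python
def sum_central_moments(data_set, order_mag=1):
    """Sum of (item - mean) ** order_mag over every item in data_set."""
    mean = sum(data_set) / len(data_set)
    return sum((item - mean) ** order_mag for item in data_set)


if __name__ == "__main__":
    codes = [ord(ch) for ch in "the cat is the hatter"]
    for order in (1, 2, 3):
        print(order, sum_central_moments(codes, order))
```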

8.27.1.24 skewness

include std/stats.e
namespace stats
public function skewness(object data_set, object subseq_opt = ST_ALLNUM)

Returns a measure of the asymmetry of a data set. Usually the data_set is a probability distribution but it can be anything. This value is used to assess how suitable the data set is in representing the required analysis. It can help detect if there are too many extreme values in the data set.

Parameters:
  1. data_set : a list of 1 or more numbers whose mean is used.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An atom. The skewness measure of the data set.

Comments:

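Generally speaking, a negative return indicates a longer tail of low values, so that most of the values are greater than the mean, while a positive return indicates a longer tail of high values, so that most values are lower than the mean. In the first example below, the four spaces are far below the other character codes and pull the result negative. However, this rule of thumb does not hold for every data set.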

The larger the magnitude of the returned value, the more the data is skewed in that direction.

A returned value of zero indicates that the mean and median values are identical and that the data is symmetrical.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
skewness("the cat is the hatter") --> -1.36166186
skewness("thecatisthehatter")     --> 0.1093730315
See also:

kurtosis
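
This section does not give the skewness formula. The example answers are reproduced by dividing the sum of third-order central moments by (N - 1) times the cube of the sample standard deviation. The Python sketch below uses that formula. It is an inference from the examples, not a statement about the library source, and it assumes at least two numbers that are not all equal.

```python
import math


def skewness(data_set):
    """Sum of cubed deviations / ((N - 1) * sample_stdev ** 3)."""
    n = len(data_set)
    mean = sum(data_set) / n
    cubed = sum((x - mean) ** 3 for x in data_set)
    # Sample standard deviation (N - 1 divisor).
    sample_sd = math.sqrt(sum((x - mean) ** 2 for x in data_set) / (n - 1))
    return cubed / ((n - 1) * sample_sd ** 3)


if __name__ == "__main__":
    print(skewness([ord(ch) for ch in "the cat is the hatter"]))
    print(skewness([ord(ch) for ch in "thecatisthehatter"]))
```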

8.27.1.25 kurtosis

include std/stats.e
namespace stats
public function kurtosis(object data_set, object subseq_opt = ST_ALLNUM)

Returns a measure of the spread of values in a dataset when compared to a normal probability curve.

Parameters:
  1. data_set : a list of 1 or more numbers whose kurtosis is required.
  2. subseq_opt : an object. When this is ST_ALLNUM (the default), data_set is assumed to contain no sub-sequences; otherwise it gives instructions about how to treat sub-sequences. See comments for details.
Returns:

An object. If this is an atom, it is the kurtosis measure of the data set. Otherwise it is a sequence containing an error integer. The return value {0} indicates that an empty dataset was passed, {1} indicates that the standard deviation is zero (all values are the same).

Comments:

Generally speaking, a negative return indicates that most of the values are further from the mean, while positive values indicate that most values are nearer to the mean.

The larger the magnitude of the returned value, the more the data is 'peaked' or 'flatter' in that direction.

If the data can contain sub-sequences, such as strings, you need to tell the function; otherwise it assumes every value in data_set is a number, and it will crash if that is not the case. Your choices are to ignore sub-sequences or to treat them as having the value zero: pass ST_IGNSTR as subseq_opt to ignore them, or ST_ZEROSTR to treat them as zero. If you know that data_set contains only numbers, use the default subseq_opt value, ST_ALLNUM. Note: the function is faster when the data contains only numbers.

Example 1:
kurtosis("thecatisthehatter")     --> -1.737889192
See also:

skewness
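
This section does not give the kurtosis formula. The example answer is reproduced by dividing the sum of fourth-order central moments by (N - 1) times the fourth power of the sample standard deviation, then subtracting 3 so that a normal curve scores about zero. The Python sketch below uses that formula. It is an inference from the example, not a statement about the library source, and it uses Python lists for the {0} and {1} error returns.

```python
def kurtosis(data_set):
    """Sum of 4th-power deviations / ((N - 1) * sample_variance ** 2) - 3."""
    n = len(data_set)
    if n == 0:
        return [0]                      # documented error: empty data set
    mean = sum(data_set) / n
    squared = sum((x - mean) ** 2 for x in data_set)
    if squared == 0:
        return [1]                      # documented error: zero deviation
    sample_variance = squared / (n - 1)
    fourth = sum((x - mean) ** 4 for x in data_set)
    return fourth / ((n - 1) * sample_variance ** 2) - 3


if __name__ == "__main__":
    print(kurtosis([ord(ch) for ch in "thecatisthehatter"]))
```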