Re: better flatten()?
- Posted by ghaberek (admin) Jul 27, 2015
I noticed a peculiar example in the documentation for flatten:
Example 3:
Using the delimiter argument.
s = flatten({"abc", "def", "ghi"}, ", ")
-- s is "abc, def, ghi"
Pete's flatten2() accomplishes this correctly, but Spock's flattenX() does not. flattenX() also leaves a trailing delimiter which should not be part of the correct output.
s = flatten2({"abc", "def", "ghi"}, ", ")
-- s is "abc, def, ghi"
s = flattenX({"abc", "def", "ghi"}, ", ")
-- s is "a, b, c, d, e, f, g, h, i, "
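For comparison, here is a minimal sketch of joining items so that no trailing delimiter is produced. It is not flattenX()'s actual code, and join_items() is a hypothetical name used only for illustration. The idea is to emit the delimiter before every item except the first:

```euphoria
-- join_items() is a hypothetical helper for illustration only
function join_items( sequence items, object delim )
    sequence result = {}
    for i = 1 to length( items ) do
        if i > 1 then
            -- delimiter goes between items, never after the last one
            result &= delim
        end if
        result &= items[i]
    end for
    return result
end function

sequence s = join_items( {"abc", "def", "ghi"}, ", " )
-- s is "abc, def, ghi"
```

The standard join() from std/sequence.e already behaves this way, so in practice a hand-written loop like this should rarely be needed.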
However, taking this a step further, I noticed that flatten() and flatten2() do not seem to handle nested strings correctly. Notice how the nested strings get merged together:
s = flatten({"abc", "def", "ghi", {"jkl", "mno", "pqr"}, "stu", "vwx", "yz"}, ", ")
-- s is "abc, def, ghi, jklmnopqr, stu, vwx, yz"
s = flatten2({"abc", "def", "ghi", {"jkl", "mno", "pqr"}, "stu", "vwx", "yz"}, ", ")
-- s is "abc, def, ghi, jklmnopqr, stu, vwx, yz"
So I made my own attempt to better handle this behavior and I wrote two functions: flatten_all() and flatten_seq().
include std/sequence.e -- for join()

--
-- string type borrowed from std/types.e
--
type string( object x )
    if not sequence(x) then
        return 0
    end if
    for i = 1 to length(x) do
        if not integer(x[i]) then
            return 0
        end if
        if x[i] < 0 then
            return 0
        end if
        if x[i] > 255 then
            return 0
        end if
    end for
    return 1
end type

--
-- an array of only string objects
--
type string_array( object x )
    if atom( x ) then
        return 0
    end if
    for i = 1 to length( x ) do
        if not string( x[i] ) then
            return 0
        end if
    end for
    return 1
end type

--
-- flatten a sequence into its raw atoms
--
function flatten_all( object s1, object delim = "" )
    if atom( s1 ) then
        return {s1}
    end if
    sequence s2 = {}
    for i = 1 to length( s1 ) do
        s2 &= flatten_all( s1[i] )
    end for
    return join( s2, delim )
end function

--
-- flatten a sequence, preserving nested strings
--
function flatten_seq( sequence s1, object delim = "" )
    sequence s2 = {}
    for i = 1 to length( s1 ) do
        if string( s1[i] ) then
            -- append string item
            s2 &= {s1[i]}
        elsif string_array( s1[i] ) then
            -- append the whole array
            s2 &= s1[i]
        else
            -- append the raw atoms
            s2 &= flatten_all( s1[i] )
        end if
    end for
    return join( s2, delim )
end function
s = flatten_all({"abc", "def", "ghi", {"jkl", "mno", "pqr"}, "stu", "vwx", "yz"}, ", ")
-- s is "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
s = flatten_seq({"abc", "def", "ghi", {"jkl", "mno", "pqr"}, "stu", "vwx", "yz"}, ", ")
-- s is "abc, def, ghi, jkl, mno, pqr, stu, vwx, yz"
These are nearly as fast as flatten2() or flattenX() and I believe they produce the most "correct" output thus far.
I did some testing on large random sequences and flatten_seq() seems to be just as fast as flatten2(), while flatten_all() is about 3-5 times slower.
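For anyone who wants to repeat this kind of comparison, a rough timing harness might look like the sketch below. It is not the harness used for the numbers above. The test data (repeated copies of a small nested sequence rather than random data) and the iteration count are assumptions. It times the standard flatten(); any of the other functions can be swapped in on the marked line.

```euphoria
include std/sequence.e -- for flatten()

-- assumed test data: 10,000 copies of a small nested sequence
sequence data = repeat( {"abc", {"def", "ghi"}}, 10000 )
sequence result

atom t0 = time()
for i = 1 to 100 do
    result = flatten( data ) -- swap in the function being measured
end for
printf( 1, "flatten(): %.3f seconds\n", {time() - t0} )
```

Running each candidate over the same data and the same number of iterations keeps the comparison fair. The absolute numbers will vary from machine to machine.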
I'll admit I'm probably taking a performance hit by using join() but it made the code come out a lot cleaner.
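One place the join() cost may come from: flatten_all() calls join() with an empty delimiter at every level of the recursion, which copies the collected atoms again each time. A possible variation, untested for speed, gathers the atoms in a helper and calls join() only once at the top. The names collect_atoms() and flatten_all_once() are hypothetical:

```euphoria
include std/sequence.e -- for join()

-- collect_atoms() and flatten_all_once() are hypothetical names for this sketch

-- recursively gather every atom, without joining along the way
function collect_atoms( object x )
    sequence atoms = {}
    if atom( x ) then
        return {x}
    end if
    for i = 1 to length( x ) do
        atoms &= collect_atoms( x[i] )
    end for
    return atoms
end function

-- apply the delimiter once, at the top level only
function flatten_all_once( object s1, object delim = "" )
    return join( collect_atoms( s1 ), delim )
end function

sequence s = flatten_all_once( {"abc", {"de", "f"}}, ", " )
-- s is "a, b, c, d, e, f"
```

This should give the same output as flatten_all() while keeping join() out of the recursive path. Whether it actually closes the speed gap would need to be measured.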
-Greg