Re: Constructing sequences

Maybe I’m digressing from the point of the original post, but for me it
raises a question: what are the general rules for fast loading and saving
of very complex or huge structures, and how deeply nested can sub-sequences
be while still being addressed efficiently?

Here’s an example, part of a simple spell checker I use, that pre-allocates
the structure only (empty buckets, no data) and then appends each word to a
subscripted variable. Loading from file this way is only twice as slow as
Robert’s optimised routine, which loads into a simple buffer sequence.

For speed, it appears important to pre-define the structure before reading,
and to break the structure down into simple units (here, one word per line)
before saving.

include file.e
include misc.e
atom handle

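-- create pseudo dictionary file - legible version of Robert's code
-- 100000 random "words" of 20..30 letters; rand(25) gives 1..25,
-- so 'a' + rand(...) yields the letters 'b'..'z'
handle = open("bigfile.txt", "w")
for i = 1 to 100000 do
    puts(handle, 'a' + rand(repeat(25, 19 + rand(11))))
    puts(handle, '\n')
end for
close(handle)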

-- key for pseudo dictionary hash - intended for lower-case words ('a'..'z' only)
-- maps the first three letters to indices 1..26 ('a' = 97, so subtracting 96 gives 1);
-- 26 also stands in for a missing 2nd or 3rd letter, so short words share 'z' buckets
function dic_hash_key(sequence key_word)
    integer i, j, k
    i = key_word[1] - 96
    if length(key_word) > 2 then      -- most words take this branch
        j = key_word[2] - 96
        k = key_word[3] - 96
        return {i, j, k}
    elsif length(key_word) > 1 then   -- very few words get this far
        j = key_word[2] - 96
        return {i, j, 26}
    else
        return {i, 26, 26}
    end if
end function

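-- read: pre-allocate a 26x26x26 nested sequence, then append each word to its bucket
function read_list_to_hash(sequence file)
    object filed_line
    sequence hash, key
    handle = open(file, "r")
    if handle = -1 then
        puts(1, "Couldn't open " & file & '\n')
        abort(1)
    end if
    hash = repeat(repeat(repeat({}, 26), 26), 26)   -- structure only: empty buckets
    while 1 do
        filed_line = gets(handle)
        if atom(filed_line) then
            exit                          -- gets() returns -1 at end of file
        end if
        -- strip the trailing '\n' (the last line may not have one)
        if length(filed_line) and filed_line[length(filed_line)] = '\n' then
            filed_line = filed_line[1..length(filed_line) - 1]
        end if
        if length(filed_line) then        -- skip blank lines
            key = dic_hash_key(filed_line)
            hash[key[1]][key[2]][key[3]] = append(hash[key[1]][key[2]][key[3]], filed_line)
        end if
    end while
    close(handle)
    return hash
end function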

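-- save: flatten the hash back to one word per line
-- (grouped by first three letters, not fully sorted)
procedure save_hash_to_list(sequence file, sequence hash)
    -- use the plain file name, as read_list_to_hash does; current_dir() & file
    -- would lose the path separator, since current_dir() has no trailing slash
    handle = open(file, "w")
    if handle = -1 then
        puts(1, "Couldn't open " & file & '\n')
        abort(1)
    end if
    for h = 1 to length(hash) do
        for i = 1 to length(hash[h]) do
            for j = 1 to length(hash[h][i]) do
                for k = 1 to length(hash[h][i][j]) do
                    puts(handle, hash[h][i][j][k] & '\n')
                end for
            end for
        end for
    end for
    close(handle)
end procedure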

sequence dictionary_hash
atom t

--load dictionary
t = time()
dictionary_hash = read_list_to_hash("bigfile.txt")
?time()-t

-- save dictionary (this writes back over bigfile.txt)
t = time()
save_hash_to_list("bigfile.txt", dictionary_hash)
?time()-t
sleep(3)
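The lookup side of the spell checker isn't shown above. A minimal sketch, assuming lower-case 'a'..'z' words: compute the same three-letter key and search only that one bucket with find(). The names key3(), add_word() and is_word() are hypothetical; key3() is just a compact restatement of dic_hash_key(), and the tiny three-word dictionary is made up for illustration.

```euphoria
-- Hypothetical lookup for the 26x26x26 bucket hash (assumes 'a'..'z' words).
-- key3() restates dic_hash_key(): 26 fills in for a missing 2nd/3rd letter.
function key3(sequence w)
    sequence key
    key = {26, 26, 26}
    for n = 1 to 3 do
        if n <= length(w) then
            key[n] = w[n] - 96         -- 'a' = 97 -> 1 ... 'z' = 122 -> 26
        end if
    end for
    return key
end function

-- add a word to its bucket (the same append as in read_list_to_hash)
function add_word(sequence hash, sequence w)
    sequence key
    key = key3(w)
    hash[key[1]][key[2]][key[3]] = append(hash[key[1]][key[2]][key[3]], w)
    return hash
end function

-- is_word(): only the one bucket selected by the key is searched
function is_word(sequence hash, sequence w)
    sequence key
    if length(w) = 0 then
        return 0
    end if
    key = key3(w)
    return find(w, hash[key[1]][key[2]][key[3]]) != 0
end function

sequence dict
dict = repeat(repeat(repeat({}, 26), 26), 26)   -- pre-allocated empty buckets
dict = add_word(dict, "spell")
dict = add_word(dict, "checker")
dict = add_word(dict, "a")

? is_word(dict, "spell")     -- 1
? is_word(dict, "checker")   -- 1
? is_word(dict, "spelt")     -- 0 (same bucket as "spell", but not in it)
? is_word(dict, "a")         -- 1
```

Because each bucket only holds words sharing their first three letters, find() scans a short list rather than the whole dictionary. With the random test file that is about six words per bucket on average (100000 words over 25*25*25 usable buckets); a real dictionary would be more uneven.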
