1. [clickbait] Slowest oe/p program

A shocking discovery...Python dictionary works really well!

Read about 130_000 lines from the CMU Pronouncing Dictionary. Each line holds a word followed by its pronunciation, like:

ABACUS  AE1 B AH0 K AH0 S

You can download it from http://www.speech.cs.cmu.edu/cgi-bin/cmudict or from http://thinkPHIX2.com/code/c06d

The idea is to:

  • read the c06d file
  • make the key the lowercase word (like ABACUS --> abacus)
  • make the value the pronunciation (like AE1 B AH0 K AH0 S)
  • put this into a dictionary
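
The steps above can be sketched for a single line as follows. This is only a minimal sketch: the helper name `parse_entry` is hypothetical, and the sample line is the ABACUS entry used in the list.

```python
def parse_entry(line):
    """Split one dictionary line into (lowercase word, pronunciation)."""
    fields = line.split()          # split on any run of whitespace
    word = fields[0].lower()       # ABACUS --> abacus
    pron = " ".join(fields[1:])    # AE1 B AH0 K AH0 S
    return word, pron


d = {}
word, pron = parse_entry("ABACUS  AE1 B AH0 K AH0 S")
d[word] = pron
print(d)
```

Each of the programs below does the same thing in a loop over the whole file; only the container differs.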


Python dictionary is really fast.

`tables` is by Jiri Babor (a clever use of sequences)

`map` uses the OE std/map.e hash map

`tree` is a Phix tree

Benchmarking is simplistic:

atom t = time() 
{} = system_exec( "p64 tables.ex" ) 
? time() - t 

Mint19 64bit i5

More seconds is bad.
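
For comparison, the same wall-clock timing idea might look like this in Python. It is a rough sketch, not the harness used for the numbers in this post: the function name `time_command` is hypothetical, and the command in the demo is a placeholder so the sketch runs without Phix installed.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)   # child output goes to the terminal
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; substitute e.g. ["p64", "tables.ex"] to time
    # one of the programs from this post.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f}")
```

Like the Phix snippet, this measures everything the child process does, including interpreter start-up and printing every entry.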

program      run with   time (seconds)
python.py    python3    1.6   !
map.ex       eui        2.7
map          euc        1.9
tables.ex    eui        1.6
oe_tables    euc        1.5   *
tables.ex    p64        1.6
p64_tables   p64 -c     1.55
tree.ex      p64        6.5
tree         p64 -c     6.47  ?

be well

Python Code

"""This module contains a code example related to 

Think Python, 2nd Edition 
by Allen Downey 
Copyright 2015 Allen Downey 
License: http://creativecommons.org/licenses/by/4.0/
"""

from __future__ import print_function, division

def read_dictionary(filename='c06d'):
    """Reads from a file and builds a dictionary that maps from
    each word to a string that describes its primary pronunciation.

    Secondary pronunciations are added to the dictionary with
    a number, in parentheses, at the end of the key, so the
    key for the second pronunciation of "abdominal" is "abdominal(2)".

    filename: string
    returns: map from string to pronunciation
    """
    d = dict()
    fin = open(filename) 
    for line in fin: 
        # skip over the comments 
        if line[0] == '#': continue 
        t = line.split() 
        word = t[0].lower() 
        pron = ' '.join(t[1:]) 
        d[word] = pron 
    return d 
if __name__ == '__main__': 
    d = read_dictionary() 
    for k, v in d.items(): 
        print(k, v) 

Phix Dictionary

atom fn = open( "c06d", "r") 
sequence raw = get_text(fn, 1 ) 
integer d = new_dict() 
for i=1 to length(raw) do 
    if raw[i][1] == '#' then continue end if 
    raw[i] = split(raw[i]) 
    putd( lower(raw[i][1]), join( raw[i][2..$], ' ') ) 
end for 
function show( object key,data,user) 
    printf(1, "%s  %s \n", {key,data} ) 
    return 1 
end function 
traverse_dict( routine_id("show") ) 

OE Map

include std/map.e 
include std/io.e 
include std/search.e 
include std/sequence.e 
include std/text.e 

atom fn = open( "c06d", "r" ) 
sequence raw = read_lines(fn) 
map d = new() 
for i=1 to length(raw) do 
    if raw[i][1] = '#' then continue end if 
    integer n = find( ' ', raw[i] ) 
    put(d,  lower(raw[i][1..n-1]), raw[i][n+1..$] ) 
end for 
sequence foo = pairs( d, 1 ) 
for i=1 to length(foo) do 
    printf(1, "%s  %s\n", { foo[i][1], foo[i][2] } ) 
end for 

Babor Table

I cheat in creating the `table` without using stables.e functions.

include stables.e 
ifdef PHIX then 
    atom fn = open( "c06d", "r") 
    sequence raw = get_text(fn, 1 ) 
    ? length(raw) 
elsedef 
    include std/sequence.e 
    include std/text.e 
    include std/io.e 
    sequence raw = read_lines("c06d") 
    ? length(raw) 
end ifdef 
sequence d = ET 
sequence data={}, keys={} 
for i=1 to length(raw) do 
    if raw[i][1] = '#' then continue end if 
    integer n = find(' ', raw[i]) 
    keys = append(keys, lower(raw[i][1..n-1]) ) 
    data = append(data, raw[i][n+1..$] ) 
end for 
d = append(data, keys) 
for i=1 to length(d)-1 do 
        printf(1,"%s  %s \n", { d[$][i], d[i] } ) 
end for 
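
The layout built by `d = append(data, keys)` can be pictured with Python lists: the values come first and the list of keys sits in the last slot. The sketch below only illustrates that layout. The two records are sample entries rather than the full file, and `lookup` is a hypothetical linear-search stand-in, not the stables.e lookup routines (which are not shown in this post).

```python
# Sample records only; the real input is the c06d file.
records = [
    ("abacus", "AE1 B AH0 K AH0 S"),
    ("aback", "AH0 B AE1 K"),
]

keys = []
data = []
for key, value in records:
    keys.append(key)
    data.append(value)

# Same shape as `d = append(data, keys)`: values first, key list last.
table = data + [keys]


def lookup(table, key):
    """Linear-search lookup; a stand-in, not the stables.e algorithm."""
    i = table[-1].index(key)   # raises ValueError if key is absent
    return table[i]


# Mirrors the printing loop above: key from the last slot, value by index.
for i in range(len(table) - 1):
    print(table[-1][i], table[i])
```

Because key `i` and value `i` share an index, building the table is just two appends per line, which may be part of why this version is quick to fill.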

... continues on reply ...


2. Re: [clickbait] Slowest oe/p program

J Babor's table code is in the pastebin


or, get it from the archive
