1. Searching for Data in EDS

Let's say I have records set up like this:

key: X, data: Y

where X and Y are integers (atoms).

How would I find all records with Y = 2?

Do I just have to loop through all the records? What if I have a million
records?!

Also, does anybody already have good code for managing EDS files to handle
cases like this (a function that returns a sequence with the record numbers
of all records where Y = 2)...?

Thanks!
ck


2. Re: Searching for Data in EDS

On Monday 11 November 2002 03:27 pm, you wrote:
>
> > From: C. K. Lester [mailto:cklester at yahoo.com]
> >
> > Let's say I have records set up like this:
> >
> > key: X, data: Y
> >
> > where X and Y are integers (atoms).
> >
> > How would I find all records with Y = 2?
> >
> > Do I just have to loop through all the records? What if I
> > have a million
> > records?!
> >
> > Elsewhere, anybody already have good code for managing EDS
> > files to handle
> > cases like this (a function that returns a sequence with the
> > record numbers
> > of all records where Y = 2)...?
>
> Your best bet is to use indices (there was a fairly in depth treatment on
> this topic previously--I believe the question was regarding record albums,
> and I think Irv was the main responder), where you keep a separate
> table/record to remember all the keys which have certain properties.

If you are likely to have duplicate Y values, then you might save lookup time 
by creating a unique value index.

For example, given the following records:
key val
1     99
2    102
3     99
4     25
5     99

You can build an index keyed on the value, like this:
val  keys
25   {4}
99   {1,3,5}
102  {2}

So looking up a given value in the index is very fast (binary lookup?), and
then looking up the individual records that contain that value is simply a
matter of iterating through the short list.
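
A minimal sketch of that index, written in Python rather than Euphoria, with plain dicts standing in for the EDS tables. The names (records, build_value_index) are made up for illustration and are not part of EDS:

```python
from collections import defaultdict

# Stand-in for the EDS table: key -> data (the five example records).
records = {1: 99, 2: 102, 3: 99, 4: 25, 5: 99}

def build_value_index(table):
    """Map each distinct value to the list of keys that hold it."""
    index = defaultdict(list)
    for key in sorted(table):          # one sequential pass over the table
        index[table[key]].append(key)
    return dict(index)

value_index = build_value_index(records)
print(value_index[99])   # keys of all records whose value is 99
```

Building the index still costs one full pass over the data, but after that each "all records with Y = n" query is a single index lookup.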

Whether this is really practical depends on how many of your values are
actually dupes, how often the database is updated, and whether there is a lot
of extra data that goes with each record.

Regards,
Irv


3. Re: Searching for Data in EDS

Irv, do you save the index to the database?

I use key -1 as the field name record. I'm thinking to use -2 or somesuch
negative value as the index. Is it proper to store the index in the database
file itself, or should that be kept separate? I wouldn't think one would
want to regen the index every time it's opened...


4. Re: Searching for Data in EDS

On Monday 11 November 2002 05:49 pm, you wrote:
>
> Irv, do you save the index to the database?
>
> I use key -1 as the field name record. I'm thinking to use -2 or somesuch
> negative value as the index. Is it proper to store the index in the
> database file itself, or should that be kept separate? I wouldn't think one
> would want to regen the index every time it's opened...

I store the indices as tables in the database. For a more realistic example, 
suppose I have an employee database, and want quick access by lastname,
city, state, zipcode, work location.

My data would be stored in the main table with a unique key (ssn), and I 
would have additional tables for name_idx, city_idx, state_idx, zip_idx, 
loc_idx.

Since EDS has built-in protection against storing dupe keys, I can use very 
simple code to store the index. For example, to create zip_idx, I just read 
the main data table sequentially, extract each zipcode, and use that as a key 
into the zip_idx table. 

If a record with that key doesn't exist, I can add a record:
key = zip
data = {ssn}

If that key already exists, I can load that record, append the ssn to the
data, and store it back, so the record now looks like:
key = zip
data = {ssn1, ssn2,....}
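
The add-or-append step can be sketched roughly as follows. This is Python with dicts standing in for the main table and zip_idx, not EDS code; add_to_index, the field names, and the sample SSNs are all hypothetical:

```python
# Stand-in for the main table: ssn -> employee record (made-up sample data).
employees = {
    "000-00-0001": {"lastname": "Adams", "zip": "30301"},
    "000-00-0002": {"lastname": "Baker", "zip": "10001"},
    "000-00-0003": {"lastname": "Clark", "zip": "30301"},
}

def add_to_index(index, key, ssn):
    """Create the index record if the key is new, otherwise append to it."""
    if key not in index:
        index[key] = [ssn]            # new record: key = zip, data = {ssn}
    else:
        data = index[key]             # load the existing record
        data.append(ssn)              # append the ssn to its data
        index[key] = data             # store it back

# Read the main table sequentially and index every record by zipcode.
zip_idx = {}
for ssn, rec in employees.items():
    add_to_index(zip_idx, rec["zip"], ssn)

print(zip_idx)
```

In a real EDS version the "key not in index" test would be a key lookup on the index table, followed by either an insert or a replace of the record's data.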

So you can see that getting all employees in a given zip code, for example,  
is dead simple and quick. 
1. from the zip_idx table, lookup the record with the key (zip)
2. for each ssn in the retrieved data, read the record from the main table 
using the key (ssn)
Tiny fraction of a second, and you're done.
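
Those two steps, sketched in Python with dicts in place of the EDS tables (employees_in_zip and the sample data are made up for illustration):

```python
# Stand-ins for the main table and the zip index (hypothetical sample data).
main = {
    "000-00-0001": {"lastname": "Adams", "zip": "30301"},
    "000-00-0002": {"lastname": "Baker", "zip": "10001"},
    "000-00-0003": {"lastname": "Clark", "zip": "30301"},
}
zip_idx = {
    "30301": ["000-00-0001", "000-00-0003"],
    "10001": ["000-00-0002"],
}

def employees_in_zip(main_table, index, zipcode):
    """Step 1: look up the zip in the index. Step 2: fetch each ssn's record."""
    ssns = index.get(zipcode, [])              # no index record means no matches
    return [main_table[ssn] for ssn in ssns]

names = [emp["lastname"] for emp in employees_in_zip(main, zip_idx, "30301")]
print(names)
```

The cost is one index lookup plus one main-table lookup per matching employee, regardless of how many records the main table holds.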

Regenerating the indexes each time the db is opened is not necessary, but
you do need to write code to update all indexes with each addition, deletion
or change. But (based on experience) you should also write re-indexing
routines; they will come in handy.
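
As a rough illustration of what that maintenance code involves, here is a Python sketch that keeps a single zip index in step with the main table and includes a re-indexing routine. Dicts stand in for the EDS tables, and every function name here is hypothetical:

```python
def index_add(idx, key, ssn):
    idx.setdefault(key, []).append(ssn)

def index_remove(idx, key, ssn):
    idx[key].remove(ssn)
    if not idx[key]:                 # drop index records that become empty
        del idx[key]

def insert_employee(main, zip_idx, ssn, rec):
    if ssn in main:
        raise KeyError("duplicate key: " + ssn)
    main[ssn] = rec
    index_add(zip_idx, rec["zip"], ssn)

def delete_employee(main, zip_idx, ssn):
    rec = main.pop(ssn)
    index_remove(zip_idx, rec["zip"], ssn)

def change_zip(main, zip_idx, ssn, new_zip):
    old_zip = main[ssn]["zip"]
    if old_zip != new_zip:           # move the ssn from the old index record to the new one
        index_remove(zip_idx, old_zip, ssn)
        main[ssn]["zip"] = new_zip
        index_add(zip_idx, new_zip, ssn)

def reindex(main):
    """Rebuild the zip index from scratch by scanning the main table."""
    idx = {}
    for ssn, rec in main.items():
        index_add(idx, rec["zip"], ssn)
    return idx

def same_index(a, b):
    """Compare two indexes, ignoring the order of ssns within each record."""
    return {k: sorted(v) for k, v in a.items()} == {k: sorted(v) for k, v in b.items()}

main, zip_idx = {}, {}
insert_employee(main, zip_idx, "000-00-0001", {"lastname": "Adams", "zip": "30301"})
insert_employee(main, zip_idx, "000-00-0002", {"lastname": "Baker", "zip": "30301"})
insert_employee(main, zip_idx, "000-00-0003", {"lastname": "Clark", "zip": "10001"})
change_zip(main, zip_idx, "000-00-0002", "10001")
delete_employee(main, zip_idx, "000-00-0001")
print(same_index(zip_idx, reindex(main)))
```

With several indexes, each of insert, delete and change has to touch every index whose field is affected. Comparing the live index against a fresh reindex, as the last line does, is one cheap way to check that the update code has not drifted.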

Regards,
Irv
