1. Robert...EDS - questions/comments

Robert,

I have some questions/comments concerning the EDS:

1. Say I have a table with a key structure of : {atom, atom}.  I insert 
3 records with the following key values:  {1 , 1} {1, 2} {1, 3}.  I add 
another 10,000 entries with other key values.  I then add {1, 4} {1, 5}.  
Now I want to get all records with the first key value of 1.  Assuming I 
use db_find_key and find and read the first three records, would I have 
to read through the next 10,000 to get the last two records?  If I 
checked for the first key value to change and then drop out of my read 
loop I would think I would miss the last two records for key value of 1. 
 Is this right?  If so, would it be possible to add a function to 
"retrieve_next_record_by_partial_key"?  I suppose I could increment the 
second key value by 1 and use db_find_key but what if the second key 
value was not sequential?
2. Somewhat relating to question 1, if I used db_find_key with a key 
value of {1, 0} (assuming there will never be a record with a second key 
value of 0) I would get a negative result.  If I am reading the docs 
right, the negative number would be the record number if it were 
inserted into the file.  For example, if the db_find_key returned -4, 
would this mean that if I were to retrieve record 4, would it have a key 
value of {1, 1} (assuming key value {1, 1} is in the file)?
3. Let me say that I really like the simplicity and ease-of-use with the 
EDS.  I have written programs using direct calls to Borland's Database 
Engine and it's not pretty.
Jonas

2. Re: Robert...EDS - questions/comments

--- Jonas  Temple <jktemple at yhti.net> wrote:

> 1. Say I have a table with a key structure of : {atom, atom}.  I insert 
> 3 records with the following key values:  {1 , 1} {1, 2} {1, 3}.  I add 
> another 10,000 entries with other key values.  I then add {1, 4} {1, 5}.  
> Now I want to get all records with the first key value of 1.  Assuming I 
> use db_find_key and find and read the first three records, would I have 
> to read through the next 10,000 to get the last two records?  If I 
> checked for the first key value to change and then drop out of my read 
> loop I would think I would miss the last two records for key value of 1. 
>  Is this right?

No.  EDS keeps the records sorted by key, so record numbers follow key order
and all keys beginning with 1 sit next to each other, whenever they were added.

Another way to do this would be to use indices with your db.  I have some plans
to do just that with my EuSQL package, which should make [some] queries very
fast.  Right now I'm getting insert/update queries to work with literal values
and parameters.  Probably indices will be the next thing I tackle (don't hold
your breath, though :).

Currently, EuSQL would search through all records looking for keys starting
with '1', although the syntax for doing so would be easy, assuming you've
defined the key as a couple of fields "select * from tablename where key.field1
= 1". (Actually, you can't use '*' in the version currently at RDS, but I
should probably have an update by the end of the week which will allow delete,
insert, update and parameterized queries.)

Matt Lewis

3. Re: Robert...EDS - questions/comments

On 12 Feb 2001, at 12:34, matthewwalkerlewis at YAHOO.COM wrote:

> 
> Another way to do this would be to use indices with your db.  I have some
> plans to do just that with my EuSQL package, which should make [some]
> queries very fast.

I have a problem that can only be solved with brute force, since it involves
comparisons with 150,000 words, minimum, and if that works out, it could be
expanded to a million. Problem is, 150K comparisons take 82 seconds on my
K2-6-266, and I am not convinced buying a faster puter will solve it, simply
because that is a software based retrieval and compare solution. Even if a 5x
faster 1 GHz dedicated puter were thrown at the problem, a 10 word sentence
would still take 3 minutes to run, which is intolerable. I wish I could get my
hands on one or more of the mythical Lisp machines, where hardware was thrown
at the problem.
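
The three-minute figure can be checked with a few lines of arithmetic. This is only a sketch, and it assumes the 82 seconds is the cost of comparing one word against the 150,000-word list, which the post does not state outright. The variable names are made up for illustration.

```python
# Assumption: 82 s is the time to compare ONE word against 150,000 entries.
seconds_per_word = 82.0   # reported timing on the current machine
speedup = 5.0             # the hypothetical "5x faster" machine
words_in_sentence = 10

total_seconds = seconds_per_word / speedup * words_in_sentence
total_minutes = total_seconds / 60

print(f"{total_seconds:.0f} s, about {total_minutes:.1f} minutes")
```

Under that assumption the sentence costs 164 seconds, which rounds to the three minutes quoted.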

Has anyone else given this a thought? Has anyone here met one of these machines,
or know about them?

Kat

4. Re: Robert...EDS - questions/comments

Could I see your compare routine?  Maybe there's a way to sort the data so
150K~1M compares aren't necessary.  (NB: the sort routine could be run when
Tiggr recognizes she's the only one in the room.  The learning may be a bit
slower if there is significant lag between updating the DB and sorting it; but
then a priority level could be assigned to that task to let it be done more
often.)

Just a thot....

Michael J. Sabal

>>> gertie at PELL.NET 02/12/01 04:06PM >>>
I have a problem that can only be solved with brute force, since it involves
comparisons with 150,000 words, minimum, and if that works out, it could be
expanded to a million.

5. Re: Robert...EDS - questions/comments

On 12 Feb 2001, at 13:19, Michael  Sabal wrote:

> Could I see your compare routine?  Maybe there's a way to sort the data so
> 150K~1M compares aren't necessary.

If one assumes every word in the sentence is a typo, then every word must be
compared to find a tree of possible correct words for the entire sentence.
Anything else is throwing away info.

>(NB: the sort routine could be run when Tiggr recognizes she's
> the only one in the room.  

Sorta useless to reply to conversation only after everyone leaves, isn't it?

Kat

>The learning may be a bit slower if there is significant lag between updating
> the DB and sorting it; but then a priority level could be assigned to that
> task to let it be done more often.)
>
> Just a thot....
>
> Michael J. Sabal
>
> >>> gertie at PELL.NET 02/12/01 04:06PM >>>
> I have a problem that can only be solved with brute force, since it involves
> comparisons with 150,000 words, minimum, and if that works out, it could be
> expanded to a million.

6. Re: Robert...EDS - questions/comments

Jonas Temple writes:

> 1. Say I have a table with a key structure of : {atom, atom}.
> I insert 3 records with the following key values:
>  {1 , 1} {1, 2} {1, 3}.
> I add another 10,000 entries with other key values.
> I then add {1, 4} {1, 5}.
> Now I want to get all records with the first key value of 1.
> Assuming I use db_find_key and find and read the
> first three records, would I have to read through the
> next 10,000 to get the last two records?

No. As Matt Lewis pointed out, the records are always organized
in order of key value. That allows a fast binary search to be used
to find any key. Sequences are sorted in the usual "alphabetic"
way, with the first element being the most significant.
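
The effect of that ordering on the partial-key question can be modelled outside Euphoria. The sketch below is not EDS code: it uses Python tuples to stand in for {atom, atom} keys, a sorted list to stand in for the table, and the helper name `records_with_prefix` is invented for illustration. It assumes tuple comparison is a fair stand-in for the "alphabetic" sequence ordering described above.

```python
from bisect import bisect_left

def records_with_prefix(sorted_keys, first):
    """Return every key whose first element equals `first`.

    Relies on the keys being sorted, so all matches are contiguous.
    """
    # (first,) sorts before every (first, x), so the binary search lands
    # on the first matching key, if there is one.
    start = bisect_left(sorted_keys, (first,))
    matches = []
    for key in sorted_keys[start:]:
        if key[0] != first:
            break            # first element changed: no more matches
        matches.append(key)
    return matches

# Insert order from the question: three keys, 10,000 others, two more.
inserted = [(1, 1), (1, 2), (1, 3)]
inserted += [(n, 0) for n in range(2, 10002)]
inserted += [(1, 4), (1, 5)]

# The database maintains key order itself; sorted() models that here.
table = sorted(inserted)

print(records_with_prefix(table, 1))
```

In this model the five keys beginning with 1 come back together, and the loop stops as soon as the first element changes, without touching the other 10,000 records.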

> 2. Somewhat relating to question 1, if I used db_find_key
> with a key value of {1, 0} (assuming there will never be
> a record with a second key value of 0) I would get 
> a negative result.

Yes.

> If I am reading the docs right, the negative number
> would be the record number if it were inserted into the file.

Yes.

> For example, if the db_find_key returned -4, would this mean
> that if I were to retrieve record 4, would it have a key
> value of {1, 1} (assuming key value {1, 1} is in the file)?

Yes, the -4 tells you that if {1,0} were inserted right now,
it would be the 4th record. Since you haven't actually inserted
it yet, the current 4th record would be the record that 
comes after {1,0}. {1,1} in your case.
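
The return convention can be illustrated with a small model. This is a Python sketch, not the EDS implementation: `find_key` here is a stand-in written for this example, the table contents are invented, and record numbers are taken to be 1-based as in the description above.

```python
from bisect import bisect_left

def find_key(sorted_keys, key):
    """Model of the convention described above (1-based record numbers).

    Returns the record number if `key` is present, otherwise the negative
    of the record number it would get if inserted now.
    """
    pos = bisect_left(sorted_keys, key)          # 0-based position
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return pos + 1
    return -(pos + 1)

# Invented table: three keys sort before {1,0}, so {1,0} would be record 4.
table = [(0, 7), (0, 8), (0, 9),
         (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
         (2, 1)]

result = find_key(table, (1, 0))
start = -result if result < 0 else result        # where to begin reading
first_match = table[start - 1]                   # back to 0-based indexing

print(result, first_match)
```

One case to watch when reading from the negated result: if the key would sort after every existing record, the position is one past the end of the table, so it should be checked against the table size before reading.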

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

7. Re: Robert...EDS - questions/comments

I'm going to make a few assumptions about your dictionary format that may be
way off.  If they are, then probably most of what I say will be useless; but
here goes anyway.

I assume you have a dictionary that Tiggr can look in to determine the meaning
of words she reads, and in which she can find appropriate words with which to
respond.  In order to know the appropriateness of a word, I assume you have a
class field as part of the dictionary entry.  For example, a homonym with both
a casual class entry and a technical class entry, but different meanings, would
have to be decided between based on the context of the discussion.

You must have a means for Tiggr to learn new vocabulary based on the context.
Why not allow Tiggr to learn typos in the same way she learns other vocabulary,
but with a class that prevents her from using the typo in her own responses?

Also consider that about half the words in a sentence are grammatical.  That
means that, based on the position in the sentence, grammatical typos need only
be compared to grammatical words and not the other half-million nouns, verbs,
adjectives, etc. in the dictionary.  If the word expected is a noun, compare
the typo to only nouns, etc.  This is a large reason why I chose to go with a
hex-based language for internal processing, even though the overhead of
translating to that language would be a bit higher.
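
A rough sketch of that idea, in Python rather than Euphoria: the dictionary is split into buckets by word class, and a typo is compared only against the bucket for the class expected at that position. Everything here is hypothetical, including the bucket names, the tiny word lists, and the use of Python's `difflib` as the similarity measure; how the expected class is worked out from sentence position is not shown.

```python
from difflib import get_close_matches

# Hypothetical dictionary, bucketed by word class (illustrative entries only).
DICTIONARY = {
    "grammatical": ["the", "then", "they", "there", "and", "of"],
    "noun": ["theory", "thread", "tea", "ten", "tech"],
    "verb": ["tether", "teach", "tell"],
}

def candidates(typo, expected_class, limit=3):
    """Compare the typo only against words of the expected class."""
    bucket = DICTIONARY[expected_class]
    return get_close_matches(typo, bucket, n=limit, cutoff=0.6)

def comparisons_saved(expected_class):
    """How many dictionary entries are skipped by searching one bucket."""
    total = sum(len(words) for words in DICTIONARY.values())
    return total - len(DICTIONARY[expected_class])

print(candidates("teh", "grammatical"), comparisons_saved("grammatical"))
```

The saving scales with how unevenly the dictionary is split: a small grammatical bucket next to a half-million content words means most comparisons are skipped whenever a grammatical word is expected.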

As for my statement about sorting during down-times, I was referring only to
sorting, not to comparing.  The compare would obviously have to happen during
the conversation.

Michael J. Sabal

>>> gertie at PELL.NET 02/12/01 04:41PM >>>
If one assumes every word in the sentence is a typo, then every word must be
compared to find a tree of possible correct words for the entire sentence.
Anything else is throwing away info.

>(NB: the sort routine could be run when Tiggr recognizes she's
> the only one in the room.

Sorta useless to reply to conversation only after everyone leaves, isn't it?

Kat
