1. Str-Kat

Kat,

A EUPHORIA string is a sequence that contains integer
values that each represent a character value.  EUPHORIA
has a string type as much as C does.

A string is not dealt with separately as a special type, and
it doesn't need to be.

Unlike both BASIC and C, though, there are no special
string ops that concatenate, determine the length, and
copy.  In EUPHORIA you manipulate strings and arrays
the same way because they are the same.  They are all
sequences.  And since sequence manipulation is
straightforward, so is string manipulation.  I wouldn't
want it to be any other way.
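A small illustration of that point, written as Euphoria 3.1-style code (variables declared before use). The ordinary sequence operators handle concatenation (&), length (length()) and copying (plain assignment) for text as well. The variable names are only for illustration.

```euphoria
-- Strings are ordinary sequences of character codes, so the general
-- sequence operators do all the usual string work.
sequence greeting, name, message, changed

greeting = "Hello, "
name = "Kat"

message = greeting & name    -- concatenation: & joins two sequences
changed = message            -- copying: assignment copies the value
changed[1] = 'J'             -- editing the copy leaves message alone

puts(1, message & '\n')      -- Hello, Kat
puts(1, changed & '\n')      -- Jello, Kat
printf(1, "%d\n", {length(message)})   -- 10
```

After the assignment, message and changed are independent values, so no separate string-copy routine is needed.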

Shawn Pringle

2. Re: Str-Kat

Shawn Pringle wrote:
> 
> Kat,
> 
> A EUPHORIA string is a sequence that contains integer
> values that each represent a character value.  EUPHORIA
> has a string type as much as C does.
> 
> A string is not dealt with separately as a special type, and
> it doesn't need to be.
> 
> Unlike both BASIC and C, though, there are no special
> string ops that concatenate, determine the length, and
> copy.  In EUPHORIA you manipulate strings and arrays
> the same way because they are the same.  They are all
> sequences.  And since sequence manipulation is
> straightforward, so is string manipulation.  I wouldn't
> want it to be any other way.

Really? Since when does EUPHORIA use only 8 bits for each CHAR in the SEQUENCE?
Since when can you load a 500mbyte STRING into EUPHORIA and not have the OS kill
the application with a "too much memory used" error (windoze allows each app to
have only 2 gigabytes)?

Kat,
forgetting she wrote the STRING-TOKENS lib in the archives.

3. Re: Str-Kat

Kat wrote:
> 
> Shawn Pringle wrote:
> > 
> > Kat,
> > 
> > A EUPHORIA string is a sequence that contains integer
> > values that each represent a character value.  EUPHORIA
> > has a string type as much as C does.
> > 
> > A string is not dealt with separately as a special type, and
> > it doesn't need to be.
> > 
> > Unlike both BASIC and C, though, there are no special
> > string ops that concatenate, determine the length, and
> > copy.  In EUPHORIA you manipulate strings and arrays
> > the same way because they are the same.  They are all
> > sequences.  And since sequence manipulation is
> > straightforward, so is string manipulation.  I wouldn't
> > want it to be any other way.
> 
> Really? Since when does EUPHORIA use only 8 bits for each CHAR in the SEQUENCE?
> Since when can you load a 500mbyte STRING into EUPHORIA and not have the OS
> kill the application with a "too much memory used" error (windoze allows each
> app to have only 2 gigabytes)?
> 
> Kat,
> forgetting she wrote the STRING-TOKENS lib in the archives.

I didn't say 8 bits for each character.  Normally they are 7-bit, but for
those working in languages other than English they could need up to 21 bits.
Frankly, I don't give a damn.

Why would you want to load a 500 MB string into memory at once anyway?


Shawn

4. Re: Str-Kat

Kat wrote:
> 
> Shawn Pringle wrote:
> > 
> > Kat,
> > 
> > A EUPHORIA string is a sequence that contains integer
> > values that each represent a character value.


> Really? 

OK, it's not quite accurate. It should read more like this:

"A EUPHORIA string is a sequence that ONLY contains POSITIVE integer
values that each represent a character value."

> Since when does EUPHORIA use only 8 bits for each CHAR
> in the SEQUENCE? 

Since when is the definition of "string" :: An array of 8-bit unsigned integers?

> Since when can you load a 500mbyte STRING into EUPHORIA and
> not have the OS kill the application with "too much memory used"
> error (windoze allows each app to have  only 2 gigabytes)?

It doesn't. I bet a Commodore 64 couldn't do that either.

Since when must you absolutely, positively have all those 500 megabytes in
RAM at the same time? Are you saying that your task can only be achieved if all
those bytes are in RAM simultaneously?

-- 
Derek Parnell
Melbourne, Australia
Skype name: derek.j.parnell

5. Re: Str-Kat

Shawn Pringle wrote:
> 
> Kat wrote:
> > 
> > Shawn Pringle wrote:
> > > 
> > > Kat,
> > > 
> > > A EUPHORIA string is a sequence that contains integer
> > > values that each represent a character value.  EUPHORIA
> > > has a string type as much as C does.
> > > 
> > > A string is not dealt with separately as a special type, and
> > > it doesn't need to be.
> > > 
> > > Unlike both BASIC and C, though, there are no special
> > > string ops that concatenate, determine the length, and
> > > copy.  In EUPHORIA you manipulate strings and arrays
> > > the same way because they are the same.  They are all
> > > sequences.  And since sequence manipulation is
> > > straightforward, so is string manipulation.  I wouldn't
> > > want it to be any other way.
> > 
> > Really? Since when does EUPHORIA use only 8 bits for each CHAR in the SEQUENCE?
> > Since when can you load a 500mbyte STRING into EUPHORIA and not have the OS
> > kill the application with a "too much memory used" error (windoze allows each
> > app to have only 2 gigabytes)?
> > 
> > Kat,
> > forgetting she wrote the STRING-TOKENS lib in the archives.
> 
> I didn't say 8 bits for each character.  Normally they are 7-bit, but for
> those working in languages other than English they could need up to 21 bits.
> Frankly, I don't give a damn.

There's the problem. That doesn't explain why you brought it up, though.
 
> Why would you want to load a 500 MB string into memory at once anyway?

Are you going to criticise the "why" instead of finding the "how"?

OK, try loading 250 megabytes and *using* it. Stored as a sequence at 4 bytes
per character, the 250 megabytes become a gigabyte, and almost any way you use
it will copy it, making it 2 gigabytes, and the OS will kill it.
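Here is the arithmetic behind that claim as a rough sketch. It assumes 32-bit Euphoria, where every sequence element takes a 4-byte slot. It ignores the sequence header and any allocator overhead. The constant and function names are hypothetical.

```euphoria
-- Rough memory estimate for text held in a Euphoria sequence.
-- Assumption: 32-bit Euphoria, 4 bytes per sequence element;
-- header and allocator overhead are ignored.
constant BYTES_PER_ELEMENT = 4,
         MB = 1024 * 1024

function sequence_bytes(atom n_chars)
    return n_chars * BYTES_PER_ELEMENT
end function

atom file_bytes, as_sequence, with_copy
file_bytes  = 250 * MB                    -- the file on disk
as_sequence = sequence_bytes(file_bytes)  -- 1000 MB once loaded
with_copy   = 2 * as_sequence             -- 2000 MB once something copies it

printf(1, "%d MB loaded, %d MB with one copy\n",
       {as_sequence / MB, with_copy / MB})
```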

Oh, and for non-ASCII chars outside the USA, you could still use UTF-8 or UTF-16
strings and still save memory. Your example still used 32 bits per char.
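To put numbers on the UTF-8 point, here is a small sketch of a UTF-8 encoder in Euphoria. The names utf8_encode() and utf8_encode_all() are hypothetical, not library routines. ASCII stays at one byte per character and most other scripts take two or three bytes, compared with the fixed 4 bytes per element of a sequence (the same 32-bit assumption as above). The saving is only real if the bytes end up somewhere compact, such as an allocate()d buffer. Kept in an ordinary sequence, each byte would again occupy a 4-byte element.

```euphoria
-- Encode one Unicode code point (0 to #10FFFF) as a sequence of UTF-8 bytes.
-- floor(cp / #40), floor(cp / #1000), floor(cp / #40000) act as right
-- shifts by 6, 12 and 18 bits.
function utf8_encode(integer cp)
    if cp < #80 then
        return {cp}                                   -- 1 byte: ASCII
    elsif cp < #800 then
        return {or_bits(#C0, floor(cp / #40)),        -- 2 bytes
                or_bits(#80, and_bits(cp, #3F))}
    elsif cp < #10000 then
        return {or_bits(#E0, floor(cp / #1000)),      -- 3 bytes
                or_bits(#80, and_bits(floor(cp / #40), #3F)),
                or_bits(#80, and_bits(cp, #3F))}
    else
        return {or_bits(#F0, floor(cp / #40000)),     -- 4 bytes
                or_bits(#80, and_bits(floor(cp / #1000), #3F)),
                or_bits(#80, and_bits(floor(cp / #40), #3F)),
                or_bits(#80, and_bits(cp, #3F))}
    end if
end function

-- Encode a whole sequence of code points.
function utf8_encode_all(sequence codes)
    sequence out
    out = {}
    for i = 1 to length(codes) do
        out &= utf8_encode(codes[i])
    end for
    return out
end function

sequence text, bytes
text = {'c', 'a', 'f', #E9}        -- "cafe" with an accented e
bytes = utf8_encode_all(text)
-- the last figure assumes 4 bytes per sequence element
printf(1, "%d code points: %d bytes as UTF-8, %d bytes as a sequence\n",
       {length(text), length(bytes), 4 * length(text)})
```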

Kat

6. Re: Str-Kat

Shawn Pringle wrote:
> 
> Kat,
> 
> A EUPHORIA string is a sequence that contains integer
> values that each represent a character value.  EUPHORIA
> has a string type as much as C does.
> 
> A string is not dealt with separately as a special type, and
> it doesn't need to be.
> 
> Unlike both BASIC and C, though, there are no special
> string ops that concatenate, determine the length, and
> copy.  In EUPHORIA you manipulate strings and arrays
> the same way because they are the same.  They are all
> sequences.  And since sequence manipulation is
> straightforward, so is string manipulation.  I wouldn't
> want it to be any other way.
> 
> Shawn Pringle

Compare the memory overhead and performance cost of general sequences with those
of raw arrays of bytes/words/dwords in memory, and you will want to have
different operators and types.
The flexibility of sequences is wonderful. However, sequences of bytes/dwords
are a fairly common special case, and there is room for their processing speed
and memory footprint to be much, much more optimised.

CChris

7. Re: Str-Kat

Derek Parnell wrote:
> 
> Kat wrote:
> > 
> > Shawn Pringle wrote:
> > > 
> > > Kat,
> > > 
> > > A EUPHORIA string is a sequence that contains integer
> > > values that each represent a character value.
> 
> 
> > Really? 
> 
> OK, it's not quite accurate. It should read more like this:
> 
> "A EUPHORIA string is a sequence that ONLY contains POSITIVE integer
> values that each represent a character value."
> 
> > Since when does EUPHORIA use only 8 bits for each CHAR
> > in the SEQUENCE? 
> 
> Since when is the definition of "string" :: An array of 8-bit unsigned
> integers?
>

I just do not accept the COBOL definition of a string.  Eventually we will all
be using Unicode in one form or another.  Sure, UTF-8 would fit that definition,
but one could also use 16-bit unsigned integers, because the Windows API either
does that or returns chars using an unportable codepage.  To me, there are only
two types of strings I work with: 16-bit and 7-bit ASCII.

In spite of having over 65,000 characters to work with, the Unicode committee
wasn't able to fit everything in that space.  I have to blame their waste on
things like special codes for Roman numerals(!) and other needless homoglyphs.
So they have a scheme (surrogate pairs) for encoding code points beyond 16 bits,
up to U+10FFFF (a 21-bit space), as pairs of 16-bit characters.  Don't worry
about EUPHORIA strings; the Unicode committee will eventually use four billion
different code points anyway.  Just wait a few years for them to catch up. ;)

Shawn




> Derek Parnell
> Melbourne, Australia
> Skype name: derek.j.parnell

8. Re: Str-Kat

I've been reading this string thread and I'm quite astonished because of two
simple facts.

Strings are just data and we are supposed to be programmers.

There are different types of strings, of course, like BSTRings, Unicode strings
and every C programmer's favorite, null-terminated strings.  All of these
strings are referenced by address via a pointer.  The last fact is that we
already have a Euphoria word that produces a string...

allocate_string.  Perhaps this needs to be promoted into the core, but we
have it already.  I reiterate, we are programmers.  We are free to manipulate
such strings in any way a programmer can, which is to say any way at all,
limited only by the programmer's imagination.  So where is the problem?
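Here is a minimal sketch of that round trip. It uses allocate_string() and free() from machine.e and the built-in peek(). The helper peek_c_string() is a hypothetical name for illustration. The sketch copies a sequence into a null-terminated block, reads it back byte by byte until the terminating 0, and frees the block.

```euphoria
include machine.e   -- allocate_string(), free()

-- Hypothetical helper: read a null-terminated string starting at
-- address p back into an ordinary sequence.
function peek_c_string(atom p)
    sequence s
    s = {}
    while peek(p) != 0 do
        s &= peek(p)    -- append one byte
        p += 1          -- p is a private copy, so this is safe
    end while
    return s
end function

atom buf
buf = allocate_string("Hello, Euphoria")  -- bytes plus a trailing 0
puts(1, peek_c_string(buf) & '\n')        -- Hello, Euphoria
free(buf)                                  -- return the block
```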

Unicode, same deal.  There may be a lot of things that are difficult for
programmers to do mainly because of undocumented interfaces, but manipulation
of data should not be one of them.

Are we not all programmers?  Stand up and be counted.

9. Re: Str-Kat

My two cents on the subject:
Since it seems that something will be done regarding strings,
a solution I envision is to add a field to the internal sequence descriptor
telling how many bits each atom in this particular sequence occupies.
The programmer will in principle be unaware of the atom width, and EU will
automatically take care of this attribute.
For example, assume an empty sequence is being filled with chars via &= or
append(). This sequence will contain only 8-bit atoms. If a Unicode char
is added, then all the previous elements will be converted to 16-bit format.
If an integer is added (having a negative or high value), then all the previous
elements will be upgraded to the corresponding size.
This will even allow for 1-bit elements, or for integers up to 128
in a single byte, increasing both space and time efficiency.
After all, this is already done when you append a fraction to a sequence
that is composed only of integers.
Regards.
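Here is a rough sketch of the width rule proposed above, written as ordinary Euphoria. element_width() is a hypothetical helper. It only inspects a sequence and reports which of 8, 16, 32 or 64 bits per element would be enough. It treats 8 and 16 as unsigned, sends negative values to 32, and leaves out the 1-bit case. It does not (and cannot) change how Euphoria actually stores sequences.

```euphoria
-- Hypothetical helper: smallest per-element width (in bits) that could
-- hold every element of s under the upgrade rule described above.
-- 8 and 16 are treated as unsigned; negatives need 32; anything that
-- is not a 31-bit Euphoria integer (e.g. a fraction) needs a 64-bit double.
function element_width(sequence s)
    integer width
    atom x
    width = 8
    for i = 1 to length(s) do
        x = s[i]                  -- assumes a flat sequence of atoms
        if not integer(x) then
            return 64
        elsif x < 0 or x > #FFFF then
            width = 32
        elsif x > #FF and width < 16 then
            width = 16            -- e.g. a Unicode char above 255
        end if
    end for
    return width
end function

? element_width("hello")          -- 8
? element_width({'h', #20AC})     -- 16 (euro sign)
```

The width only ever grows while scanning, which mirrors the "upgrade, never downgrade" behaviour described in the post.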

10. Re: Str-Kat

ken mortenson wrote:

<snip>
> 
> Are we not all programmers?  Stand up and be counted.

Are you not yet entertained?

(sorry, wine and gladiatorial reference sprang to mind)

Chris

11. Re: Str-Kat

ken mortenson wrote:
> 
> I've been reading this string thread and I'm quite astonished because of two
> simple facts.
> 
> Strings are just data and we are supposed to be programmers.
> 
> There are different types of strings, of course, like BSTRings, Unicode strings
> and every C programmer's favorite, null-terminated strings.  All of these
> strings are referenced by address via a pointer.  The last fact is that we
> already have a Euphoria word that produces a string...
> 
> allocate_string.  

Yeah, but all the sequence operators will need to be recoded for the string type
of your choosing. How exactly the string is implemented then becomes irrelevant
(or irreverent, for the language purists). Namespacing will make this easier, I
suspect:

compare(seq,seq)
strings:compare(string,string)

but it still must be coded up. It would also be easier if Eu returned unused
memory to the OS via some command or other, but an outboard strings.euu (a la
.tpu) managing its memory through API calls would make it work too, in a very
non-Euphorian way. I had considered making this sort of thing, but, as with
eunet, eubot, etc., I just couldn't do it with humans in the way objecting to
it, so I gave up.

Kat

12. Re: Str-Kat

Kat wrote:
> 
> It would also be easier if Eu returned unused
> memory to the OS via some command or other, but an outboard strings.euu (a
> la .tpu) managing its memory through API calls would make it work too, in a
> very non-Euphorian way. I had considered making this sort of thing, but, as
> with eunet, eubot, etc., I just couldn't do it with humans in the way
> objecting to it, so I gave up.
> 
> Kat

That is a clever workaround: load a Euphoria program that shares memory
space, for the explicit purpose of being able to close out and reload that
program to manage the memory.

    Lucius L. Hilley III - Unkmar

13. Re: Str-Kat

Kat wrote:

> It would also be easier if Eu returned unused
> memory to the OS via some command or other

I believe it does, Kat.  I think I remember seeing a free() routine?

14. Re: Str-Kat

ken mortenson wrote:
> 
> Kat wrote:
> 
> > It would also be easier if Eu returned unused
> > memory to the OS via some command or other
> 
> I believe it does, Kat.  I think I remember seeing a free() routine?

So it's only a matter of using allocate() and free()? No wonder it's not been
done yet! Where is the magic spell that makes compare() and equal() and string[2]
and length(string) work, after you get allocate() and free() typed out?

Kat

15. Re: Str-Kat

ken mortenson wrote:
<snip>

Ken, you gotta realise, there's been substantial resistance to actual strings in
Eu. Sequences are great things, wonderful things. But most of the world is
strings, like this sentence. Or this paragraph. And the world is a big place,
there's a looooot of strings out there. And most people would rather fight having
strings in Eu in any form, especially if I write about it, or write the code.
Just ask CK or JBrown. And I just can't justify anything I say well enough to
get anything to change. If I had a code block that did all I believe should be
done in Eu, I'd be ashamed to say so.

Kat

16. Re: Str-Kat

Kat wrote:
> 
> ken mortenson wrote:
> > 
> > Kat wrote:
> > 
> > > It would also be easier if Eu returned unused
> > > memory to the OS via some command or other
> > 
> > I believe it does, Kat.  I think I remember seeing a free() routine?
> 
> So it's only a matter of using allocate() and free()? No wonder it's not been
> done yet! Where is the magic spell that makes compare() and equal() and
> string[2] and length(string) work, after you get allocate() and free() typed
> out?
> 
> Kat

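--string.e
---------------------
-- Routines for null-terminated byte strings in memory, such as those
-- returned by allocate_string(); each one takes the string's address.
-- The str_ prefix avoids redefining the built-in compare(), equal()
-- and length(), which must keep working on ordinary sequences.
-- Include it with a namespace, e.g.:  include string.e as string
-- then call string:str_compare(a, b) and so on.

-- Returns -1, 0 or 1, like the built-in compare().
global function str_compare(atom a, atom b)
    integer i, ac, bc
    i = 0
    ac = peek(a)
    bc = peek(b)
    while 1 do
        if ac > bc then
            return 1
        elsif ac < bc then
            return -1
        elsif ac = 0 then
            return 0          -- equal bytes and both at the terminating 0
        else
            i += 1
            ac = peek(a + i)
            bc = peek(b + i)
        end if
    end while
end function

global function str_equal(atom a, atom b)
    return str_compare(a, b) = 0
end function

-- Number of bytes before the terminating 0.
global function str_length(atom a)
    integer i
    i = 0
    while peek(a + i) != 0 do
        i += 1
    end while
    return i
end function

-- string[2] is impossible to do currently, but there is this workaround.
-- i is 1-based, like a sequence subscript.
global function str_char_at(atom a, integer i)
    return peek(a + i - 1)
end function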

17. Re: Str-Kat

Kat wrote:
> 
> And most people would rather fight having
> strings in Eu in any form, especially if I write about it, or write the code.
> Just ask CK or JBrown.

I've never fought against strings.

Why do I keep getting caught up in your insanity? Please leave already.

18. Re: Str-Kat

Kat wrote:
> So it's only a matter of using allocate() and free()? No wonder it's not been
> done yet! Where is the magic spell that makes compare() and equal() and
> string[2] and length(string) work, after you get allocate() and free() typed
> out?

Kat, please don't be snarky.  You could write that stuff, couldn't you?  I
would agree that classes would make it simpler, but again, it's just data,
and Euphoria gives you all the tools you need.  Could it be written more
efficiently into the core?  Absolutely.  That is where perhaps you could
make your case.

But your case would be a lot stronger if you had written a string library
and already found its performance limited.  I haven't checked the archive,
but if someone there has written a string library, you might find an ally
for adding it to the core.  Something to think about, anyway.

I hope I've given you some helpful ideas.

19. Re: Str-Kat

c.k.lester wrote:
> 
> Kat wrote:
> > 
> > And most people would rather fight having
> > strings in Eu in any form, especially if I write about it, or write the
> > code. Just ask CK or JBrown.
> 
> I've never fought against strings.

"especially if I write about it, or write the code."

> Why do I keep getting caught up in your insanity? Please leave already.

Proves my point.

Kat

20. Re: Str-Kat

Kat wrote:

> Ken, you gotta realise, there's been substantial resistance to actual strings
> in Eu. Sequences are great things, wonderful things. But most of the world is
> strings, like this sentence. Or this paragraph. And the world is a big place,
> there's a looooot of strings out there. And most people would rather fight
> having strings in Eu in any form, especially if I write about it, or write the
> code. Just ask CK or JBrown. And I just can't justify anything I say well
> enough to get anything to change. If I had a code block that did all I believe
> should be done in Eu, I'd be ashamed to say so.

I understand your point Kat.  These guys must be some kind of minimalists or
something, eh? blink

What you need are allies, Kat.  Instead of beating heads and walls (one of my
favorite pastimes btw) find out if others have the same need you do.

It would make a much stronger case if you had others to champion the cause.
It is unfortunate, but people do resist ideas for lots of human reasons
which have little to do with the ideas themselves.

Personally, I don't see a compelling case for strings in Euphoria because
I really like how sequences have been implemented (particularly with regard
to allocation/deallocation, no garbage collection or memory leaks.)  They
provide a way to send and receive strings from foreign DLLs.

Now if you add strings, you open up a whole lot of potential memory issues
that Euphoria thankfully doesn't have.  That isn't a showstopper in my
mind.  I think strings can be handled well.  C has the cleanest strings
(just a pointer and a null terminator), but it makes the programmer do all
the management tasks.  VB has a slightly more complicated BSTRing type and
does the management for you, but it's a really slow implementation.
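To contrast the two layouts: you find the end of a C string by scanning for its terminating 0, while a BSTR-style string stores its length just in front of the characters. The sketch below is a simplified, hypothetical byte-string version of the length-prefixed idea. lp_new, lp_length, lp_value and lp_free are made-up names. A real BSTR holds UTF-16 text and is allocated through the Windows API, which this sketch does not attempt. It only uses allocate() and free() from machine.e plus the built-in poke4(), peek4u(), poke() and peek().

```euphoria
include machine.e   -- allocate(), free()

-- Hypothetical length-prefixed byte string, loosely modelled on a BSTR:
-- a 4-byte length sits just before the characters, and the handle
-- points at the first character, so no scan for a 0 is needed.
function lp_new(sequence s)
    atom block
    block = allocate(4 + length(s))
    poke4(block, length(s))        -- length prefix
    poke(block + 4, s)             -- the character bytes
    return block + 4               -- handle = address of first char
end function

function lp_length(atom p)
    return peek4u(p - 4)           -- read the prefix; no scanning
end function

function lp_value(atom p)
    return peek({p, lp_length(p)}) -- copy the bytes back into a sequence
end function

procedure lp_free(atom p)
    free(p - 4)                    -- free the whole block, prefix included
end procedure

atom h
h = lp_new("hello")
printf(1, "%d %s\n", {lp_length(h), lp_value(h)})   -- 5 hello
lp_free(h)
```

The trade-off is the one described above. Reading the length is constant-time, but every routine that creates or frees these strings has to respect the hidden prefix.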

I wish you well Kat.  Any champians out there?  Anyone know how to spell
champian?

21. Re: Str-Kat

Kat wrote:
> 
> ken mortenson wrote:
> <snip>
> 
> Ken, you gotta realise, there's been substantial resistance to actual strings
> in Eu. Sequences are great things, wonderful things. But most of the world is
> strings, like this sentence. Or this paragraph. And the world is a big place,
> there's a looooot of strings out there. 

Yes, and Euphoria does a pretty good job with most string jobs I've run
across.  We all agree that sequences with millions of elements don't
work well with today's hardware, which happens to be your use case.

> And most people would rather fight having strings in Eu in any form,
> especially if I write about it, or write the code. Just ask CK or JBrown.

Yeah, some people disagree with you.  Some don't.  But it doesn't matter what 
anyone wants if no one can figure out a good way to implement them, which 
happens to be the case with strings.

> And I just can't justify anything I say well enough to get anything to
> change. If I had a code block that did all I believe should be done in Eu,
> I'd be ashamed to say so.

Few people have.  It *does* help if you have some code that does it.  Asking
others to do the work on stuff you're interested in works less well in a 
volunteer setting.

The only code of yours I've seen really has been strtok stuff, and it seemed
good enough to me.  I haven't really seen the assaults on your code, but
I've sure heard you talk about them a lot.  I certainly sympathize with
your RL issues.

Matt

22. Re: Str-Kat

ken mortenson wrote:

<snip>

> But your case would be a lot stronger if you had written a string library
> and already found its performance limited.  

Did it, done it, found it so. Try WinXP on a computer with 512 MB of memory, and
load and use strtok's parse on a 100-kilobyte string. 100 KB isn't hard to find;
some web pages are over 100 KB, NOT counting the CSS, JS, and pics. You'll get
bogged down with the drive swapping memory back and forth.

Kat

23. Re: Str-Kat

ken mortenson wrote:
> 
> Kat wrote:
> 
> > Ken, you gotta realise, there's been substantial resistance to actual strings
> > in Eu. Sequences are great things, wonderful things. But most of the world is
> > strings, like this sentence. Or this paragraph. And the world is a big place,
> > there's a looooot of strings out there. And most people would rather fight
> > having strings in Eu in any form, especially if I write about it, or write
> > the code. Just ask CK or JBrown. And I just can't justify anything I say well
> > enough to get anything to change. If I had a code block that did all I
> > believe should be done in Eu, I'd be ashamed to say so.
> 
> I understand your point Kat.  These guys must be some kind of minimalists or
> something, eh? blink
> 
> What you need are allies, Kat.  Instead of beating heads and walls (one of my
> favorite pastimes btw) find out if others have the same need you do.
> 
> It would make a much stronger case if you had others to champion the cause.
> It is unfortunate, but people do resist ideas for lots of human reasons
> which have little to do with the ideas themselves.
> 
> Personally, I don't see a compelling case for strings in Euphoria because
> I really like how sequences have been implemented (particularly with regard
> to allocation/deallocation, no garbage collection or memory leaks.)  They
> provide a way to send and receive strings from foreign DLLs.
> 
> Now if you add strings, you open up a whole lot of potential memory issues
> that Euphoria thankfully doesn't have.  That isn't a showstopper in my
> mind.  I think strings can be handled well.  C has the cleanest strings
> (just a pointer and a null terminator), but it makes the programmer do all
> the management tasks.  VB has a slightly more complicated BSTRing type and
> does the management for you, but it's a really slow implementation.
> 
> I wish you well Kat.  Any champians out there?  Anyone know how to spell
> champian?

You have spelled champion before.

I can write the code as an include, and it might be better left at that. I used
a lot of pointers to strings in TurboPascal (making PowerBasic catch my eye) and
lots of pchars, so doing the same in Eu would be fairly easy. By the way, people
rallied against pointers too. Pointers seem really un-Eu-like, apparently, and I
wouldn't release code that has them; I've been flamed for my code enough
already, even recently.

I even left #Euphoria for CK's pleasure. He still isn't satisfied, as you can
see.

Kat

24. Re: Str-Kat

Kat wrote:
> 
> ken mortenson wrote:
> > But your case would be a lot stronger if you had written a string library
> > and already found its performance limited.  
> 
> Did it, done it, found it so. Try WinXP on a computer with 512 MB of memory,
> and load and use strtok's parse on a 100-kilobyte string. 100 KB isn't hard
> to find; some web pages are over 100 KB, NOT counting the CSS, JS, and pics.
> You'll get bogged down with the drive swapping memory back and forth.

I put together a computer from the junk I've got around the house that only
had 32 MB of RAM.  I had to search all over the internet for utils and a
browser that would perform on such a limited machine.  (The power switch died
on it and I haven't replaced it, so I took another machine out of storage
which is a bit better.)

Anyway, you're probably better able than some of the younger folk (assuming
there are younger folk here; I really have no idea of the age demographic)
to remember when we had to do tape sorts.  I'm talking millions of records
on 9-track tape.  We made it work.

You're always going to find applications where there isn't enough memory.
Having four times the memory (or having Euphoria use real strings instead
of sequences) isn't going to change that much.

I would take a careful look at what you're doing with the data and try
to manipulate it in a way that doesn't fill memory so much, because that
is what puts you in a disk-churning situation due to memory swaps.

I usually only deal with a subset of my data.  If I had to fill memory
with an application I'd probably have to get a machine that allowed me
to add enough memory to do the job (I can't afford my dream machine, but
that's what others have done.)

When I say subset of data, that doesn't mean I'm not processing all of it.
It just means I try to deal with it in manageable chunks.
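Here is one way to read "manageable chunks" in Euphoria, as a sketch. It reads a file a block at a time with get_bytes() from get.e, so only one block is held as a sequence at once. The routine name and the 64 KB block size are arbitrary choices for illustration. Counting one character stands in for whatever per-chunk processing the real task needs. Anything that can straddle a block boundary, such as a multi-character pattern, needs extra handling that this sketch does not show.

```euphoria
include get.e   -- get_bytes()

constant CHUNK_SIZE = 65536   -- arbitrary block size for this sketch

-- Hypothetical example task: count how often character c occurs in a
-- file while holding at most CHUNK_SIZE bytes of it in memory.
-- Returns -1 if the file cannot be opened.
function count_char_in_file(sequence path, integer c)
    integer fn, total
    sequence chunk
    fn = open(path, "rb")
    if fn = -1 then
        return -1
    end if
    total = 0
    while 1 do
        chunk = get_bytes(fn, CHUNK_SIZE)
        if length(chunk) = 0 then
            exit                      -- end of file
        end if
        for i = 1 to length(chunk) do
            if chunk[i] = c then
                total += 1
            end if
        end for
    end while
    close(fn)
    return total
end function

-- e.g.  ? count_char_in_file("page.html", '<')
```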

It sounds to me like you really don't have a string issue.  The issue seems
to be more about what algorithms you're choosing.  If you are a bit more
detailed in your description, perhaps someone will have some ideas.

Best to ya.

25. Re: Str-Kat

Kat wrote:

> I can write the code as an include, and it might be better left at that. I
> used a lot of pointers to strings in TurboPascal (making PowerBasic catch my
> eye) and lots of pchars, so doing the same in Eu would be fairly easy. By the
> way, people rallied against pointers too. Pointers seem really un-Eu-like,
> apparently, and I wouldn't release code that has them; I've been flamed for
> my code enough already, even recently.
> 
> I even left #Euphoria for CK's pleasure. He still isn't satisfied, as you can
> see.

The asbestos underwear does come in handy at times.

I never really understood why pointers are so bad; they're just an address
in memory.  You can get carried away with pointers to pointers to pointers
and so forth, and I never did like the asterisk as C's choice of symbol.
I always thought @ said "address" to me, but it's not much better
either.

PowerBasic even adds a few more wrinkles because they have more than
pass by value and pass by address (I can't think of what it is right now,
but I did find it interesting.)

In my O.T.L. (trademark and EEEVIL patent pending) everything is a function
whose return value can be ignored (it doesn't have to be assigned to a
junk variable).  Every parameter is passed by address and subs pass back an
address.  As with Euphoria, this allows passing back multiple values.
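Euphoria gets the "multiple values" effect differently. Arguments are passed by value, so the routine works on its own copy, and a function hands back several results by returning one sequence. min_max() below is a hypothetical example of that pattern. It assumes a non-empty, flat sequence of atoms.

```euphoria
-- Euphoria passes arguments by value: the routine gets its own copy.
-- Several results come back together as one sequence.
function min_max(sequence s)
    atom lo, hi
    lo = s[1]                  -- assumes s is non-empty and flat
    hi = s[1]
    for i = 2 to length(s) do
        if s[i] < lo then
            lo = s[i]
        end if
        if s[i] > hi then
            hi = s[i]
        end if
    end for
    s = {}                     -- only the local copy is cleared
    return {lo, hi}
end function

sequence data, result
data = {3, 9, 1, 7}
result = min_max(data)
printf(1, "min=%d max=%d, data still has %d items\n",
       {result[1], result[2], length(data)})
```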

Which is funny when I think about it cuz I'm really more of a pass by value
kind of guy.  Go figure!

26. Re: Str-Kat

ken mortenson wrote:
> 
> Kat wrote:
> > 
> > ken mortenson wrote:
> > > But your case would be a lot stronger if you had written a string library
> > > and already found its performance limited.  
> > 
> > Did it, done it, found it so. Try WinXP on a computer with 512 MB of memory,
> > and load and use strtok's parse on a 100-kilobyte string. 100 KB isn't hard
> > to find; some web pages are over 100 KB, NOT counting the CSS, JS, and pics.
> > You'll get bogged down with the drive swapping memory back and forth.
> 
> I put together a computer from the junk I've got around the house that only
> had 32 MB of RAM.  I had to search all over the internet for utils and a
> browser that would perform on such a limited machine.  (The power switch died
> on it and I haven't replaced it, so I took another machine out of storage
> which is a bit better.)
> 
> Anyway, you're probably better able than some of the younger folk (assuming
> there are younger folk here; I really have no idea of the age demographic)
> to remember when we had to do tape sorts.  I'm talking millions of records
> on 9-track tape.  We made it work.

Yes, my first NLP program involved two 5.25-inch floppy drives and 2 to 3 hours
of manually swapping disks as the computer requested each disk. I had just
escaped using 8-inch floppy drives; I still have the drives, though.
 
> You're always going to find applications where there isn't enough memory.
> Having four times the memory (or having Euphoria use real strings instead
> of sequences) isn't going to change that much.
> 
> I would take a careful look at what you're doing with the data and try
> to manipulate it in a way that doesn't fill memory so much, because that
> is what puts you in a disk-churning situation due to memory swaps.

Been there; 20K was a huge amount of RAM to have. I once paid $100 for a single
byte of solid-state static RAM in the late '60s. It's how I know how to get Eu
to load a 37-megabyte file so it can be match()'d through several times per
second, something I couldn't do with the file out on the drive. But please, if
it makes you happy, continue to question my experience rather than the size of
the problem.
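The kind of repeated in-memory match() scan described here can be sketched as follows. The sketch assumes Euphoria 3.1's built-in match_from(), which takes a starting position, so the search never slices (and therefore never copies) the large sequence. count_matches() is a hypothetical helper, and it counts overlapping occurrences.

```euphoria
-- Count (possibly overlapping) occurrences of needle in haystack.
-- match_from() starts each search at a given index, so the large
-- haystack is never sliced or copied.
function count_matches(sequence needle, sequence haystack)
    integer count, pos
    count = 0
    if length(needle) = 0 or length(haystack) = 0 then
        return 0
    end if
    pos = match_from(needle, haystack, 1)
    while pos != 0 do
        count += 1
        if pos >= length(haystack) then
            exit                      -- nothing left to search
        end if
        pos = match_from(needle, haystack, pos + 1)
    end while
    return count
end function

? count_matches("ab", "abcabab")   -- prints 3
```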
 
> I usually only deal with a subset of my data.  If I had to fill memory
> with an application I'd probably have to get a machine that allowed me
> to add enough memory to do the job (I can't afford my dream machine, but
> that's what others have done.)
> 
> When I say subset of data, that doesn't mean I'm not processing all of it.
> It just means I try to deal with it in manageable chunks.

Like one short line at a time out of a 100k text file? Is that what you are
seriously telling me I should do on a modern computer running Euphoria??
 
> It sounds to me like you really don't have a string issue.  The issue seems
> to be more about what algorithms you're choosing.  If you are a bit more
> detailed in your description, perhaps someone will have some ideas.

So parsing a single web page's HTML is unreasonable, and I should write it to
disk, then handle the file only one line at a time and write it back to the
drive? What's the difference between that and letting the OS do it? Why is
munging even a 100-kilobyte file excessive??
 
> Best to ya.

You too.

Kat

27. Re: Str-Kat

Kat wrote:
> 
> I even left #Euphoria for CK's pleasure. He still isn't satisfied,
> as you can see.

My initial comment in the channel was light-hearted, given our relationship
going so far back. I tried to pad it with smilies or misspellings enough to
indicate that I wasn't being totally serious... We've both been Euphoria
programmers for some time and have shared good conversation in the past.
I might even have the logs to prove it. No doubt you do.

I've always sympathized with your plight(s), even when the stories become
somewhat unbelievable. But then something clicks in your brain and all of
a sudden you're this paranoid delusional freak that invites everybody to
her pity party and then lashes out at friend and foe. Others know what I'm
talking about because it's quite abrupt sometimes... and sad.

Kat, I've never bad-mouthed your code, and I've never seen or heard anybody
else say anything bad about your code. I suspect that nobody really has and
that you've taken criticism out of context or you're so lacking in self-
confidence that you consider all negative words to be direct attacks against
poor little you. As far as I'm concerned, you have good ideas and are very
skilled at getting computers to do what you want. I used strtok for a long
time until improvements were made and certain funcs and procs were sped up.
At that time, I think you were on sabbatical from Euphoria. It was a peaceful
time. :)

So, I don't care if you stick around or not. My wish is that you would leave
the delusional paranoia behind, the victim-mentality, or whatever psychosis/
neurosis is driving you these days, and become a mature member of this
Euphoria community, who understands that not everybody is going to see things
the same way, and that some ideas just won't be implemented. Heck, I want
a few things still (see the requested features list) that I don't think will
be implemented unless I do them myself... and right now, until that lottery
ticket hits, I don't have the time or skills to touch the interpreter.
However, I shouldn't say never because there's grumbling about adding
GOTO to Euphoria. Will you live to see it?!!? I pray you're not rotting
in jail subsisting on bread, water, and daily beatings when that day comes.

28. Re: Str-Kat

Kat wrote:

> Like one short line at a time out of a 100k text file? Is that what you are
> seriously telling me i should do on a modern computer running Euphoria??

If it works.

> So parsing a single webpage's html is unreasonable

I don't know your application.  I can't say one way or the other.

But I can say, there are cats and there are ways to skin 'em.

When you come up with a solution, I'd be happy to hear how you did it.

29. Re: Str-Kat

ken mortenson wrote:
> 
> ...there are cats and there are ways to skin 'em.

Oooooh! Never say that to a Kat!!!

30. Re: Str-Kat

Wow, deja vu. CK labeled me with more psychobabble (again), and my response is
tied up or deleted in moderation (again).

Kat

31. Re: Str-Kat

Jim Brown wrote:
> 
> Kat wrote:
> > 
> > ken mortenson wrote:
> > > 
> > > Kat wrote:
> > > 
> > > > It would also be easier if Eu returned unused
> > > > memory to the OS via some command or other
> > > 
> > > I believe it does, Kat.  I believe I remember seeing a free() routine?
> > 
> > So it's only a matter of using allocate() and free()? No wonder it's not
> > been done yet! Where is the magic spell that makes compare() and equal() and
> > string[2] and length(string) work, after you get allocate() and free() typed
> > out?
> > 
> > Kat
> 
> --string.e
> ---------------------
> namespace string
> 
> global function compare(atom a, atom b)
> integer i, ac, bc
> i = 0
> ac = peek(a)
> bc = peek(b)
> while 1 do
> if ac > bc then
> return 1
> elsif ac < bc then
> return -1
> elsif (ac = bc) and (ac = 0) then
> return 0
> else
> i = i + 1
> ac = peek(a+i)
> bc = peek(b+i)
> end if
> end while
> end function
> 
> global function equal(atom a, atom b)
> return compare(a,b) = 0
> end function
> 
> global function length(atom a)
> integer i
> i = 0
> while peek(a+i) != 0 do
> i = i + 1
> end while
> return i
> end function
> 
> --string[2] is impossible to do currently, but there is this workaround
> global function slice(atom a, integer i)
> return peek(a+i-1) -- Euphoria indexes from 1, so element i is at offset i-1
> end function

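On an Intel CPU, the length function only needs to be this:
push edi
push ecx
xor eax,eax       ; al = 0, the byte to search for
or ecx,-1         ; ecx = FFFFFFFFh, the scan limit
mov edi,[esp+12]  ; string address, above the 2 pushes and the return address
cld
repnz scasb       ; stop on the zero byte; ecx = -(length+2)
jecxz done        ; no zero byte within 4GB: return 0
sub eax,ecx       ; eax = length+2
dec eax
dec eax           ; eax = length
done:
pop ecx
pop edi
ret

A whopping 23 bytes. You can shave 2 more if ecx is discardable. The jecxz is
optional too, because machines with 4GB of RAM are still hard to find. That
comes to 19 bytes, which fits nicely into a cache line.

Oh, and the string address need not be on the stack if ecx is discardable.
That can still shave some cycles.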

How much slower would the string: code above be? I'd bet between 10 and 30
times.

CChris
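For comparison, here is a C sketch of the same three routines Jim wrote with peek(), plus a bounded scan in the spirit of the assembly's scan limit. The names `bounded_length`, `byte_compare` and `element` are made up for illustration. `element()` uses 1-based indexing to match Euphoria's `s[2]`.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a zero-terminated byte string, scanning at most limit bytes.
   Like the jecxz branch above, it returns 0 when no zero byte turns up
   within the limit (so 0 is ambiguous with an empty string). */
size_t bounded_length(const unsigned char *s, size_t limit)
{
    size_t i;

    for (i = 0; i < limit; i++)
        if (s[i] == 0)
            return i;
    return 0;
}

/* Byte-by-byte compare returning -1, 0 or 1, like the Euphoria
   string:compare() above. Bytes are compared as unsigned values,
   matching what peek() returns. */
int byte_compare(const unsigned char *a, const unsigned char *b)
{
    size_t i = 0;

    while (a[i] == b[i]) {
        if (a[i] == 0)
            return 0;
        i++;
    }
    return a[i] > b[i] ? 1 : -1;
}

/* 1-based element access, so element(s, 2) plays the role of s[2]. */
unsigned char element(const unsigned char *s, size_t i)
{
    assert(i >= 1);
    return s[i - 1];
}
```

The loops do the same work as the peek() versions; the difference CChris is betting on comes from each peek() being a separate interpreted call rather than a single machine-level loop.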

32. Re: Str-Kat

CChris wrote:
> 
> Jim Brown wrote:
> > 
> > Kat wrote:
> > > 
> > > ken mortenson wrote:
> > > > 
> > > > Kat wrote:
> > > > 
> > > > > It would also be easier if Eu returned unused
> > > > > memeory to the OS via some command or other
> > > > 
> > > > I believe it does Kat.  I believe I remember seeing free() routine?
> > > 
> > > So it's only a matter of using allocate() and free()? No wonder it's not
> > > been
> > > done yet! Where is the magic spell that makes compare() and equal() and
> > > string[2]
> > > and length(string) work, after you get allocate() and free() typed out?
> > > 
> > > Kat
> > 
> > --string.e
> > ---------------------
> > namespace string
> > 
> > global function compare(atom a, atom b)
> > integer i, ac, bc
> > i = 0
> > ac = peek(a)
> > bc = peek(b)
> > while 1 do
> > if ac > bc then
> > return 1
> > elsif ac < bc then
> > return -1
> > elsif (ac = bc) and (ac = 0) then
> > return 0
> > else
> > i = i + 1
> > ac = peek(a+i)
> > bc = peek(b+i)
> > end if
> > end while
> > end function
> > 
> > global function equal(atom a, atom b)
> > return compare(a,b) = 0
> > end function
> > 
> > global function length(atom a)
> > integer i
> > i = 0
> > while peek(a+i) != 0 do
> > i = i + 1
> > end while
> > return i
> > end function
> > 
> > --string[2] is impossible to do currently, but there is this workaround
> > global function slice(atom a, integer i)
> > return peek(a+i)
> > end function
> 
> On an Intel CPU, the length function needs only be this:
> push edi
> push ecx
> xor eax,eax
> xor ecx,ecx
> mov edi,[esp+4]
> cld
> repnz scasb
> jecxz ret
> sub eax,ecx
> pop ecx
> pop edi
> ret
> 
> A whopping 20 bytes. You can shave 2 more if ecx is discardable. The jecxz is
> optional too, becuse machines with 4Go RAM are still hard to find. That comes
> to 16 bytes, which nicely fits into a cache line.
> 
> Oh, and the string address needs not to be on the stack if ecx is discardable.
> Can still shave some cycles.
> 
> How much slower would the string: code above be? I'd bet between 10 and 30
> times.
> 
> CChris

I was going for simplicity, not speed.

I would have just done define_c_func(open_dll(""), "strlen", ....) and let your
OS vendor take care of the speed hack.

Both of them would be a lot faster than a pure eu length function. In fact, you
can define_c_func() strcmp for compare(), and also get a speed boost for
compare() and equal().

But Kat doesn't do C (and who can blame her?)
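One detail is worth checking before binding `strcmp` with define_c_func() as a drop-in compare(). The C standard only specifies the sign of strcmp()'s result, whereas Euphoria's compare() returns exactly -1, 0 or 1, so a value taken straight from strcmp may need normalizing. This small C sketch shows the normalization. The wrapper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* strcmp() only promises a negative, zero or positive result; fold it
   to exactly -1, 0 or 1, the range Euphoria's compare() uses. */
int eu_style_compare(const char *a, const char *b)
{
    int r;

    assert(a != NULL && b != NULL);
    r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* equal() is then just a zero test on strcmp(). */
int eu_style_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

strcmp() compares bytes as unsigned char, so its ordering agrees with a peek()-based compare; only the magnitude of the result needs folding.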
