1. Turning a Text Sequence into a Sequence Array

Hi, I'm new here. And new to the language. I did a search for this subject but I was unlucky. I was wondering if anyone could tell me the easiest or quickest way to turn a Text Sequence into a sequence array. In other words I want to turn this:

"This is only theoretically a sentence."

into this:

{"This", "is", "only", "theoretically", "a", "sentence."}

Any help would be much appreciated.

Kaladan

2. Re: Turning a Text Sequence into a Sequence Array

Use split() from std/sequence.e

3. Re: Turning a Text Sequence into a Sequence Array

Or try the strtok lib.

useless

4. Re: Turning a Text Sequence into a Sequence Array

useless said...

Or try the strtok lib.

useless

And the strtok lib, "string token routines", can be found at:

http://www.rapideuphoria.com/strtok-v2-1.zip

Dan M

5. Re: Turning a Text Sequence into a Sequence Array

DanM said...

And the strtok lib, "string token routines", can be found at:

http://www.rapideuphoria.com/strtok-v2-1.zip

Dan M


v2.2 was the last public release i made.
v3.0 is the version i am using now.

useless

6. Re: Turning a Text Sequence into a Sequence Array

jimcbrown said...

Use split() from std/sequence.e

With Euphoria v4, the standard library module sequence.e contains the split function. It can be used like this ...

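include std/console.e
include std/sequence.e

sequence text = "This is only theoretically a sentence."

-- Argument order as posted. If your std/sequence.e declares
-- split(sequence st, object delim = ' ', ...), call split(text, ' ') instead.
sequence result = split(' ', text)

display(result)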


The output would be ...

{ 
  "This", 
  "is", 
  "only", 
  "theoretically", 
  "a", 
  "sentence." 
} 
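
If the input may contain runs of spaces, a plain split on ' ' yields empty items between them. The sketch below assumes the signature documented for the released 4.0 library, split(sequence st, object delim = ' ', object no_empty = 0, integer limit = 0); check your own std/sequence.e before relying on it. The variable names are only for illustration.

```euphoria
include std/console.e
include std/sequence.e

-- Note the two spaces after "This".
sequence text = "This  is only theoretically a sentence."

-- Assumed 4.0 signature: text first, then delimiter, then no_empty.
-- Passing 1 as the third argument asks split() to drop empty items.
sequence words = split(text, ' ', 1)

display(words)
```

With no_empty set, the result should hold the same six words as the output above, with no empty string where the doubled space was.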

7. Re: Turning a Text Sequence into a Sequence Array

useless said...


v2.2 was the last public release i made.
v3.0 is the version i am using now.

useless

Better for others to use the standard library then, as it sounds like strtok is unmaintained (at least publicly).

8. Re: Turning a Text Sequence into a Sequence Array

useless said...

v2.2 was the last public release i made.
v3.0 is the version i am using now.

useless

Oh, ok, when I searched the archive, 2.1 was the only one I found there, where's 2.2 or 3.0?

Dan

9. Re: Turning a Text Sequence into a Sequence Array

DanM said...

Oh, ok, when I searched the archive, 2.1 was the only one I found there, where's 2.2 or 3.0?

Dan


Deleted from the archives.
Since my words have meant nothing here, and my code was repeatedly ridiculed by developers, i don't intend to make any more contributions, i said that years ago, and i meant it.

I see http.e still has a bug, and task msging is still not up to my code level.

useless

10. Re: Turning a Text Sequence into a Sequence Array

Tested, works fine.

sequence res, sentence
sentence = "This is only theoretically a sentence."
res = {}
while 1 = 1 do
    if not find(32, sentence) then
        res &= {sentence}
        exit
    end if
    res &= {sentence[1..find(32, sentence)]}
    sentence = sentence[find(32, sentence)+1..length(sentence)]
end while

11. Re: Turning a Text Sequence into a Sequence Array

sergelli said...

Tested, works fine.

sequence res, sentence
sentence = "This is only theoretically a sentence."
res = {}
while 1 = 1 do
    if not find(32, sentence) then
        res &= {sentence}
        exit
    end if
    res &= {sentence[1..find(32, sentence)]}
    sentence = sentence[find(32, sentence)+1..length(sentence)]
end while

Almost works ... each word has a trailing space, and if the input has multiple spaces between words you get empty words in the output. It's also slower than it needs to be, and it doesn't allow TAB characters between words.

Try this much faster and more accurate one ...

sequence res,sentence 
sentence= "  This\t is only  theoretically a sentence."  
res={} 
integer a 
 
a = 0 -- not in a word yet. 
for b = 1 to length(sentence) do 
    if find(sentence[b], " \t\n") then 
        -- Found some whitespace 
        if a != 0 then 
            -- and it marks the end of a word. 
            res = append(res, sentence[a .. b-1]) 
            a = 0 -- no longer in word 
        end if 
    elsif a = 0 then 
        -- Found start of a word 
        a = b -- remember starting position 
    end if   
end for 
     
if a != 0 then 
    -- One remaining word in the sentence. 
    res = append(res, sentence[a .. $]) 
end if 
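
The same scan can be wrapped in a function so the delimiter set is a parameter. This is only a sketch: split_words is a made-up name, not a library routine, and it sticks to built-ins (find, append, length) so it should also run on Euphoria 3.1.

```euphoria
-- Hypothetical helper: split text into words, treating every
-- character listed in delims as a separator.
function split_words(sequence text, sequence delims)
    sequence words
    integer start

    words = {}
    start = 0 -- 0 means "not inside a word"
    for i = 1 to length(text) do
        if find(text[i], delims) then
            if start != 0 then
                -- A delimiter ends the current word.
                words = append(words, text[start .. i-1])
                start = 0
            end if
        elsif start = 0 then
            start = i -- first character of a new word
        end if
    end for

    if start != 0 then
        -- The text ended while still inside a word.
        words = append(words, text[start .. length(text)])
    end if
    return words
end function

sequence result
result = split_words("  This\t is only  theoretically a sentence.", " \t\n")
for i = 1 to length(result) do
    puts(1, "|" & result[i] & "|\n")
end for
```

Adding punctuation to the delimiter string, for example " \t\n.,?", would strip it from the words as well.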

12. Re: Turning a Text Sequence into a Sequence Array

Hey, thanks for all the help and advice. Much appreciated! This however leads to another (set of) question(s)...why isn't "std/sequence.e" found anywhere in my install? Was there something I forgot or neglected to download? I know, I know I sound like such a neophyte, but this is something I've wondered since I read the manual. Any help there?

13. Re: Turning a Text Sequence into a Sequence Array

Kaladan said...

Hey, thanks for all the help and advice. Much appreciated! This however leads to another (set of) question(s)...why isn't "std/sequence.e" found anywhere in my install? Was there something I forgot or neglected to download? I know, I know I sound like such a neophyte, but this is something I've wondered since I read the manual. Any help there?

What version of Euphoria did you install?

14. Re: Turning a Text Sequence into a Sequence Array

Sorry so slow to reply. I've got 3.11 installed.

15. Re: Turning a Text Sequence into a Sequence Array

Version 4.0 (beta, but usable) has the new standard library.

16. Re: Turning a Text Sequence into a Sequence Array

Okay, thanks! I've always been wary of betas, but I'll give it a go! Thank you very much!

17. Re: Turning a Text Sequence into a Sequence Array

Kaladan said...

Okay, thanks! I've always been wary of betas, but I'll give it a go! Thank you very much!

And, as far as I can tell, Kat's strtok.e seems to work pretty well with 3.11. Here's a simple test, which DOES include an EXTRA SPACE between a couple of words:

include strtok-v2-1.e 
include get.e 
include misc.e 
 
object NULL 
sequence t, r 
t = "this is a test sentence  with one extra space." 
r = parse(t, ' ') 
 
pretty_print(1, r, {3}) 
 
NULL = wait_key() 


And here's the output: { "this", "is", "a", "test", "sentence", "with", "one", "extra", "space." }

Pretty simple, no?

Dan M.

18. Re: Turning a Text Sequence into a Sequence Array

DanM said...
Kaladan said...

Okay, thanks! I've always been wary of betas, but I'll give it a go! Thank you very much!

I have a version of split() that works as far back as Euphoria 2.2, if you prefer to use an older version of Euphoria until 4.0 is out of beta.

-- (c) Copyright - See License.txt 
-- 
 
global function find_from(object delim, sequence st, integer start) 
	integer ret 
	ret = find(delim, st[start..length(st)]) 
	if ret then 
		ret = ret + start - 1 
	end if 
	return ret 
end function 
 
global function match_from(object delim, sequence st, integer start) 
	integer ret 
	ret = match(delim, st[start..length(st)]) 
	if ret then 
		ret = ret + start - 1 
	end if 
	return ret 
end function 
 
global function split_with_limits(object delim, sequence st, integer limit, integer no_empty) 
	sequence ret 
	integer start 
	integer pos 
	integer k 
 
	ret = {} 
 
	if length(st) = 0 then 
		return ret 
	end if 
 
 
	if sequence(delim) then 
		-- Handle the simple case of split("", "123"), opposite is join({"1","2","3"}, "") -- "123" 
		if equal(delim, "") then 
			for i = 1 to length(st) do 
				st[i] = {st[i]} 
				limit -= 1 
				if limit = 0 then 
					st = append(st[1 .. i],st[i+1 .. length(st)]) 
					exit 
				end if 
			end for 
 
			return st 
		end if 
 
		start = 1 
		while start <= length(st) do 
			pos = match_from(delim, st, start) 
 
			if pos = 0 then 
				exit 
			end if 
 
			ret = append(ret, st[start..pos-1]) 
			start = pos+length(delim) 
			limit -= 1 
			if limit = 0 then 
				exit 
			end if 
		end while 
	else 
		start = 1 
		while start <= length(st) do 
			pos = find_from(delim, st, start) 
 
			if pos = 0 then 
				exit 
			end if 
 
			ret = append(ret, st[start..pos-1]) 
			start = pos + 1 
			limit -= 1 
			if limit = 0 then 
				exit 
			end if 
		end while 
	end if 
 
	ret = append(ret, st[start..length(st)]) 
 
	k = length(ret) 
	if no_empty then 
		k = 0 
		for i = 1 to length(ret) do 
			if length(ret[i]) != 0 then 
				k += 1 
				if k != i then 
					ret[k] = ret[i] 
				end if 
			end if 
		end for 
	end if 
 
	if k < length(ret) then 
		return ret[1 .. k] 
	else 
		return ret 
	end if 
end function 
 
global function split(object delim, sequence st) 
	return split_with_limits(delim, st, 0, 0) 
end function 
 
object result 
result = split(" ", "John Middle Doe") 
for i = 1 to length(result) do 
	puts(1, "|"&result[i]&"|\n") 
end for 

19. Re: Turning a Text Sequence into a Sequence Array

DanM said...

And, as far as I can tell, Kat's strtok.e seems to work pretty well with 3.11. Here's a simple test, which DOES include an EXTRA SPACE between a couple of words:

include strtok-v2-1.e 
include get.e 
include misc.e 
 
object NULL 
sequence t, r 
t = "this is a test sentence  with one extra space." 
r = parse(t, ' ') 
 
pretty_print(1, r, {3}) 
 
NULL = wait_key() 


And here's the output: { "this", "is", "a", "test", "sentence", "with", "one", "extra", "space." }

Pretty simple, no?

Dan M.


Per the readme for v2.1, to get rid of that period (and other chaff) as well:

parse("this,is.a?test here",",.?") = {"this","is","a","test here"}
(note the space is not specified in this example)

Or:
parse("nick!ident@a.net","!@") = {"nick","ident","a.net"}

but to KEEP the separators intact as separate tokens ("words", if you prefer):
parse("nick!ident@a.net",{"k","!@"}) = {"nick","!","ident","@","a.net"}

useless

20. Re: Turning a Text Sequence into a Sequence Array

Thanks again, one and all. I got with the program and installed version 4, got the libraries I was looking for. Going good so far. I'm in the planning stages of building an advanced AI chatbot, but I needed to separate the words into an array for very complex word/sentence/context recognition procedures. Not to sound like a commercial but Euphoria has been the best language I've found to try to write the thing in. Glad I ran across it! If I run into any more difficulties, this is where I know to be!

21. Re: Turning a Text Sequence into a Sequence Array

Hi, IIRC Kat (aka useless) had a bot called Tiggr that was mostly Euphoria based. YMMV ;)

22. Re: Turning a Text Sequence into a Sequence Array

alanjohnoxley said...

Hi, IIRC Kat (aka useless) had a bot called Tiggr that was mostly Euphoria based. YMMV ;)


I yanked all the eu code a couple yrs ago. No one wants an intelligent bot of any sort. There was no one programming in eu that was interested either.

useless

23. Re: Turning a Text Sequence into a Sequence Array - strtok-v2-1.e

Hallo! The ability to automate the sequencing of words in English sentences, and to change that sequence to reflect the requirements of German grammar, is where I am starting in a framework for translating my English texts into German, so this is just what I need. When I have tested the script I will come back and say what I think of it. I am sure that it will meet my needs. Regards and thanks, patforkin.
