1. Optimizing multiple searches
- Posted by rneu Jun 28, 2017
- 1366 views
CODE:
sequence lines = {"this is first line", "second line", "another line", "and this is another line- will be fourth", "and last line"} sequence terms = {"line", "will", "this"} sequence outlist = {} for i=1 to length(terms) do for j=1 to length(lines) do if match(terms[i], lines[j]) then outlist = append(outlist, lines[j]) end if end for lines = outlist outlist = {} end for display(lines)
Output:
{ "and this is another line- will be fourth" }
I have 2 lists (lines and terms) and I have to select out only those entries of 'lines' where all items of 'terms' are present. Above code works but can it be optimized or shortened since I see that functions such as match_all, find_all, find_each etc are also available?
(Not able to put hashes for code in above text).
2. Re: Optimizing multiple searches
- Posted by ghaberek (admin) Jun 28, 2017
- 1347 views
(Not able to put hashes for code in above text).
Tip: use the <eucode> tag for Euphoria code and {{{ / }}} for all other formatted text (e.g. console output). See CreoleHelp for more info.
-Greg
3. Re: Optimizing multiple searches
- Posted by ghaberek (admin) Jun 28, 2017
- 1347 views
I have 2 lists (lines and terms) and I have to select out only those entries of 'lines' where all items of 'terms' are present. Above code works but can it be optimized or shortened since I see that functions such as match_all, find_all, find_each etc are also available?
Since you require all terms to match, then you can jump out of the loop of terms as soon as you don't find one you're looking for. This is easier if you invert the for loops: loop through all of the lines, scanning each one for the required terms.
include std/console.e include std/types.e sequence lines = { "this is first line", "second line", "another line", "and this is another line- will be fourth", "and last line" } sequence terms = {"line", "will", "this"} sequence outlist = {} for i = 1 to length( lines ) do integer found_all = TRUE for j = 1 to length( terms ) do if not match( terms[j], lines[i] ) then found_all = FALSE break end if end for if found_all then outlist = append( outlist, lines[i] ) end if end for display( outlist )
-Greg
4. Re: Optimizing multiple searches
- Posted by Spock Jun 28, 2017
- 1361 views
I have 2 lists (lines and terms) and I have to select out only those entries of 'lines' where all items of 'terms' are present. Above code works but can it be optimized or shortened since I see that functions such as match_all, find_all, find_each etc are also available?
(Not able to put hashes ## for code in above text).
Although there is a match_any() function in Euphoria it is not of the kind that works in this case. If there were you could've shortened the loop to:
for i=1 to length(lines) do if match_any(terms, lines.i) = length(terms) then outlist &= { lines.i } end if end for
You can easily write your own function, eg:
function match_any(sequence terms, sequence target) sequence ret = {} for i=1 to length(terms) do integer n = match(terms.i, target) if n then ret &= n end if end for return ret end function
But the first thing to ask really is what are the data set sizes and what optimization (and other programming) goals do you have? Maybe when you say "optimize" you really mean beautify? Or is it lowest number of lines of code? Or is it maximum performance? There are compromises or trade-offs to be made in every choice. But the very first goal you must strive for is, of course, correctness. If it works correctly and you can live with the performance.. you're done!
Spock
5. Re: Optimizing multiple searches
- Posted by petelomax Jun 28, 2017
- 1319 views
Not impressed when looking at the manual...
http://openeuphoria.org/docs/std_search.html#_2480_match_all said...
public function match_all(sequence needle, sequence haystack, integer start = 1)
Match all items of haystack in needle.
Is it me or is that just completely backwards? It should perhaps say:
Locate all instances(/subsequences) of needle in haystack.
The example from the above link may clarify my wording change for the manual, but also shows that match_all() is not at all what is need for the task as asked for above:
s = match_all("the", "the dog chased the cat under the table.") -- s is {1,16,30}
Anyway, going back to the original question, here's my suggestion
sequence lines = { "this is first line", "second line", "another line", "and this is another line- will be fourth", "and last line" } sequence terms = {"line", "will", "this"} function match_alls(sequence terms, sequence lines) integer k = 0 for i=1 to length(lines) do for j=1 to length(terms) do if match(terms[j],lines[i])=0 then exit end if if j=length(terms) then k += 1 lines[k] = lines[i] end if end for end for return lines[1..k] end function ?match_alls(terms,lines)
Output
{"and this is another line- will be fourth"}(you may want to use display() instead of ?)
To get Spock's suggestion to work as I thought it should, and match my output, I had to change
if match_any(terms, lines[i]) = length(terms) then
to
if length(match_any(terms, lines[i])) = length(terms) then
Pete
PS: Spock asked some questions that need answering (I would have kept schtum were it not for the manual thing)
6. Re: Optimizing multiple searches
- Posted by Spock Jun 28, 2017
- 1341 views
To get Spock's suggestion to work as I thought it should, and match my output, I had to change [it] to
if length(match_any(terms, lines[i])) = length(terms) then
Pete
Spock
7. Re: Optimizing multiple searches
- Posted by _tom (admin) Jun 29, 2017
- 1261 views
search.e library takes some untangling before use.
Always look to the idea that Euphoria divides options into two: atom|sequence, one|many, 0|1 .
- You are searching for something in a sequence or a string sequence.
- An item is "usualy often an atom, but could be an object viewed as one whole."
- An {items list} is "several items that could satisfy the search."
- A slice is "a contiguous sequence that is part of an equal or longer sequence."
- A {slices list} is "several slices that could satisfy the search."
- A "new" sequence is "returned after a search and replace action."
find
A find function "locates an item in a list."
input | search result | use | |
---|---|---|---|
item | 0|1 | is_in_list | is_in_range |
index | find | rfind | |
{ indices } | find_all | find_all_but | |
"new" | find_replace | ||
|
|
|
|
{ items list } | 0|1 | match_any | |
index | find_any | ||
{ indices } | find_each |
Warning: the match_any function does not fit the naming pattern.
match
A match function "locates a slice in a list."
input | search result | use | |
---|---|---|---|
slice | 0|1 | begins | ends |
index | match | ||
{ indices } | match_all | ||
"new" | match_replace |
Note: for {slices list} you have to write nested loops.
Wanted Search Libray
An improved std/search.e would not have any conflicts with the std/regex.e library.
A lowercase search function ( find, match ) uses one item as a search argument. An uppercase search function ( Find, Match ) uses a list of items as a search argument.
find
A find function "searches for an item in a list."
A Find function "searches for {items} in a list."
input | search result | use | |
---|---|---|---|
item | 0|1 | is_in_list | is_in_range |
index | find | rfind | |
{ indices } | find_all | find_all_but | |
"new" | find_replace | ||
|
|
|
|
{ items list } | 0|1 | ||
index | Find | ||
{ indices } | Find_all |
A match function "searches for a slice in a list." A Match function "searches for {slices} in a list."
input | search result | use | |
---|---|---|---|
slice | 0|1 | begins | ends |
index | match | ||
{ indices } | match_all | ||
"new" | match_replace | ||
|
|
|
|
{ items list } | 0|1 | ||
index | Match | ||
{ indices } | Match_all |
_tom
8. Re: Optimizing multiple searches
- Posted by _tom (admin) Jun 29, 2017
- 1306 views
search.e library takes some untangling before use.
Always look to the idea that Euphoria divides options into two: atom|sequence, one|many, 0|1 .
- You are searching for something in a sequence or a string sequence.
- An item is "usualy often an atom, but could be an object viewed as one whole."
- An {items list} is "several items that could satisfy the search."
- A slice is "a contiguous sequence that is part of an equal or longer sequence."
- A {slices list} is "several slices that could satisfy the search."
- A "new" sequence is "returned after a search and replace action."
find
A find function "locates an item in a list."
input | search result | use | |
---|---|---|---|
item | 0|1 | is_in_list | is_in_range |
index | find | rfind | |
{ indices } | find_all | find_all_but | |
"new" | find_replace | ||
|
|
|
|
{ items list } | 0|1 | match_any | |
index | find_any | ||
{ indices } | find_each |
Warning: the match_any function does not fit the naming pattern.
match
A match function "locates a slice in a list."
input | search result | use | |
---|---|---|---|
slice | 0|1 | begins | ends |
index | match | ||
{ indices } | match_all | ||
"new" | match_replace |
Note: for {slices list} you have to write nested loops.
Wanted Search Libray
An improved std/search.e would not have any conflicts with the std/regex.e library.
A lowercase search function ( find, match ) uses one item as a search argument. An uppercase search function ( Find, Match ) uses a list of items as a search argument.
find
A find function "searches for an item in a list."
A Find function "searches for {items} in a list."
input | search result | use | |
---|---|---|---|
item | 0|1 | is_in_list | is_in_range |
index | find | rfind | |
{ indices } | find_all | find_all_but | |
"new" | find_replace | ||
|
|
|
|
{ items list } | 0|1 | ||
index | Find | ||
{ indices } | Find_all |
A match function "searches for a slice in a list." A Match function "searches for {slices} in a list."
input | search result | use | |
---|---|---|---|
slice | 0|1 | begins | ends |
index | match | ||
{ indices } | match_all | ||
"new" | match_replace | ||
|
|
|
|
{ items list } | 0|1 | ||
index | Match | ||
{ indices } | Match_all |
_tom