1. How to convert sequence of characters to numbers

I need to convert a date/time string (sequence) of the format "20121018123010" into a sequence of numeric values of the format {2012,10,18,12,30,10}. I'm looking for the fastest possible execution time. It looks like I could use breakup() and then maybe to_integer() or value() on each of the six elements. Am I missing something, or is there a better way? Ultimately I need to poke2 each of the word values into a c_func call.

Thanks, casey


2. Re: How to convert sequence of characters to numbers

Hi Casey,

I imagine that the fastest way would be to poke the string into memory and call a custom machine code routine that does the conversion AND calls your C function.

However, if you want to stay in Euphoria, a hand-written routine would be fastest. You would not need breakup() or any other Eu library function, since you can address and convert each element directly:

 
sequence s = "20121018123010" 
s -= '0'   -- subtract '0' (48) from every character, giving {2,0,1,2,1,0,1,8,...}
integer year = s[1]*1000 + s[2]*100 + s[3]*10 + s[4] -- year is now 2012 

I think you get the idea.
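A minimal sketch of the same idea applied to all six fields, assuming the input is always exactly 14 ASCII digits (nothing is validated). The function name `parse_timestamp` and the `widths` table are illustrative, not from the thread. Each field is accumulated with `n = n*10 + digit`, so one loop handles both the 4-digit year and the 2-digit fields:

```euphoria
-- Hypothetical helper: split a 14-digit "YYYYMMDDhhmmss" string into
-- {year, month, day, hour, minute, second}.
-- Assumes the input is exactly 14 ASCII digits; no validation is done.
function parse_timestamp( sequence s )
    sequence widths = {4, 2, 2, 2, 2, 2}   -- number of digits in each field
    sequence parts = repeat( 0, length(widths) )
    integer pos = 1                        -- current index into s
    integer n

    s -= '0'   -- turn '0'..'9' (48..57) into 0..9

    for f = 1 to length(widths) do
        n = 0
        for k = 1 to widths[f] do
            n = n * 10 + s[pos]   -- shift earlier digits left, add the next one
            pos += 1
        end for
        parts[f] = n
    end for

    return parts
end function

? parse_timestamp( "20121018123010" )   -- prints {2012,10,18,12,30,10}
```

A fully unrolled version with hard-coded multipliers avoids the loop overhead and will likely be a little faster; the loop form just makes the field widths easy to change.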

Spock


3. Re: How to convert sequence of characters to numbers

Spock said...
 
sequence s = "20121018123010" 
s -= '0' 
integer year = s[1]*1000 + s[2]*100 + s[3]*10 + s[4] -- year is now 2012 


Umm, s[1] is '2', not 2, and '2' = 50, while 2 = 2. Likewise, s[1..4] = "2012".

I think you get the idea.

useless


4. Re: How to convert sequence of characters to numbers

useless_ said...


Umm, s[1] is '2', not 2, and '2' = 50, while 2 = 2. Likewise, s[1..4] = "2012".


Yes, and '0' is 48.

The expression "20121018123010" - '0' is equivalent to {50, 48, 49, 50, 49, 48, 49, 56, 49, 50, 51, 48, 49, 48} - 48, which equals {2, 0, 1, 2, 1, 0, 1, 8, 1, 2, 3, 0, 1, 0}.

Now s[1..4] is {2, 0, 1, 2}, which, multiplied out by place value (2*1000 + 0*100 + 1*10 + 2), gives you the year 2012 as an integer.
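To check this arithmetic yourself, you can print the intermediate digit values and the weighted sum; a minimal sketch (the variable names are illustrative):

```euphoria
-- subtract '0' (48) from every character of the string
sequence digits = "20121018123010" - '0'

? digits   -- prints {2,0,1,2,1,0,1,8,1,2,3,0,1,0}

-- weight the first four digits by 1000, 100, 10 and 1
integer year = digits[1]*1000 + digits[2]*100 + digits[3]*10 + digits[4]

? year     -- prints 2012
```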

Edit: Corrected conversion error


5. Re: How to convert sequence of characters to numbers

Thanks very much. Your solution is simpler and indeed more than twice as fast as calling breakup() and then to_number().

Casey


6. Re: How to convert sequence of characters to numbers

jaygade said...
Yes, and '0' is 48.

The expression "20121018123010" - '0' is equivalent to {50, 48, 49, 50, 49, 48, 49, 56, 49, 50, 51, 48, 49, 48} - 48 which equals {2, 0, 1, 2, 1, 0, 1, 8, 1, 2, 3, 0, 1, 0}

Now s[1..4] is {2, 0, 1, 2} which multiplied out gives you the year as an integer.


Weird, I had to look twice to see his "s -= '0'" line, but I saw your "- 48" code immediately. My mistake, sorry.

EDIT:
I'd like to blame this on the font. The other day, Derek used what looked like a lowercase 'o' in some math, and I copy/pasted it into a text editor that shows a clearer difference between 0, O, and o. I'd have typed it just as you did, using 48 instead of '0'.

useless


7. Re: How to convert sequence of characters to numbers

Obligatory XKCD reference: Regular Expressions

include "std/convert.e" 
include "std/regex.e" 
 
-- six capture groups: YYYY MM DD hh mm ss
regex pattern = regex:new( "([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})" ) 
 
function get_parts( sequence string ) 
 
    sequence parts = {} 
 
    if regex:is_match( pattern, string ) then 
 
        -- matches[1] is the whole match; matches[2..7] are the captured groups
        object matches = regex:matches( pattern, string ) 
 
        parts = repeat( 0, length(matches)-1 ) 
        for i = 1 to length(matches)-1 do 
            parts[i] = to_integer( matches[i+1] ) 
        end for 
 
    end if 
 
    -- returns {} if the string does not match the pattern
    return parts 
end function 

Example:

sequence string = "20121018123010" 
sequence parts = get_parts( string ) 
 
printf( 1, "string = \"%s\"\n", {string} ) 
puts( 1, "parts = " ) ? parts 

Output:

$ eui regex-test.ex  
string = "20121018123010" 
parts = {2012,10,18,12,30,10} 

-Greg


8. Re: How to convert sequence of characters to numbers

Right, but is it fast? Speed was one of the OP's original criteria.


9. Re: How to convert sequence of characters to numbers

jaygade said...

Right, but is it fast? Speed was one of the OP's original criteria.

Dang it, you're right. I just ran some tests over 500,000 iterations: the method proposed by you, Spock, and Kat is over 25 times faster than my regular expression. :(

However, using a regular expression makes it easier to:

  • quickly parse the 'number' out of a string or text file
  • validate the input on one pass: it either matches or it doesn't
  • allow for any number of variations and/or strictness in the input, e.g.
    • 'year' can only be 1970-2012
    • 'month' can only be 01-12
    • 'day' can only be 01-31
    • 'hour' can only be 00-23
    • 'minute' and 'second' can only be 00-59
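A sketch of the stricter pattern described in the list above, assuming the PCRE-style syntax accepted by std/regex.e (the variable name `strict` is illustrative). The ^ and $ anchors reject extra characters. Note that each field is range-checked on its own, so an impossible date such as 20120231 would still match:

```euphoria
include std/regex.e

-- Each group restricts one field to the ranges listed above:
--   year 1970-2012, month 01-12, day 01-31,
--   hour 00-23, minute 00-59, second 00-59
-- The day is not checked against the month (e.g. Feb 31 passes).
regex strict = regex:new(
    "^(19[7-9][0-9]|200[0-9]|201[0-2])" &   -- year
    "(0[1-9]|1[0-2])" &                     -- month
    "(0[1-9]|[12][0-9]|3[01])" &            -- day
    "([01][0-9]|2[0-3])" &                  -- hour
    "([0-5][0-9])([0-5][0-9])$" )           -- minute, second

? regex:is_match( strict, "20121018123010" )   -- 1: every field in range
? regex:is_match( strict, "20121318123010" )   -- 0: month 13 is rejected
```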

Here is the function I benchmarked against my regex version:

function get_parts( sequence string ) 
     
    string -= '0'   -- '0'..'9' (48..57) become 0..9
     
    sequence parts = repeat( 0, 6 ) 
    parts[1] = (string[1] * 1000) + (string[2] * 100) + (string[3] * 10) + string[4]   -- year
    parts[2] = (string[5] * 10) + string[6]     -- month
    parts[3] = (string[7] * 10) + string[8]     -- day
    parts[4] = (string[9] * 10) + string[10]    -- hour
    parts[5] = (string[11] * 10) + string[12]   -- minute
    parts[6] = (string[13] * 10) + string[14]   -- second
     
    return parts 
end function 
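Since the end goal is to poke2 the values for a c_func call, here is a minimal sketch of packing the six results into a buffer of 16-bit words. It assumes the C routine expects six consecutive words in year, month, day, hour, minute, second order; the real layout depends on the C function, which the thread does not show, so adjust the order and buffer size to match it:

```euphoria
include std/machine.e   -- allocate(), free()

sequence parts = {2012, 10, 18, 12, 30, 10}   -- e.g. the result of get_parts()

-- Assumption: the C routine expects six consecutive 16-bit words.
atom buf = allocate( 2 * length(parts) )   -- 2 bytes per word
poke2( buf, parts )                        -- poke2 accepts a whole sequence

-- read the words back to confirm what was written
sequence check = peek2u( {buf, length(parts)} )
? check   -- prints {2012,10,18,12,30,10}

-- ... pass buf to your c_func() call here ...

free( buf )
```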

-Greg


10. Re: How to convert sequence of characters to numbers

ghaberek said...

the method proposed by you, Spock, and Kat is over 25 times faster than my regular expression. :(

Ha

ghaberek said...

However, using a regular expression makes it easier to:

Uh

ghaberek said...
  • allow for any number of variations and/or strictness in the input, e.g.
    • 'year' can only be 1970-2012
    • 'month' can only be 01-12
    • 'day' can only be 01-31
    • 'hour' can only be 00-23
    • 'minute' and 'second' can only be 00-59

I'll believe that is a simple and easy regular expression when I see it and not before.

Pete

