1. Unregistered profile_time() (was: Re: copying global stuff to one file

>> > i want to release a cool library i made which profiles code. here's an
>> > example output:
>> 
>> Okay, are you planning to do this using the profile_time function
>> within registered Euphoria?
> 
> No.
> I have already written this profile library. It simply measures how much time
> was spent in a block of code and how many times it was run. Of course, I have to
> put two procedures around the block of code I want to profile, which is a little
> more work than using "with profile", but there are some other advantages (besides,
> I made myself a macro in msdev which automatically puts this code in, so it's not
> much trouble):
> 

Does it work under Linux and al. as well?

- Under DOS32, you can trap int #70 and sort out the CMOS clock ticks.
- Under WIN32, there are a couple of performance API functions that work just 
like time() and are almost as easy to use. They are precise to the microsecond, 
but may not be functional on 486 or older machines.
- Under Linux/BSD, I don't know.

I can't understand some earlier posts on this list, including one from RC, 
about profile_time() not being available in Win32. I'm using my own start() and 
stop() precision stopwatch to profile_time() a possible next submission for 
the contest. I didn't reverse engineer anything for that purpose (it would 
take too much trouble), and I am not using modified source of any kind.

CChris


2. Re: Unregistered profile_time() (was: Re: copying global stuff to one file

Christian Cuvier wrote:
> 
> >> > i want to release a cool library i made which profiles code. here's an
> >> > example output:
> >> 
> >> Okay, are you planning to do this using the profile_time function
> >> within registered Euphoria?
> > 
> > No.
> > I have already written this profile library. It simply measures how much
> > time was spent in a block of code and how many times it was run. Of course,
> > I have to put two procedures around the block of code I want to profile,
> > which is a little more work than using "with profile", but there are some
> > other advantages (besides, I made myself a macro in msdev which
> > automatically puts this code in, so it's not much trouble):
> > 
> 
> Does it work under Linux and al. as well?

It is generic. It uses time(). What's al.?

> - Under DOS32, you can trap int #70 and sort out the CMOS clock ticks.
> - Under WIN32, there are a couple of performance API functions that work just 
> like time() and are almost as easy to use. They are precise to the microsecond, 
> but may not be functional on 486 or older machines.
> - Under Linux/BSD, I don't know.

time() is precise enough for me.


> 
> I can't understand some earlier posts on this list, including one from RC, 
> about profile_time() not being available in Win32. I'm using my own start() and 
> stop() precision stopwatch to profile_time() a possible next submission for 
> the contest. I didn't reverse engineer anything for that purpose (it would 
> take too much trouble), and I am not using modified source of any kind.

I don't know about that. I haven't tried profile_time for a long time, because it
is not available in the unregistered version.


3. Re: Unregistered profile_time() (was: Re: copying global stuff to one file

>>Does it work under Linux and al. as well?
> 
> 
> It is generic. It uses time(). 

Ok, I had understood otherwise from your previous post. When you want to 
time() portions of code that are executed a large number of times, but are not 
contiguous, time()'s resolution starts looking quite coarse.
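
A rough way to see that coarseness is to spin until time() changes and print the size of the step. This is only a sketch: it assumes nothing beyond the built-in time() and printf(), and the variable names are illustrative.

```euphoria
-- Estimate the step size of the built-in time() on this machine.
atom t0, t1

-- first wait for a tick boundary, so the measurement starts on an edge
t0 = time()
t1 = time()
while t1 = t0 do
    t1 = time()
end while

-- then spin until the next tick and report the difference
t0 = t1
while t1 = t0 do
    t1 = time()
end while

printf(1, "time() advanced in steps of about %.6f seconds\n", t1 - t0)
```

Any run shorter than that step is recorded as either zero or one whole step, so many short, scattered runs of the same section add up poorly.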

> What's al.?
> 

al. is short for "alia". Read: Linux or other more or less Unix-like OSes I don't 
know enough about.

CChris


4. Re: Unregistered profile_time() (was: Re: copying global stuff to one file

Christian Cuvier wrote:
> 
> >>Does it work under Linux and al. as well?
> > 
> > 
> > It is generic. It uses time(). 
> 
> Ok, I had understood otherwise from your previous post. When you want to 
> time() portions of code that are executed a large number of times, but are not
> contiguous, time()'s resolution starts looking quite coarse.

Especially on newer computers. I used time() to measure performance
in CJBN server on the old 500MHz, 256MB RAM system I have, but it always times
0 on my new 2.8GHz, 2GB RAM, 64-bit laptop...
Is there a more accurate library somewhere (I only need Windows and maybe
Linux, since I rarely use DOS anymore)?

> 
> > What's al.?
> > 
> 
> al. is short for "alia". Read: Linux or other more or less Unix-like OSes I don't
> know enough about.
> 
> CChris


5. Re: Unregistered profile_time() (was: Re: copying global stuff to one file

> posted by: CoJaBo <cojabo at suscom.net>
> 
> Christian Cuvier wrote:
> 
>>>> Does it work under Linux and al. as well?
>>> 
>>> It is generic. It uses time(). 
>> 
>> Ok, I had understood otherwise from your previous post. When you want to 
>> time() portions of code that are executed a large number of times, but are
>> not contiguous, time()'s resolution starts looking quite coarse.
> 
> Especially on newer computers. I used time() to measure performance
> in CJBN server on the old 500MHz, 256MB RAM system I have, but it always times
> 0 on my new 2.8GHz, 2GB RAM, 64-bit laptop...
> Is there a more accurate library somewhere (I only need Windows and maybe
> Linux, since I rarely use DOS anymore)?
> 

Here's what I use:

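include dll.e      --open_dll(), define_c_func(), define_c_proc(), C_UINT, C_INT
include machine.e  --allocate()

constant k32=open_dll("kernel32.dll"),
qpf=define_c_func(k32,"QueryPerformanceFrequency",{C_UINT},C_INT),
qpc=define_c_proc(k32,"QueryPerformanceCounter",{C_UINT})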
--the latter is a function actually, but we don't care about the boolean
--status code it returns

if k32=-1 then --you're in trouble, or under Linux, or...
elsif qpf=-1 or qpc=-1 then --your Windows version doesn't support this
end if

constant p232=power(2,32)

--helper function to retrieve results
function int64ptr_to_atom(atom ptr)
sequence s
s=peek4u({ptr,2})  --read both dwords unsigned, so a low dword >= power(2,31) stays positive
return p232*s[2]+s[1]
end function

constant timeFactorPtr=allocate(8),
          timeRC=c_func(qpf,{timeFactorPtr})
if timeRC=0 then --your hardware doesn't support hi-res timers
end if
--at this point, the timer is functional
constant timeFactor=int64ptr_to_atom(timeFactorPtr)
--counts are given in ticks, and there are timeFactor ticks per second

--now some variables
constant maxSections=6  --whatever positive integer suits you
sequence perfptr,total,times,started
perfptr=repeat(0,maxSections)
total=perfptr  --total execution time
times=total    --number of runs
started=times  --flags
for i=1 to maxSections do perfptr[i]=allocate(16) end for
--each structure will store two pairs of integers, for a total of 4*4=16 bytes

--ok, now the two procedures that start/end a timed section

procedure start(integer section)
c_proc(qpc,{perfptr[section]})  --arguments are passed as a sequence
started[section]=1
end procedure

procedure stop(integer section)
c_proc(qpc,{perfptr[section]+8})
if not started[section] then return end if  --start time not valid
started[section]=0
times[section]+=1
total[section]+=(int64ptr_to_atom(perfptr[section]+8)
                -int64ptr_to_atom(perfptr[section]))/timeFactor
end procedure

--not sure you gain any real precision (in theory, you do) by keeping
--total[section] as a number of counts rather than an actual time.
--And you'd have to implement addition for int64s using the 31-bit Eu integer
--type, which is not the best idea I can think of.
--However, atoms don't lose arithmetic accuracy while less than power(2,53),
--and you can implement addition for int64s:

constant p229=power(2,29)  --largest Eu-integer power of 2
type int32(atom x)
return integer(remainder(x,p229))
end type

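type int64(object x)
--"and"/"or" only short-circuit inside if/elsif/while conditions,
--so the checks on x[1] and x[2] are guarded here
if integer(x) then
return 1
elsif sequence(x) and length(x)=2 and atom(x[1]) and atom(x[2]) then
return int32(x[1]) and int32(x[2])
end if
return 0
end type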

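function addandwrap(int64 x,int64 y)
--x and y are {low dword, high dword} pairs; a plain integer is taken as {x,0}
sequence hibitsx,hibitsy
if integer(x) then x={x,0} end if
if integer(y) then y={y,0} end if
hibitsx=floor(x/p229)                  --top 3 bits of each dword
hibitsy=floor(y/p229)
y=remainder(y,p229)+remainder(x,p229)  --sum of the low 29 bits of each dword
hibitsy+=hibitsx
hibitsy+=floor(y/p229)                 --carry out of the low 29 bits
y=remainder(y,p229)
y[2]+=floor(hibitsy[1]/8)              --carry out of the low dword; 8=p232/p229
hibitsy[2]+=floor(y[2]/p229)
y[2]=remainder(y[2],p229)
hibitsy[2]=remainder(hibitsy[2],8)     --there's the wrap,
hibitsy[1]=remainder(hibitsy[1],8)
return y+p229*hibitsy
end function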
--this replaces a pair of machine code instructions that give you a wrap flag 
--in CF as a bonus <sigh and shudder>

--your code there, with all the start() and stop()
--your output routines there, to retrieve and inspect the results


That's all it takes. profile() and profile_time() under Windows, both for free.

You can be quite creative, as a section may have several start points and/or 
several end points, or may start after it stops (in which case you'll miss one 
run out of a zillion iterations).

Forgot to say that, to watch a section, you must insert a start() statement 
before each starting statement, and a stop() after each ending statement. That 
means one of each kind per section most of the time, but... see previous
comment.
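
As a rough illustration of that bracketing, here is a minimal sketch. So that it runs anywhere on its own, it re-declares start() and stop() on top of the built-in time() instead of the performance counter. The section constants SETUP and WORK, the began sequence, and the test workload are made-up names for the example; the bookkeeping only mirrors the shape of the code above.

```euphoria
-- Stand-in for the start()/stop() pair, using time() as the clock.
constant maxSections = 2
constant SETUP = 1, WORK = 2      -- hypothetical section numbers

sequence began, total, times, started
began   = repeat(0, maxSections)  -- time at which each section last started
total   = began                   -- accumulated seconds per section
times   = began                   -- number of completed runs per section
started = began                   -- flags: 1 while a section is being timed

procedure start(integer section)
    began[section] = time()
    started[section] = 1
end procedure

procedure stop(integer section)
    atom now
    now = time()
    if not started[section] then return end if  -- no valid start time
    started[section] = 0
    times[section] += 1
    total[section] += now - began[section]
end procedure

sequence data
atom acc

-- a section that runs once
start(SETUP)
data = repeat(0, 1000)
for i = 1 to length(data) do
    data[i] = i
end for
stop(SETUP)

-- a section that runs many times
acc = 0
for pass = 1 to 50 do
    start(WORK)
    for i = 1 to length(data) do
        acc += data[i] * data[i]
    end for
    stop(WORK)
end for

for s = 1 to maxSections do
    printf(1, "section %d: %d runs, %.6f s total\n", {s, times[s], total[s]})
end for
```

With time() as the clock, the totals for sections this short will often read zero, which is the coarseness the performance counter is meant to avoid.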

Enjoy!

CChris


6. Re: Unregistered profile_time() (was: Re: copying global stuff to one file

Christian Cuvier wrote:

> Here's what I use:

It looks like time() is precise to only two decimal places.

so for example:
time():		0.560000
your win32 api:	0.5550946964

is this the only difference?

Your code looks rather complex. Isn't it possible to just make one wrapper
function, timew32(), which would work exactly like time() but with more
precision?
What does addandwrap() do?


7. Re: Unregistered profile_time() (was: Re: copying global stuff to one file

I uploaded my profile lib. 
I copied almost all the stuff from my other include files into TSProfile.e. It was
500 lines, a lot.

http://www10.brinkster.com/tskoda/euphoria.asp#TSProfile


8. Re: Unregistered profile_time() (was: Re: copying global stuff to one file

Tone Škoda wrote:
> 
> Christian Cuvier wrote:
> 
[snip]
> 
> It looks like time() is precise to only two decimal places.
> 
> so for example:
> time():		0.560000
> your win32 api:	0.5550946964
> 
> is this the only difference?
> 

I have no idea. Rob, or anyone with access to the source code, 
could answer that. 

> Your code looks rather complex. Isn't it possible to just make one wrapper
> function, timew32(), which would work exactly like time() but with more
> precision?

It is possible: 

include dll.e      --open_dll(), define_c_func(), define_c_proc(), C_UINT, C_INT
include machine.e  --allocate()

constant hirestime=allocate(8),
         k32=open_dll("kernel32.dll"),
         qpf=define_c_func(k32,"QueryPerformanceFrequency",{C_UINT},C_INT),
         qpc=define_c_proc(k32,"QueryPerformanceCounter",{C_UINT})
constant p232=power(2,32)

--helper function to retrieve results
function int64ptr_to_atom(atom ptr)
sequence s
s=peek4u({ptr,2})  --read both dwords unsigned
return p232*s[2]+s[1]
end function
 
constant timeFactorPtr=allocate(8),
         timeRC=c_func(qpf,{timeFactorPtr}),
         timeFactor=int64ptr_to_atom(timeFactorPtr)

function timew32()
c_proc(qpc,{hirestime})  --store the current tick count at hirestime
return int64ptr_to_atom(hirestime)/timeFactor
end function


but I need the more sophisticated function for the sort of timing I'm 
performing, so I provided it.

> what does addandwrap() do?
> 

You may not have read the comments; please try again. addandwrap() is 
useful only if you're keen on keeping time as a long long integer rather 
than a decimal number of seconds. And it is far less elegant in Eu than in 
C (yes, this is rare enough to be mentioned), as I noted in these comments.
By the way, addandwrap() adds two large integers and tries to keep the result 
as an integer as much as it can.
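
For comparison, here is a minimal sketch of the same idea with each 64-bit value held as a {low, high} pair of unsigned 32-bit halves stored in atoms, which stay exact far beyond power(2,32). The name add64 and the pair layout are assumptions made for this example; it is not the addandwrap() posted earlier.

```euphoria
constant P32 = power(2, 32)

-- x and y are {low, high} pairs of unsigned 32-bit values held in atoms
function add64(sequence x, sequence y)
    atom lo, hi
    lo = x[1] + y[1]
    hi = x[2] + y[2]
    if lo >= P32 then   -- carry out of the low half
        lo -= P32
        hi += 1
    end if
    if hi >= P32 then   -- wrap around at power(2,64)
        hi -= P32
    end if
    return {lo, hi}
end function

? add64({4294967295, 0}, {1, 0})   -- prints {0,1}: the carry moved into the high half
? add64({0, 4294967295}, {0, 1})   -- prints {0,0}: the sum wrapped
```

Because the halves are atoms rather than Eu integers, no splitting into 29-bit pieces is needed; the price is that the values are no longer kept in the integer type.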

Regards
CChris
