1. pbr

Here is the basic case for pbr (pass by reference):
sequence taba
procedure set50a()
    taba[50]+=1
end procedure

function set50b(sequence tabb)
    tabb[50]+=1
    return tabb
end function
atom t
t=time()
taba=repeat(0,5000)
for i=1 to 10000 do
    set50a()
end for
t=time()-t
printf(1,"static var method(a):%3.2f seconds\n",t)
t=time()
taba=repeat(0,5000)
for i=1 to 10000 do
    taba=set50b(taba)
end for
t=time()-t
printf(1,"dynamic var method(b):%3.2f seconds\n",t)

The first routine, set50a, has a critical limitation: it always acts on taba.
The second method, set50b, is passed a parameter, returns a result, and that
is stored on return, every iteration, so even naively we'd think this extra
flexibility has some penalty, maybe 4 or 5 times slower. But it is thousands
of times slower, because the different values of taba and tabb must co-exist
at the same moment in time, meaning every iteration has to also make a 5000
element copy, and the cost of that copy grows in step with the table size.

Fundamentally, the two routines are doing the same job, only the second is
much more flexible because it can be applied to any table. But you'd need 
some form of pbr to get performances on the same planet.

True, you may never /really/ need pbr, but you can end up with some very 
messy code, typically littered with subscripts, to avoid it.


Matt:
My thoughts on pbr, which you may not like, went as follows:

First and foremost I'm quite rabidly anti-pbr on procedure calls. I want to
see something being assigned to on the LHS of an = (or dot) sign, thank you.
I can (happily) live with notions such as customer.phone=1234 is just shorthand
for customer=CustType:phone(customer,1234), say, and that a "method" in a class
is just a function without an explicit "return this" statement, and that
cust.addCredit(100) is really the same as cust=addCredit(cust,100). Anyway, I
digress (as usual).

What you really want to do in this, or syntactic sugar variants of it:
t=f(t)

is temporarily make t unassigned, so that the value genuinely has a reference count
of 1 over the call (and equally if t is 2nd, 3rd, etc param, but only for 
the first[or last] "copy"). However, if t is defined before f, then it would 
break existing code if f() accessed the file-level variable in any way. This
extends to all static variables via (forward) routine_id calls etc.

Now, if you restrict pbr to local vars (or parameters), then f() /cannot/
possibly get to see (the same instance of) t before it returns, not even
if it performs mutual recursion with the caller.

It is actually (imo) not unreasonable to suggest:
procedure pbrify()
sequence table2
    table2=table        -- data now has a ref count of 2
    table=0 -- allow pbr (ref count back to 1; needs table declared as object)
    table2=f(table2)
    table=table2
end procedure

Of course that is a lot more (inexpensive) code, but the number of cases where
you need to force pbr like this should be few and far between. The point is it is
now absolutely in your face why referencing "table" in f() would crash, not some
weird thing that the compiler does behind your back, which it is now/still doing
to table2, but precisely because that is local, your code cannot possibly
reference it again until the function has returned.

Legacy code will of course work unaltered, but file-level vars get no performance
gain from pbr without the mod above. (Obviously if you've just done a
"tablecopy=table" before calling pbrify(), then this will not help at all, as the
data still has a ref count of 2, reduced from 3.)
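
The three cases can be checked with a small model. Again this is only a Python sketch under stated assumptions: live_vars is a dict standing in for every variable the callee could still reach, the function names are made up for the illustration, and the local table2 is assumed to be unassigned over the call as proposed.

```python
def f(param, live_vars):
    """Model of f(): tabb[50] += 1, then return tabb.

    param is the callee's own reference. If any other live variable still
    refers to the same data, the ref count is above 1 and a copy is forced.
    Returns (result, True if a copy was made).
    """
    shared = any(v is param for v in live_vars.values())
    if shared:
        param = list(param)  # full copy before writing
    param[49] += 1           # element 50 in 1-based terms
    return param, shared


def direct_call(live_vars):
    """table=f(table) on a file-level variable: table stays assigned."""
    result, copied = f(live_vars["table"], live_vars)
    live_vars["table"] = result
    return copied


def pbrify(live_vars):
    """The manual workaround: move the data into a local first."""
    table2 = live_vars["table"]  # table2=table
    live_vars["table"] = 0       # table=0 -- allow pbr
    # Assumed proposal: local table2 is unassigned over the call, so the
    # parameter is the only reference unless something else holds the data.
    table2, copied = f(table2, live_vars)
    live_vars["table"] = table2  # table=table2
    return copied


if __name__ == "__main__":
    print(direct_call({"table": [0] * 5000}))  # True
    print(pbrify({"table": [0] * 5000}))       # False
    data = [0] * 5000
    print(pbrify({"table": data, "tablecopy": data}))  # True
```

So the plain call copies, pbrify() does not, and pbrify() with a tablecopy still hanging around copies again, which is the ref count of 2 mentioned above.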

Of course it is a bit of a kicker that you cannot do proper pbr automagically
on file-level variables, but at least it is possible manually, and of course
you can easily pbr any table element in a very similar manner to the above.
I really think it is the right way to go now.


It may also be possible to use a special value for table2 over the call, I'm
thinking #40000001 and showing it in ex.err as <pbr-optimised>, because for 
the same reason as above the back-end would never have to deal with it, only 
the ex.err write and possibly the trace() window.

If a user sees <pbr-optimised> in an ex.err file, then they should understand
that the data value logically associated with this variable, which may have 
been modified since it was passed, will have been shown somewhere earlier in
the ex.err file.

I must admit I am not entirely sure about t[i]=f(t[i]), and far far less so
about t[i][j]=f(t[i][j]). Maybe as above you just have to do it manually, 
though probably fine to automate that (assuming t is local, that is), and
no doubt much easier when syntactic sugar lets the coder type t[i] only once!


Of course if you want multiple pbr parameters, under this scheme you also need
to support multiple return values. Maybe
function[3] f()
end function

to indicate that f() returns up to 3 results. Parsing return statements can
check if the next character is ',' and this number is not exceeded.
For consistency, I'm going to put the first in symtab[F-1], where F is the symtab
entry for the routine code and is where I normally put a function return, the
next in [F-2], etc. I think eu.ex puts the function result in symtab[F][1],
but you'd be wise to create extra symtab slots for multiple returns.

I think I was going to go with:
{a,b,c}=f() -- multiple returns only, matches return 1,2,3 statement
{,,a}=f()   --    "        "    only want 3rd value
{}=f()      -- ignore function result(s)
{a,b,c}@=0  -- as a=0, b=0, c=0
{a,b,c}#={1,2,3} -- as a=1 b=2 c=3 (may have seq. creation overhead)
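
For comparison only, the nearest existing equivalents I can think of are Python's tuple unpacking forms. This is an analogy, not the proposed syntax, and f here is just a stand-in for a function[3] f() ending in return 1,2,3.

```python
def f():
    """Stand-in for a function[3] f() whose body ends with return 1,2,3."""
    return 1, 2, 3


a, b, c = f()        # {a,b,c}=f()       one target per returned value
_, _, third = f()    # {,,a}=f()         only the 3rd value is wanted
f()                  # {}=f()            result(s) ignored
x = y = z = 0        # {a,b,c}@=0        same value into each target
p, q, r = [1, 2, 3]  # {a,b,c}#={1,2,3}  builds the list first, then unpacks
```

The last line shows the sequence creation overhead noted above: the right-hand side exists as a whole object before the values are spread out.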


Multiple returns aren't high on my to-do list, though.

Regards, and erm, phew!
Pete
