Re: Sequence size limits

LarryMiller said...

The requirement that memory be contiguous is not likely to be a problem on a 64 bit OS. It is important to understand that this is virtual address space, not RAM. Applications access only their own virtual address space and have no knowledge of what physical memory addresses they are accessing. RAM can be (and often is) heavily fragmented with no adverse effects. 64 bit Windows currently provides an 8 TB private virtual address space to each process that is completely independent of RAM size. This may be increased in future OS versions. Finding a 2 TB contiguous block should not be a problem.

That's true. On the practical side, though, you'll still need enough RAM + swap to accommodate that. Also, with Euphoria's reference counting and copy-on-write, you may need room for more than one copy at any given point.
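To illustrate that second point, a minimal sketch (the size here is arbitrary) of how one extra reference becomes one extra full copy the moment either sequence is modified:

sequence big = repeat( 0, 500000000 )  -- one large allocation
sequence b = big   -- cheap: only the reference count goes up
b[1] = 7           -- copy-on-write kicks in: a second full copy now exists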

cp said...

I'm accessing a specialized (not relational) 64-bit database. The database has the "potential" for trillions of data values. However, I don't know which values are non-zero until I retrieve them, and I'd prefer not to check them one by one during retrieval, since retrieval via its API would be slow that way. Instead, I retrieve a large array (not trillions, of course) of values in one shot and then filter out the zero values using Euphoria.

This sounds like you're using some NoSQL database. Do you really need to keep all of the values? Some sort of sparse data structure seems like a better fit. Can you stream the results, filtering and storing only what you want as they arrive? It's hard to say more without knowing more about the data itself.
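For example, a minimal streaming sketch, assuming a hypothetical fetch_chunk( start, count ) wrapper around the database API. Only the non-zero values are kept, in a map keyed by their original index:

include std/map.e

map sparse = map:new()

procedure load_range( atom first, atom last, integer chunk_size )
    for base = first to last by chunk_size do
        -- fetch_chunk() is a stand-in for the DB's bulk retrieval call
        sequence vals = fetch_chunk( base, chunk_size )
        for i = 1 to length( vals ) do
            if vals[i] != 0 then
                map:put( sparse, base + i - 1, vals[i] )
            end if
        end for
    end for
end procedure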

A simple approach, of course, is to "page out" the data into multiple sequences. You could wrap the access in functions so that your code uses a "simple" index while the wrapper does the conversion automatically. Something like:

public function get_element( integer ix )
    -- NB: We're 64-bit, so integers are really BIG
    -- Sequences are 1-based, so split ix-1 into a 0-based page/offset
    -- pair, then shift both parts back up to 1-based subscripts
    integer page = floor( (ix - 1) / 0x1_00000000 ) + 1
    integer index = and_bits( ix - 1, 0xffffffff ) + 1
    return data[page][index]
end function
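
For completeness, a hypothetical setter along the same lines (the backing store data and the grow-on-demand padding are assumptions, and in a real file the declaration of data would come before get_element):

sequence data = { {} }  -- start with one empty page

public procedure put_element( integer ix, object val )
    integer page = floor( (ix - 1) / 0x1_00000000 ) + 1
    integer index = and_bits( ix - 1, 0xffffffff ) + 1
    while length( data ) < page do
        data = append( data, {} )  -- add empty pages as needed
    end while
    if length( data[page] ) < index then
        -- pad the page with zeros up to the requested slot
        data[page] &= repeat( 0, index - length( data[page] ) )
    end if
    data[page][index] = val
end procedure

With both wrappers in place, the rest of the program can treat the paged storage as one flat, very large sequence.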

A useful question at this point: what do users of this DB do when they work with it from other languages?

cp said...

Yes, 2 TB is well beyond what I would retrieve in a single shot. However, there is an outside chance that I would need to get more than 4.2 billion values, with many of them being zeros, hence the possible need to "index" a sequence beyond 4 billion. I'm not saying I absolutely need it; I'm just curious about the possibility at this point. I think Matt has answered that it is possible, but I'd need a darn good reason for doing it, likely a special build so as to not impact the standard 64-bit build, and I'd have to provide the dev team with a substantial gift!

I'm not entirely certain how much work would be involved, but probably not too much. Obviously, you'd need to change the s1 structure, and probably at least some of the allocation routines. There may also be some places that still use "int" to manipulate sequence lengths. In my original 64-bit experiment last summer, I stored the length as a 64-bit integer. To save a bit of space, and for the reasons mentioned earlier in this thread, I decided that was overkill.

Matt
