1. Sequence size limits

Are there any internal limits to the size of a given sequence beyond machine ram limits?


2. Re: Sequence size limits

cp said...

Are there any internal limits to the size of a given sequence beyond machine ram limits?

The maximum number of elements is (2^N)-1, where N is the number of bits that a CPU register can hold. So in 32-bit systems this works out as 4,294,967,295 elements, and thus it would be more common to run out of memory before you reach that maximum.


3. Re: Sequence size limits

DerekParnell said...
cp said...

Are there any internal limits to the size of a given sequence beyond machine ram limits?

The maximum number of elements is (2^N)-1, where N is the number of bits that a CPU register can hold. So in 32-bit systems this works out as 4,294,967,295 elements, and thus it would be more common to run out of memory before you reach that maximum.

In fact, the maximum size of a sequence will be the same when using 64-bit Euphoria. Even there, you are more likely to run out of memory than to hit the limit. A sequence requires a few bytes of overhead, plus 4 bytes (8 on 64-bit) for each element.

That means that with 32-bit Euphoria, a sequence of maximal size would require something like an 8GB contiguous chunk of memory. For 64-bit Euphoria (coming in 4.1), it would be 16GB.

Matt
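The figures in this post can be checked with a few lines of arithmetic. This is only a sketch: OVERHEAD and MAX_LEN are assumed values, not constants taken from the Euphoria source. The post says only "a few bytes" of overhead, and 8GB at 4 bytes per element is what a limit of about 2^31 elements would give.

```euphoria
-- Rough memory needed by a sequence of maximal length.
-- OVERHEAD and MAX_LEN are assumptions for illustration only.
constant OVERHEAD = 32                  -- header size in bytes (assumed)
constant MAX_LEN  = power(2, 31) - 1    -- assumed limit on element count
constant GB       = power(2, 30)

function seq_bytes(atom len, atom bytes_per_element)
    return OVERHEAD + len * bytes_per_element
end function

printf(1, "32-bit: about %.1f GB\n", seq_bytes(MAX_LEN, 4) / GB)
printf(1, "64-bit: about %.1f GB\n", seq_bytes(MAX_LEN, 8) / GB)
```

Under those assumptions the two results come out at about 8 GB and 16 GB, which matches the numbers quoted.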


4. Re: Sequence size limits

The practical limit would be imposed by the largest block of virtual address space that can be allocated. With a 32-bit OS the address space available to a process is fixed at 2 GB, independent of RAM size, but the largest block you could allocate might be half that. If RAM is short, performance would suffer.
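To put a number on that practical limit, the sketch below works out how many elements fit in one contiguous block. OVERHEAD is an assumed header size and the block sizes are examples only; neither comes from the Euphoria source.

```euphoria
-- Longest sequence that fits in one contiguous block of address space.
-- OVERHEAD is an assumed header size; the block sizes are examples.
constant OVERHEAD = 32
constant GB = power(2, 30)

function max_elements(atom block_bytes, atom bytes_per_element)
    return floor((block_bytes - OVERHEAD) / bytes_per_element)
end function

printf(1, "1 GB block, 4-byte elements: %.0f elements\n", max_elements(1 * GB, 4))
printf(1, "2 GB block, 4-byte elements: %.0f elements\n", max_elements(2 * GB, 4))
```

With 4-byte elements, a 1 GB block holds roughly 268 million elements, far fewer than the length field allows.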


5. Re: Sequence size limits

Matt, on 64-bit Euphoria, could the maximum number of sequence elements be increased to a 64-bit integer?


6. Re: Sequence size limits

cp said...
mattlewis said...

In fact, the maximum size of a sequence will be the same when using 64-bit Euphoria. Even there, you are more likely to run out of memory than to hit the limit. A sequence requires a few bytes of overhead, plus 4 bytes (8 on 64-bit) for each element.

That means that with 32-bit Euphoria, a sequence of maximal size would require something like an 8GB contiguous chunk of memory. For 64-bit Euphoria (coming in 4.1), it would be 16GB.

Matt, on 64-bit Euphoria, could the maximum number of sequence elements be increased to a 64-bit integer?

We could change the length parameter to be a 64-bit integer, rather than a 32-bit integer. But I don't think that would accomplish anything other than increase the overhead for sequences.

Do you really need a sequence that large? I suspect that whatever you're planning to do with such a beast, there's a better way.

Matt


7. Re: Sequence size limits

Matt said earlier: "That means that with 32-bits euphoria, a sequence of maximal size would require something like an 8GB CONTIGUOUS chunk of memory. "

CP: That answer should be enough for anybody to realize that even 4 billion elements (as represented by 32 bits) are unrealistic, so what is the point in even considering 4 billion multiplied by 4 billion elements? Utilizing the full 32 bits, you get a maximum of 4 billion elements needing, as Matt explained, an 8GB CONTIGUOUS chunk of memory. If you utilized just the next 8 bits of the 64, i.e. a total of only 40 bits of your proposed 64, you would need about 2000GB, i.e. 2TB, as a CONTIGUOUS chunk of memory. Which computer is going to give you that memory?


8. Re: Sequence size limits

The requirement that memory be contiguous is not likely to be a problem on a 64 bit OS. It is important to understand that this is virtual address space, not RAM. Applications access only their own virtual address space and have no knowledge of what physical memory addresses they are accessing. RAM can be (and often is) heavily fragmented with no adverse effects. 64 bit Windows currently provides an 8 TB private virtual address space to each process that is completely independent of RAM size. This may be increased in future OS versions. Finding a 2 TB contiguous block should not be a problem.

In 32 bit Windows the default private virtual address space is 2 GB but can be increased to 3 GB with a change in boot configuration. Only compatible applications will see the change. I don't believe that Euphoria is. This too is independent of RAM size.

The above is only meant to provide a better understanding of the situation. I am not advocating a change in Euphoria implementation.


9. Re: Sequence size limits

Larry: thanks for the useful info on memory.

Vinoba, Matt:
I'm accessing a specialized (not relational) 64-bit database. The database has the "potential" for trillions of data values. However, I don't know which values are non-zero until I retrieve them, and I'd prefer not to check them one by one during retrieval, since retrieval via its API would be slow using that method. Instead I retrieve a large array (not trillions, of course) of values in one shot and then filter out the zero values using Euphoria. Yes, 2TB is well beyond what I would retrieve in a single shot. However, there is an outside chance that I would need to get more than 4.2 billion, with many values being zeros, hence the possible need to "index" a sequence beyond 4 billion. I'm not saying I absolutely need it, just curious about the possibility at this point. I think Matt has answered that it is possible, but I'd need a darn good reason for doing it, likely a special build so as not to impact the standard 64-bit build, and I'd need to provide the dev team with a substantial gift! Thank you for the feedback.
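A minimal sketch of the "retrieve in one shot, then filter" step described above. The sample block is made-up data standing in for a database result, and nonzero_pairs is a hypothetical helper name; the real API is not shown in the thread.

```euphoria
-- Keep only the non-zero values from a retrieved block, with their positions.
-- The input here is made-up sample data standing in for a database result.
function nonzero_pairs(sequence vals)
    sequence result = {}
    for i = 1 to length(vals) do
        if vals[i] != 0 then
            result = append(result, {i, vals[i]})
        end if
    end for
    return result
end function

sequence block = {0, 7, 0, 0, 3.5, 0}
? nonzero_pairs(block)    -- each entry is {position, value}
```

Keeping the position next to each value means the zeros can be dropped without losing track of where the remaining values came from.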


10. Re: Sequence size limits

LarryMiller said...

The requirement that memory be contiguous is not likely to be a problem on a 64 bit OS. It is important to understand that this is virtual address space, not RAM. Applications access only their own virtual address space and have no knowledge of what physical memory addresses they are accessing. RAM can be (and often is) heavily fragmented with no adverse effects. 64 bit Windows currently provides an 8 TB private virtual address space to each process that is completely independent of RAM size. This may be increased in future OS versions. Finding a 2 TB contiguous block should not be a problem.

That's true. On the practical side, however, you'll still need enough RAM + swap to accommodate that. Also, with Euphoria's method of reference counting and copy on write, you may need more than one copy at any given point.
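The copy-on-write point can be illustrated with a small example. It is a sketch of the behaviour as described in this post, not a measurement of the interpreter, and N is kept small so it runs anywhere.

```euphoria
-- Assignment shares the data; the first write to one name forces a full copy.
-- N is small here; the point is that at a huge N both copies must fit at once.
constant N = 1000000

sequence a = repeat(0, N)   -- one block of N elements
sequence b = a              -- no copy yet: both names refer to the same data
b[1] = 1                    -- copy on write: a second N-element block now exists

printf(1, "a[1] = %d, b[1] = %d\n", {a[1], b[1]})
```

For a sequence near the maximum size, that second block is why twice the memory may be needed for a moment.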

cp said...

I'm accessing a specialized (not relational) 64-bit database. The database has the "potential" for trillions of data values. However, I don't know which values are non-zero until I retrieve them, and I'd prefer not to check them one by one during retrieval, since retrieval via its API would be slow using that method. Instead I retrieve a large array (not trillions, of course) of values in one shot and then filter out the zero values using Euphoria.

This sounds like you're using some NoSQL database. Do you really need to save all of the values? Seems like some sort of sparse data structure would be better. Can you effectively stream the results, filter and store only what you want? It's hard to say more without knowing more about the data itself.
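One way the "stream, filter and store only what you want" idea might look is sketched here. fetch_chunk() is a hypothetical stand-in for the real database call, and SAMPLE and CHUNK are made-up values for the demo; none of them comes from the thread.

```euphoria
-- Stream the data in chunks, keeping only {global_index, value} pairs.
-- fetch_chunk() is a hypothetical stand-in for the real database call.
constant CHUNK  = 4    -- tiny for the demo; a real chunk would be much larger
constant SAMPLE = {0, 0, 5, 0, 0, 0, 9, 0, 1, 0}   -- made-up data

function fetch_chunk(integer first, integer count)
    integer last = first + count - 1
    if last > length(SAMPLE) then
        last = length(SAMPLE)
    end if
    return SAMPLE[first..last]
end function

sequence sparse = {}
sequence chunk
integer pos = 1
while pos <= length(SAMPLE) do
    chunk = fetch_chunk(pos, CHUNK)
    for i = 1 to length(chunk) do
        if chunk[i] != 0 then
            -- pos + i - 1 converts the chunk-local index to a global one
            sparse = append(sparse, {pos + i - 1, chunk[i]})
        end if
    end for
    pos += length(chunk)
end while

? sparse
```

Only one chunk plus the sparse result is held in memory at a time, so no single sequence has to approach the length limit.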

A simple approach, of course, is to "page out" the data into multiple sequences. You could wrap the access into functions so that your code could use a "simple" index, while the wrapper would do the conversions automatically. So something like:

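constant PAGE_SIZE = 0x4000_0000 -- elements per page (hypothetical; must fit in one sequence)

public function get_element( integer ix )
    -- NB: We're 64-bit, so integers are really BIG
    -- ix is 1-based, like any Euphoria index, so shift to 0-based for the math
    integer page  = floor( (ix - 1) / PAGE_SIZE ) + 1
    integer index = remainder( ix - 1, PAGE_SIZE ) + 1
    return data[page][index] -- data: a sequence of pages, declared elsewhere
end function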
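A fuller, self-contained sketch of the paging idea follows. PAGE_SIZE is deliberately tiny so the index arithmetic is easy to follow. page_of, slot_of and set_element are illustrative names that are not part of the original snippet.

```euphoria
-- Paged storage: one logical 1-based index spread over several sequences.
-- PAGE_SIZE is tiny here so the demo runs anywhere; all names are illustrative.
constant PAGE_SIZE = 4
constant TOTAL     = 10

-- Build enough pages to hold TOTAL elements, all initialised to 0.
sequence data = repeat(repeat(0, PAGE_SIZE), floor((TOTAL + PAGE_SIZE - 1) / PAGE_SIZE))

function page_of(integer ix)
    return floor((ix - 1) / PAGE_SIZE) + 1
end function

function slot_of(integer ix)
    return remainder(ix - 1, PAGE_SIZE) + 1
end function

procedure set_element(integer ix, object val)
    data[page_of(ix)][slot_of(ix)] = val
end procedure

function get_element(integer ix)
    return data[page_of(ix)][slot_of(ix)]
end function

set_element(1, "first")
set_element(5, "fifth")     -- lands in page 2, slot 1
set_element(10, "tenth")    -- lands in page 3, slot 2
? {page_of(5), slot_of(5)}
puts(1, get_element(10) & "\n")
```

Calling code only ever sees the simple index; the page and slot conversion stays inside the wrapper, so the page size can be changed later without touching callers.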

A useful question at this point seems to me to be: What do users of this DB do when they use the DB in other languages?

cp said...

Yes, 2TB is well beyond what I would retrieve in a single shot. However, there is an outside chance that I would need to get more than 4.2 billion, with many values being zeros, hence the possible need to "index" a sequence beyond 4 billion. I'm not saying I absolutely need it, just curious about the possibility at this point. I think Matt has answered that it is possible, but I'd need a darn good reason for doing it, likely a special build so as not to impact the standard 64-bit build, and I'd need to provide the dev team with a substantial gift!

I'm not entirely certain how much work would be involved, but probably not too much. Obviously, you'd need to change the s1 structure, and probably at least some of the allocation routines. There may be some places still using "int" that manipulate sequence lengths. In my original 64-bit experiment last summer, I had upgraded the length to be stored as a 64-bit integer. To save a bit of space, and due to the reasons mentioned previously in this thread, I decided that was overkill.

Matt


11. Re: Sequence size limits

cp: Please consider this.

Euphoria currently uses a 32-bit value to store the number of elements, i.e. up to 2 raised to 32 elements. If each element were one byte long, that would be 4 billion bytes.

However, each element would likely be 4 bytes, i.e. 2 raised to 34 bytes, or 16 GB. If you have a 64-bit number storing the total number of elements, it could be up to 2 raised to 64 elements. With each element requiring 8 bytes, that would be 2 raised to 67 bytes, or 128 exbibytes. The maximum contiguous block on a 64-bit machine has been stated above to be 8 terabytes, i.e. 2 raised to 43 bytes.

Now if you consider Matt's statement that Euphoria may need to keep a temporary copy, the bytes available to you would drop to 4 terabytes. If you are doing any kind of operation on it, a result is likely to be stored as well, reducing what is available to you to 2 terabytes, i.e. 2 raised to 41 bytes. At 8 bytes per element that is 2 raised to 38 elements, roughly 275 billion. So effectively, by going the 64-bit route, only six more bits would be used if a new system with 64 bits representing the size of a sequence were implemented. Hardly worth the effort.

I will not even consider the situation where strings are stored in a sequence, a string such as "32 Carablanca Street", with each element requiring not 64 bits (8 bytes) but 20 or 30 bytes.
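The sizes in this post are easy to get wrong by a few powers of two, so a quick check helps. The sketch assumes 8-byte elements and the 8 TB address space figure quoted earlier in the thread. It also assumes, following the reasoning in this post, that only a quarter of that space is usable for the sequence itself.

```euphoria
-- Powers-of-two check for the sizes discussed; 8 bytes per element assumed.
constant TB = power(2, 40)

function log2(atom x)
    return log(x) / log(2)
end function

atom space     = 8 * TB          -- address space figure quoted in the thread
atom available = space / 4       -- after allowing for a copy and a result
atom elements  = available / 8   -- 8-byte elements

printf(1, "address space: 2^%.0f bytes\n", log2(space))
printf(1, "available:     2^%.0f bytes\n", log2(available))
printf(1, "elements:      2^%.0f (%.0f)\n", {log2(elements), elements})
```

The last line shows how many bits of length such a sequence would actually use under these assumptions.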

