1. Adding/Removing bytes from the beginning of a file

How to add/remove bytes from the beginning of a (binary) file?

Thanks

Green Euphorian

2. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

How to add/remove bytes from the beginning of a (binary) file?

Thanks

Green Euphorian

How about reading the whole file into a sequence?

Chop the front off the sequence and write the sequence back into the file.

3. Re: Adding/Removing bytes from the beginning of a file

BRyan said...
GreenEuphorian said...

How to add/remove bytes from the beginning of a (binary) file?

Thanks

Green Euphorian

How about reading the whole file into a sequence?

Chop the front off the sequence and write the sequence back into the file.

If there is another way, I would prefer it, because the file might be huge (hundreds of MBs, or even GBs), so loading the whole thing into memory would not be a good idea. Any other suggestions, please?

4. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...
BRyan said...
GreenEuphorian said...

How to add/remove bytes from the beginning of a (binary) file?

Thanks

Green Euphorian

How about reading the whole file into a sequence?

Chop the front off the sequence and write the sequence back into the file.

If there is another way, I would prefer it, because the file might be huge (hundreds of MBs, or even GBs), so loading the whole thing into memory would not be a good idea. Any other suggestions, please?

I haven't tried this, but you could probably use file_num = open("my_file", "ub") to open the file for update (reading and writing) in binary mode ("u" on its own opens it in text mode), then copy data within the file a section at a time to limit memory usage, using seek, getc, puts, or other functions in http://openeuphoria.org/docs/std_io.html. I'm not sure how to reduce the size of the file, but I believe it grows if you write past the end of the file.

5. Re: Adding/Removing bytes from the beginning of a file

It might be better to seek to the desired starting position, then copy the rest of the file (a character or section at a time) to a new file, then delete the old file and rename the new one to the old name.
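
For what it's worth, the copy-and-rename approach might look something like the sketch below. It assumes OpenEuphoria 4 (std/io.e for seek() and get_bytes(), std/filesys.e for delete_file() and rename_file()). The helper name strip_leading_bytes, the ".tmp" suffix and the 64K chunk size are made up for illustration, so treat it as a starting point rather than a finished routine.

```euphoria
include std/io.e       -- seek(), get_bytes()
include std/filesys.e  -- delete_file(), rename_file()

constant CHUNK = 65536  -- bytes copied per iteration (arbitrary choice)

-- Hypothetical helper: drop the first `skip` bytes of `path` by copying
-- the remainder to a temporary file and renaming it over the original.
-- Returns 1 on success, 0 on failure.
function strip_leading_bytes(sequence path, atom skip)
    integer fin, fout
    sequence tmp, chunk

    tmp = path & ".tmp"            -- assumed not to exist already
    fin = open(path, "rb")
    if fin = -1 then
        return 0
    end if
    fout = open(tmp, "wb")
    if fout = -1 then
        close(fin)
        return 0
    end if
    if seek(fin, skip) != 0 then   -- seek() returns 0 on success
        close(fin)
        close(fout)
        return 0
    end if
    while 1 do
        chunk = get_bytes(fin, CHUNK)
        if length(chunk) = 0 then  -- end of file reached
            exit
        end if
        puts(fout, chunk)
    end while
    close(fin)
    close(fout)
    if not delete_file(path) then
        return 0
    end if
    return rename_file(tmp, path)
end function
```

Only one chunk is held in memory at a time, but the whole remainder of the file is still written to disk once, so this does not avoid the I/O cost for very large files.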

6. Re: Adding/Removing bytes from the beginning of a file

I am venturing here something that might be impractical or even impossible: what about gaining low-level access to the file allocation table and modifying it in such a way that the beginning point of a file is moved a few bytes after its current position? This way, no re-writing of the file would be needed. Does this make sense?

7. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

I am venturing here something that might be impractical or even impossible: what about gaining low-level access to the file allocation table and modifying it in such a way that the beginning point of a file is moved a few bytes after its current position? This way, no re-writing of the file would be needed. Does this make sense?

I see a SetEndOfFile function on MSDN, but I can't find anything for setting the beginning of a file.

8. Re: Adding/Removing bytes from the beginning of a file

I think large binary files are typically managed by creating a sort of file system inside the file. Part of the file has an allocation table or bitmap that maps out the data inside the file. Parts of the file would contain valid data and parts would be "empty". To delete data from anywhere in the file, mark that area as "empty". A cleanup or compact function would be called periodically to rebuild the file by defragmenting valid areas and removing excess empty areas, probably by copying all the data to a fresh new file and building a new allocation table.

9. Re: Adding/Removing bytes from the beginning of a file

ryanj said...

I think large binary files are typically managed by creating a sort of file system inside the file. Part of the file has an allocation table or bitmap that maps out the data inside the file. Parts of the file would contain valid data and parts would be "empty". To delete data from anywhere in the file, mark that area as "empty". A cleanup or compact function would be called periodically to rebuild the file by defragmenting valid areas and removing excess empty areas, probably by copying all the data to a fresh new file and building a new allocation table.

This is way too complicated and useless for what I need. In fact, I simply need to remove the "magic number" (file signature bytes) from certain files.

10. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...
BRyan said...

How about reading the whole file into a sequence? Chop the front off the sequence and write the sequence back into the file.

If there is another way, I would prefer it, because the file might be huge (hundreds of MBs, or even GBs), so loading the whole thing into memory would not be a good idea. Any other suggestions, please?

I would have said much the same, but in blocks of anywhere between 8K and 256K. Actually, Eu hides much of this from you, so getc/puts is not as bad as it could be, and in fact performs quite reasonably.

ryanj said...
GreenEuphorian said...

I am venturing here something that might be impractical or even impossible: what about gaining low-level access to the file allocation table and modifying it in such a way that the beginning point of a file is moved a few bytes after its current position? This way, no re-writing of the file would be needed. Does this make sense?

I see a SetEndOfFile function on MSDN, but I can't find anything for setting the beginning of a file.

An imaginary "SetStartOfFile" equivalent could only work (without moving 100s of GB) when the part you wanted to remove happened to be exactly a whole file sector, or a similar allocation unit, so basically, no.

The suggestion I would make is to rewrite the first few bytes with a special sequence meaning "ignore N bytes at start of file". Obviously that depends on who/what has to read the file and whether you can modify that reader to obey the special sequence. However, I would hazard that it is a non-starter: if it were that simple, you would just modify that same reader to skip whatever it is you want skipped. Sorry.

Pete

11. Re: Adding/Removing bytes from the beginning of a file

Thanks for the clarifications.

I'll try to follow Pete's advice about simply changing the first few bytes, without trimming them off. Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.

What is the relevant command that I would use to overwrite the first few bytes?

Thanks again

12. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

How to add/remove bytes from the beginning of a (binary) file?

You have to create a new file based on copying the original file, adding bytes or not copying bytes as required.

Sorry but there is no easy way out of this. Doing low-level filesystem manipulation is bound to be messy and dangerous; definitely not worth the effort.

If running on Windows, you could open the file in update mode, copy bytes from later locations to earlier locations (which means remembering and setting the current file position), and then, when you have finished that, call the API routine SetEndOfFile() in kernel32.dll.
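
The in-place copying step could be sketched roughly as follows, assuming OpenEuphoria 4's std/io.e. The helper name shift_down and the chunk size are invented for this example, and the final truncation is deliberately left out: after the call the file still has its old length, with stale bytes at the end, until something like SetEndOfFile() cuts it down to the returned size.

```euphoria
include std/io.e  -- seek(), get_bytes()

constant CHUNK = 65536  -- bytes moved per iteration (arbitrary choice)

-- Hypothetical helper: move everything after the first `skip` bytes
-- down to the start of the file, one chunk at a time.
-- Returns the new logical length, or -1 on failure.
-- The file is NOT truncated here; that needs a platform-specific call.
function shift_down(sequence path, atom skip)
    integer fn
    atom rd, wr
    sequence chunk

    fn = open(path, "ub")          -- update mode, binary
    if fn = -1 then
        return -1
    end if
    rd = skip                      -- where the next chunk is read from
    wr = 0                         -- where the next chunk is written to
    while 1 do
        if seek(fn, rd) != 0 then
            close(fn)
            return -1
        end if
        chunk = get_bytes(fn, CHUNK)
        if length(chunk) = 0 then  -- nothing left to move
            exit
        end if
        if seek(fn, wr) != 0 then
            close(fn)
            return -1
        end if
        puts(fn, chunk)
        rd += length(chunk)
        wr += length(chunk)
    end while
    close(fn)
    return wr
end function
```

Because the write position always trails the read position, a chunk is never overwritten before it has been read. Unlike the copy-to-a-new-file approach, an interruption halfway through leaves the file in a mixed state, so a backup is sensible.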

13. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

Thanks for the clarifications.

I'll try to follow Pete's advice about simply changing the first few bytes, without trimming them off. Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.

What is the relevant command that I would use to overwrite the first few bytes?

Thanks again

This idea will work if it's your own file layout design. If it is a standard or proprietary file type, you may run into problems doing this.

14. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

What is the relevant command that I would use to overwrite the first few bytes?

Erm, fairly straightforward I should think:

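integer fn = open(filename,"ub") -- "ub" = update (read and write), binary; returns -1 on failure
if fn=-1 then ?9/0 end if
if seek(fn,0)!=SEEK_OK then ?9/0 end if -- (not strictly necessary; and SEEK_OK is automatically defined as 0 in Phix)
puts(fn,some_bytes) -- overwrites in place, the file length does not change
close(fn)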

Obviously, caveat emptor: this is untested, and you should expect problems if you clobber more bytes than you should. As Derek said, it may make the file unusable by other apps, etc.

Pete

PS: Another (random) thought occurs to me that if you don't have enough bytes (ignore this if you do), and you really don't want to move 100s of GB, a "get first 7 bytes from file 5" type scheme might help.
PPS: It also might be sensible to blat the first few bytes, do your thing, then restore those overwritten bytes.
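
A rough sketch of that "blat, then restore" idea, assuming OpenEuphoria 4 (where seek() and get_bytes() come from std/io.e). The helper name swap_header is hypothetical: it overwrites the start of the file and hands back whatever was there before, so the same call can be used later to put the original bytes back.

```euphoria
include std/io.e  -- seek(), get_bytes()

-- Hypothetical helper: overwrite the first length(new_bytes) bytes of
-- `path` and return the bytes that were there before, or -1 on failure.
function swap_header(sequence path, sequence new_bytes)
    integer fn
    sequence old_bytes

    fn = open(path, "ub")              -- update mode, binary
    if fn = -1 then
        return -1
    end if
    old_bytes = get_bytes(fn, length(new_bytes))
    if length(old_bytes) != length(new_bytes) then
        close(fn)                      -- file is shorter than the header
        return -1
    end if
    if seek(fn, 0) != 0 then           -- reading moved the file position
        close(fn)
        return -1
    end if
    puts(fn, new_bytes)
    close(fn)
    return old_bytes
end function

-- Possible usage (file name and replacement bytes are placeholders):
--   object saved = swap_header("big.dat", {0,0,0,0})
--   ... do your thing ...
--   if sequence(saved) then
--       saved = swap_header("big.dat", saved)  -- restore the original bytes
--   end if
```

If the program dies between the two calls, the original bytes are lost, so it may be worth writing the saved bytes to a small side file first.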

15. Re: Adding/Removing bytes from the beginning of a file

Thanks a lot. Only, I don't understand at all the line containing SEEK_OK. What is that about?!? Could you please explain it?

Thanks again

16. Re: Adding/Removing bytes from the beginning of a file

Simply declare a struct called New_File consisting of 2 structures eg
File_Data and New_Data.
Then write your new data to the struct New_Data and the original file to File_Data.

Now write the structure New_File to disk and that's all.

Not that hard to come up with and really a piece of cake.

17. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

Thanks a lot. Only, I don't understand at all the line containing SEEK_OK. What is that about?!? Could you please explain it?

When you open a file (except in "a" mode) the file pointer should already be at the start of file, so you can omit the whole line if you want, and I probably should have done too. Or you might prefer

if seek(fn,0)!=0 then crash("couldn't seek to start of file!!") end if  -- (or log the error and return failure) 

seek() is a bit unusual in that it returns 0 on success. In Phix there is a predefined constant SEEK_OK with the value 0, in OpenEuphoria you can just use 0. ?9/0 is just my way of getting a crash when something goes horribly wrong.

I kind of assumed you would need to read a few bytes to see if there was something you needed to skip, after which you would need a seek.

HTH, Pete

18. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.

What about this?

19. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...
GreenEuphorian said...

Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.

What about this?

Do you only want to truncate some leading bytes in a file (& modify a few more)? It might help to know what your use case is. Anyway, for performance, I would not alter the file length at all but simply write the desired start pointer to another small file and use that to seek() whenever you need to read the large file.
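
To make the side-file idea concrete, here is one way it might be done, assuming OpenEuphoria 4 (std/io.e for seek(), std/get.e for value()). The ".start" suffix and the names save_start and open_at_start are purely illustrative conventions, not anything standard.

```euphoria
include std/io.e   -- seek()
include std/get.e  -- value(), GET_SUCCESS

-- Hypothetical convention: "<path>.start" holds the logical start
-- offset of the big file as decimal text.
procedure save_start(sequence path, atom offset)
    integer fn
    fn = open(path & ".start", "w")
    if fn != -1 then
        printf(fn, "%d", offset)
        close(fn)
    end if
end procedure

-- Open `path` for binary reading, positioned at the saved offset
-- (or at 0 when no side file exists). Returns -1 on failure.
function open_at_start(sequence path)
    integer fn, side
    atom offset
    object line
    sequence parsed

    offset = 0
    side = open(path & ".start", "r")
    if side != -1 then
        line = gets(side)
        close(side)
        if sequence(line) then
            parsed = value(line)
            if parsed[1] = GET_SUCCESS and atom(parsed[2]) then
                offset = parsed[2]
            end if
        end if
    end if
    fn = open(path, "rb")
    if fn = -1 then
        return -1
    end if
    if seek(fn, offset) != 0 then
        close(fn)
        return -1
    end if
    return fn
end function
```

The large file is never modified, so "removing" the leading bytes costs one tiny write regardless of file size. The catch is that only programs that know about the side file will skip those bytes.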

Also, modifying the other bytes is not difficult and doesn't require reading the whole file.

EDIT: However, prepending bytes could pose a performance issue. What is your use case?

Spock

20. Re: Adding/Removing bytes from the beginning of a file

Well, the initial idea when I started this thread was prepending bytes and removing bytes from a file, but from the answers I got I realised that this is not convenient at all, and may also lead to troubles.

So my current task is: overwriting the first few bytes of a file (containing the file signature, a.k.a. magic number, or simply header) and then, later on, overwriting them again, back to their original state. I already got the info about doing this, thanks to Pete.

Now, the only question unanswered is: will this simple byte-overwriting process (without any truncation or prepending) entail also a complete re-writing of the whole file on the disk when the file is updated? Or will those few bytes alone be modified on the disk (which indeed is my desired outcome)? There are significant performance issues involved, especially in the case of huge files. That's why I am asking.

Thanks again to all.

21. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...

will this simple byte-overwriting process (without any truncation or prepending) entail also a complete re-writing of the whole file on the disk when the file is updated? Or will those few bytes alone be modified on the disk (which indeed is my desired outcome)?

A combination of seek() and puts() will allow you to overwrite specific bytes without affecting the rest of the file.
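
As a small illustration of that, assuming OpenEuphoria 4 (std/io.e for seek()), a helper along these lines writes only the given bytes at the given position. The name patch_bytes is made up for the example.

```euphoria
include std/io.e       -- seek()
include std/filesys.e  -- file_length(), used in the usage note below

-- Hypothetical helper: overwrite bytes in place starting at `offset`.
-- Returns 1 on success, 0 on failure.
function patch_bytes(sequence path, atom offset, sequence bytes)
    integer fn
    fn = open(path, "ub")        -- update mode, binary: existing data is kept
    if fn = -1 then
        return 0
    end if
    if seek(fn, offset) != 0 then
        close(fn)
        return 0
    end if
    puts(fn, bytes)              -- replaces length(bytes) bytes at `offset`
    close(fn)
    return 1
end function

-- Possible check that nothing else changed (file name is a placeholder):
--   atom before = file_length("big.dat")
--   if patch_bytes("big.dat", 0, {0,0,0,0}) then
--       ? (file_length("big.dat") = before)
--   end if
```

One thing to watch: if offset plus length(bytes) runs past the end of the file, the file will grow rather than stay the same size.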

Manual on I/O

Spock

[Edit: typo]

22. Re: Adding/Removing bytes from the beginning of a file

Spock said...

A combination of seek() and putc() will allow you to overwrite specific bytes without affecting the rest of the file.

putc() ?!? I could not find it in the manual. Or do you mean puts()?

23. Re: Adding/Removing bytes from the beginning of a file

GreenEuphorian said...
Spock said...

A combination of seek() and putc() will allow you to overwrite specific bytes without affecting the rest of the file.

putc() ?!? I could not find it in the manual. Or do you mean puts()?

Yeah, puts(). There's also put_integer16() and put_integer32(), which do seem unnecessarily long...

Spock

24. Re: Adding/Removing bytes from the beginning of a file

True, they could have been called put2 and put4 respectively. It is hard to get people to change the names for things though, even before release I lobbied to name the regex functions match rather than find, but nobody would go along with that. Write your own library to wrap the function names that are too long.
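
Such a wrapper library can be tiny. A minimal sketch, assuming OpenEuphoria 4's std/io.e and using the put2/put4 names suggested above (they are not part of the standard library):

```euphoria
include std/io.e  -- put_integer16(), put_integer32()

-- Hypothetical short aliases for the standard routines.
procedure put2(integer fn, atom val)
    put_integer16(fn, val)   -- writes val as two bytes
end procedure

procedure put4(integer fn, atom val)
    put_integer32(fn, val)   -- writes val as four bytes
end procedure
```

Saved in its own include file with the routines marked public, this could be shared between programs.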

25. Re: Adding/Removing bytes from the beginning of a file

SDPringle said...

True, they could have been called put2 and put4 respectively. It is hard to get people to change the names for things though, even before release I lobbied to name the regex functions match rather than find, but nobody would go along with that. Write your own library to wrap the function names that are too long.

This is exactly what I do (rewrite functions with the correct name). Incidentally, the main function in my own RegEx lib (written from scratch but loosely based on a User Contribution) is named match instead of find. It makes sense given that we are more concerned with matching a pattern rather than finding its location.

Spock

26. Re: Adding/Removing bytes from the beginning of a file

Spock said...
SDPringle said...

True, they could have been called put2 and put4 respectively. It is hard to get people to change the names for things though, even before release I lobbied to name the regex functions match rather than find, but nobody would go along with that. Write your own library to wrap the function names that are too long.

This is exactly what I do (rewrite functions with the correct name). Incidentally, the main function in my own RegEx lib (written from scratch but loosely based on a User Contribution) is named match instead of find. It makes sense given that we are more concerned with matching a pattern rather than finding its location.

Spock

It's interesting that you bring this up because I only recently discovered that there even was a "find" function in the regex library. I always use has_match() or is_match() and then matches() or all_matches() to process my regex queries. I'm generally more concerned with using patterns to extract one bit of text from a larger string, for which matches() works much better. YMMV, of course.

regex re_pattern = regex:new( `(.*?)` ) 
 
if regex:has_match( re_pattern, my_text ) then 
 
    sequence matches = regex:matches( re_pattern, my_text ) 
    -- play with matches ;) 
 
end if 

-Greg
