1. Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 21, 2015
- Last edited Feb 22, 2015
How to add/remove bytes from the beginning of a (binary) file?
Thanks
Green Euphorian
2. Re: Adding/Removing bytes from the beginning of a file
- Posted by BRyan Feb 21, 2015
> How to add/remove bytes from the beginning of a (binary) file?
> Thanks
> Green Euphorian
How about reading the whole file into a sequence.
Chop the front off the sequence and write the sequence back into the file.
3. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 21, 2015
> How about reading the whole file into a sequence.
> Chop the front off the sequence and write the sequence back into the file.
If there is another way, I would prefer it, because the file might be huge (hundreds of MBs, or even GBs), so loading the whole thing into memory would not be a good idea. Any other suggestions, please?
4. Re: Adding/Removing bytes from the beginning of a file
- Posted by ryanj Feb 21, 2015
>> How about reading the whole file into a sequence.
>> Chop the front off the sequence and write the sequence back into the file.
> If there is another way, I would prefer it, because the file might be huge (hundreds of MBs, or even GBs), so loading the whole thing into memory would not be a good idea. Any other suggestions, please?
I haven't tried this, but you could probably use file_num = open("my_file", "ub") to open the file for update (reading and writing) in binary mode, then copy data within the file a section at a time to limit memory usage, using seek, getc, puts, or other functions in http://openeuphoria.org/docs/std_io.html. I'm not sure how to reduce the size of the file, but I believe it grows if you write past the end of the file.
5. Re: Adding/Removing bytes from the beginning of a file
- Posted by ryanj Feb 21, 2015
It might be better to seek to the desired starting position, then copy the rest of the file (a character or section at a time) to a new file, then delete the old file and rename the new one to the old name.
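A minimal, untested Euphoria sketch of the copy-to-a-new-file idea above, assuming OpenEuphoria 4.x with std/io.e and std/filesys.e. The routine name strip_leading_bytes, the ".tmp" suffix, and the 64K block size are assumptions made for illustration, not anything from the thread.

```euphoria
include std/io.e        -- seek(), get_bytes()
include std/filesys.e   -- delete_file(), rename_file()

constant BLOCK_SIZE = 65536   -- assumed block size; any moderate value should do

-- Copy everything from byte offset n_skip onward into a temporary file,
-- then replace the original with it. Returns 1 on success, 0 on failure.
function strip_leading_bytes(sequence path, integer n_skip)
    sequence tmp_path, block
    integer fin, fout

    tmp_path = path & ".tmp"          -- hypothetical temporary file name
    fin = open(path, "rb")
    if fin = -1 then
        return 0
    end if
    fout = open(tmp_path, "wb")
    if fout = -1 then
        close(fin)
        return 0
    end if
    if seek(fin, n_skip) != 0 then    -- seek() returns 0 on success
        close(fin)
        close(fout)
        return 0
    end if
    while 1 do
        block = get_bytes(fin, BLOCK_SIZE)   -- at most one block in memory
        if length(block) = 0 then
            exit                             -- end of file reached
        end if
        puts(fout, block)
    end while
    close(fin)
    close(fout)
    if not delete_file(path) then
        return 0
    end if
    return rename_file(tmp_path, path)
end function
```

Something like strip_leading_bytes("big_file.bin", 4) would then drop a four-byte signature. Memory use stays at one block, but the whole file is still rewritten, so it needs free disk space roughly equal to the file size.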
6. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 21, 2015
I am venturing here something that might be impractical or even impossible: what about gaining low-level access to the file allocation table and modifying it in such a way that the beginning point of a file is moved a few bytes after its current position? This way, no re-writing of the file would be needed. Does this make sense?
7. Re: Adding/Removing bytes from the beginning of a file
- Posted by ryanj Feb 21, 2015
> I am venturing here something that might be impractical or even impossible: what about gaining low-level access to the file allocation table and modifying it in such a way that the beginning point of a file is moved a few bytes after its current position? This way, no re-writing of the file would be needed. Does this make sense?
I see a SetEndOfFile function on MSDN, but I can't find anything for setting the beginning of a file.
8. Re: Adding/Removing bytes from the beginning of a file
- Posted by ryanj Feb 21, 2015
I think large binary files are typically managed by creating a sort of file system inside the file. Part of the file has an allocation table or bitmap that maps out the data inside the file. Parts of the file would contain valid data and parts would be "empty". To delete data from anywhere in the file, mark that area as "empty". A cleanup or compact function would be called periodically to rebuild the file by defragmenting valid areas and removing excess empty areas, probably by copying all the data to a fresh new file and building a new allocation table.
9. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 21, 2015
> I think large binary files are typically managed by creating a sort of file system inside the file. Part of the file has an allocation table or bitmap that maps out the data inside the file. Parts of the file would contain valid data and parts would be "empty". To delete data from anywhere in the file, mark that area as "empty". A cleanup or compact function would be called periodically to rebuild the file by defragmenting valid areas and removing excess empty areas, probably by copying all the data to a fresh new file and building a new allocation table.
This is way too complicated and useless for what I need. In fact, I simply need to remove the "magic number" (file signature bytes) from certain files.
10. Re: Adding/Removing bytes from the beginning of a file
- Posted by petelomax Feb 21, 2015
>> How about reading the whole file into a sequence. Chop the front off the sequence and write the sequence back into the file.
> If there is another way, I would prefer it, because the file might be huge (hundreds of MBs, or even GBs), so loading the whole thing into memory would not be a good idea. Any other suggestions, please?
I would have said much the same, but in blocks of anywhere between 8 and 256K. Actually, Eu hides much of this from you, so getc/puts is not as bad as it could be, and is in fact quite reasonable performance-wise.
>> I am venturing here something that might be impractical or even impossible: what about gaining low-level access to the file allocation table and modifying it in such a way that the beginning point of a file is moved a few bytes after its current position? This way, no re-writing of the file would be needed. Does this make sense?
> I see a SetEndOfFile function on MSDN, but i can't find anything for setting the beginning of file.
An imaginary "SetStartOfFile" counterpart to MSDN's SetEndOfFile would only work (without moving 100s of GB) when the part you wanted to remove happened to be a whole file sector, or similar, so basically, no.
The suggestion I would make is to rewrite the first few bytes with a special sequence meaning "ignore N bytes at start of file". Obviously that depends on who/what has to read it and whether you can modify that to obey said special sequence. However, I would hazard that it is a non-starter: if it were that simple, you'd just modify the exact same things to skip whatever it is you want them to skip. Sorry.
Pete
11. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 21, 2015
Thanks for the clarifications.
I'll try to follow Pete's advice about simply changing the first few bytes, without trimming them off. Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.
What is the relevant command that I would use to overwrite the first few bytes?
Thanks again
12. Re: Adding/Removing bytes from the beginning of a file
- Posted by DerekParnell (admin) Feb 21, 2015
> How to add/remove bytes from the beginning of a (binary) file?
You have to create a new file based on copying the original file, adding bytes or not copying bytes as required.
Sorry but there is no easy way out of this. Doing low-level filesystem manipulation is bound to be messy and dangerous; definitely not worth the effort.
If running on Windows, you could open the file in update mode, copy bytes from later locations to earlier locations (which means remembering and setting the current file position), and then, when you have finished that, call the API routine SetEndOfFile() in kernel32.dll.
13. Re: Adding/Removing bytes from the beginning of a file
- Posted by DerekParnell (admin) Feb 21, 2015
> Thanks for the clarifications.
> I'll try to follow Pete's advice about simply changing the first few bytes, without trimming them off. Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.
> What is the relevant command that I would use to overwrite the first few bytes?
> Thanks again
This idea will work if it's your own file layout design. If it is a standard or proprietary file type, you may run into problems doing this.
14. Re: Adding/Removing bytes from the beginning of a file
- Posted by petelomax Feb 21, 2015
> What is the relevant command that I would use to overwrite the first few bytes?
Erm, fairly straightforward I should think:
    fn = open(filename,"ub")
    if seek(fn,0)!=SEEK_OK then ?9/0 end if -- (not strictly necessary; SEEK_OK is automatically defined as 0 in Phix)
    puts(fn,some_bytes)
    close(fn)
Obviously, caveat emptor: untested, expect problems if you clobber more bytes than you should, and, as Derek said, it may make the file unusable by other apps, etc.
Pete
PS: Another (random) thought occurs to me that if you don't have enough bytes (ignore this if you do), and you really don't want to move 100s of GB, a "get first 7 bytes from file 5" type scheme might help.
PPS: It also might be sensible to blat the first few bytes, do your thing, then restore those overwritten bytes.
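A rough, untested sketch of that "blat, do your thing, restore" idea in OpenEuphoria 4.x. The routine name swap_leading_bytes, the file name and the replacement bytes are all made up for illustration. It writes only the leading bytes and never touches the rest of the file.

```euphoria
include std/io.e   -- seek(), get_bytes()

-- Overwrite the first length(new_bytes) bytes of the file with new_bytes.
-- Returns the bytes that were there before, or -1 if something failed.
function swap_leading_bytes(sequence path, sequence new_bytes)
    object old_bytes
    integer fn

    fn = open(path, "ub")            -- binary update: read and write in place
    if fn = -1 then
        return -1
    end if
    old_bytes = get_bytes(fn, length(new_bytes))
    -- after reading, seek back to the start before writing
    if length(old_bytes) != length(new_bytes) or seek(fn, 0) != 0 then
        close(fn)
        return -1
    end if
    puts(fn, new_bytes)
    close(fn)
    return old_bytes
end function

object saved
saved = swap_leading_bytes("big_file.bin", {0, 0, 0, 0})  -- hypothetical name and bytes
if sequence(saved) then
    -- ... do your thing with the file here ...
    saved = swap_leading_bytes("big_file.bin", saved)      -- put the original bytes back
end if
```

Keeping the old bytes in a variable only works within one run; if the program might stop in between, they would have to be saved somewhere safer.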
15. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 22, 2015
Thanks a lot. Only, I don't understand at all the line containing SEEK_OK. What is that about?!? Could you please explain it?
Thanks again
16. Re: Adding/Removing bytes from the beginning of a file
- Posted by Ekhnat0n Feb 22, 2015
Simply declare a struct called New_File consisting of 2 structures, e.g. File_Data and New_Data.
Then write your new data to the struct New_Data and the original file to File_Data.
Now write the structure New_File to disk and that's all.
Not that hard to come up with and really a piece of cake.
17. Re: Adding/Removing bytes from the beginning of a file
- Posted by petelomax Feb 22, 2015
> Thanks a lot. Only, I don't understand at all the line containing SEEK_OK. What is that about?!? Could you please explain it?
When you open a file (except in "a" mode) the file pointer should already be at the start of file, so you can omit the whole line if you want, and I probably should have done too. Or you might prefer
if seek(fn,0)!=0 then crash("couldn't seek to start of file!!") end if -- (or log the error and return failure)
seek() is a bit unusual in that it returns 0 on success. In Phix there is a predefined constant SEEK_OK with the value 0, in OpenEuphoria you can just use 0. ?9/0 is just my way of getting a crash when something goes horribly wrong.
I kind of assumed you would need to read a few bytes to see if there was something you needed to skip, after which you would need a seek.
HTH, Pete
18. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 22, 2015
> Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.
What about this?
19. Re: Adding/Removing bytes from the beginning of a file
- Posted by Spock Feb 22, 2015
>> Would overwriting the first few bytes entail reading and re-writing the rest of the file too? Or can the modified bytes be saved on the disk just by themselves? You see, I am only worried about the performance overhead in case the whole file had to be re-written.
> What about this?
Do you only want to truncate some leading bytes in a file (& modify a few more)? It might help to know what your use case is. Anyway, for performance, I would not alter the file length at all but simply write the desired start pointer to another small file and use that to seek() whenever you need to read the large file.
Also, modifying the other bytes is not difficult and doesn't require reading the whole file.
EDIT: However, prepending bytes could pose a performance issue. What is your use case?
Spock
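One untested way the "start pointer in another small file" idea could look in OpenEuphoria 4.x. The ".start" companion file name and both routine names are assumptions for illustration only.

```euphoria
include std/io.e    -- seek()
include std/get.e   -- get(), GET_SUCCESS

-- Record the logical start offset of the big file in a small companion file.
procedure save_start_offset(sequence path, integer offset)
    integer fn
    fn = open(path & ".start", "w")   -- hypothetical companion file name
    if fn != -1 then
        printf(fn, "%d", offset)
        close(fn)
    end if
end procedure

-- Open the big file for binary reading, positioned at the recorded offset
-- (or at byte 0 if there is no usable companion file). Returns -1 on failure.
function open_at_start(sequence path)
    integer fn, big, offset
    sequence result

    offset = 0
    fn = open(path & ".start", "r")
    if fn != -1 then
        result = get(fn)              -- {status, value}
        close(fn)
        if result[1] = GET_SUCCESS and integer(result[2]) then
            offset = result[2]
        end if
    end if
    big = open(path, "rb")
    if big != -1 and seek(big, offset) != 0 then
        close(big)
        big = -1
    end if
    return big
end function
```

The big file is never modified this way, so the cost does not depend on its size, but every program that reads it has to know about the companion file.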
20. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 22, 2015
Well, the initial idea when I started this thread was prepending bytes and removing bytes from a file, but from the answers I got I realised that this is not convenient at all, and may also lead to troubles.
So my current task is: overwriting the first few bytes of a file (containing the file signature, a.k.a. magic number, or simply header) and then, later on, overwriting them again, back to their original state. I already got the info about doing this, thanks to Pete.
Now, the only question unanswered is: will this simple byte-overwriting process (without any truncation or prepending) entail also a complete re-writing of the whole file on the disk when the file is updated? Or will those few bytes alone be modified on the disk (which indeed is my desired outcome)? There are significant performance issues involved, especially in the case of huge files. That's why I am asking.
Thanks again to all.
21. Re: Adding/Removing bytes from the beginning of a file
- Posted by Spock Feb 22, 2015
- Last edited Feb 23, 2015
> will this simple byte-overwriting process (without any truncation or prepending) entail also a complete re-writing of the whole file on the disk when the file is updated? Or will those few bytes alone be modified on the disk (which indeed is my desired outcome)?
A combination of seek() and puts() will allow you to overwrite specific bytes without affecting the rest of the file.
Spock
[Edit: typo]
22. Re: Adding/Removing bytes from the beginning of a file
- Posted by GreenEuphorian Feb 22, 2015
> A combination of seek() and putc() will allow you to overwrite specific bytes without affecting the rest of the file.
putc() ?!? I could not find it in the manual. Or do you mean puts()?
23. Re: Adding/Removing bytes from the beginning of a file
- Posted by Spock Feb 23, 2015
>> A combination of seek() and putc() will allow you to overwrite specific bytes without affecting the rest of the file.
> putc() ?!? I could not find it in the manual. Or do you mean puts()?
Yeah, puts(). There's also put_integer16() and put_integer32(), whose names do seem unnecessarily long...
Spock
24. Re: Adding/Removing bytes from the beginning of a file
- Posted by SDPringle Feb 25, 2015
True, they could have been called put2 and put4 respectively. It is hard to get people to change the names of things, though. Even before release, I lobbied to name the regex functions match rather than find, but nobody would go along with that. Write your own library to wrap the function names that are too long.
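For what it's worth, such a wrapper library can be tiny. A sketch, assuming the put_integer16() and put_integer32() procedures from std/io.e; the names put2 and put4 are just the hypothetical short names floated above, not part of the standard library.

```euphoria
include std/io.e   -- put_integer16(), put_integer32()

-- Hypothetical short aliases for the standard routines.
procedure put2(integer fh, integer val)
    put_integer16(fh, val)   -- writes val as 2 bytes
end procedure

procedure put4(integer fh, integer val)
    put_integer32(fh, val)   -- writes val as 4 bytes
end procedure
```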
25. Re: Adding/Removing bytes from the beginning of a file
- Posted by Spock Feb 25, 2015
> True, they could have been called put2 and put4 respectively. It is hard to get people to change the names of things, though. Even before release, I lobbied to name the regex functions match rather than find, but nobody would go along with that. Write your own library to wrap the function names that are too long.
This is exactly what I do (rewrite functions with the correct name). Incidentally, the main function in my own RegEx lib (written from scratch but loosely based on a User Contribution) is named match instead of find. It makes sense given that we are more concerned with matching a pattern rather than finding its location.
Spock
26. Re: Adding/Removing bytes from the beginning of a file
- Posted by ghaberek (admin) Feb 25, 2015
>> True, they could have been called put2 and put4 respectively. It is hard to get people to change the names of things, though. Even before release, I lobbied to name the regex functions match rather than find, but nobody would go along with that. Write your own library to wrap the function names that are too long.
> This is exactly what I do (rewrite functions with the correct name). Incidentally, the main function in my own RegEx lib (written from scratch but loosely based on a User Contribution) is named match instead of find. It makes sense given that we are more concerned with matching a pattern rather than finding its location.
> Spock
It's interesting that you bring this up because I only recently discovered that there even was a "find" function in the regex library. I always use has_match() or is_match() and then matches() or all_matches() to process my regex queries. I'm generally more concerned with using patterns to extract one bit of text from a larger string, for which matches() works much better. YMMV, of course.
    regex re_pattern = regex:new( `(.*?)` )
    if regex:has_match( re_pattern, my_text ) then
        sequence matches = regex:matches( re_pattern, my_text )
        -- play with matches ;)
    end if
-Greg