Re: Bug in std/base64.e
- Posted by ghaberek (admin) 1 month ago
- 1728 views
I didn't see any way to log an issue in https://github.com/OpenEuphoria/euphoria, but there is a bug in std/base64.e.
We track tickets here: https://openeuphoria.org/ticket/ I would like to move them to GitHub but to my knowledge there's no good way to do that and preserve the ticket numbers and original user information. I'm open to suggestions on that if anyone knows how.
The decode function does not do any error checking except for making sure that the length of the input is a multiple of 4. Here are some checks that I think is needs to have:
- Make sure that there are no more than 2 pad characters at the end
- Once pads are removed from the end, make sure that all characters are valid Base64
Also, the documentation does not mention that -1 is output if the input has any errors.
I'm not seeing how this is a bug? Not trying to be pedantic; just want to make sure you didn't leave out some unexpected behavior you're experiencing when using the library. I agree it needs improvement. Looking at the code I see several ways we could make this better.
Here is what I had to add to my code to do the necessary error checking (I'm sure there's a better way to do this):
Calling trim_tail(str, '=') makes an unnecessary copy of the sequence that you're then discarding, just to get its length. Also I would use the types library instead of regex. I love regular expressions but they're a bit heavy for checks like this.
include std/base64.e include std/types.e -- Returns 1 if all characters are in the allow range(s) type t_base64(object test_data) return types:char_test(test_data, { {'0','9'}, -- [0-9] {'A','Z'}, -- [A-Z] {'a','z'}, -- [a-z] {'+','+'}, -- [+] {'/','/'} -- [/] }) end type -- Returns the number of times 'ch' appears at the end of 's' function count_at_end(sequence s, integer ch) integer n = length(s) while n != 0 and s[n] = ch do n -= 1 end while return length(s) - n end function -- Base64 decode -- ... function base64_decode(sequence str) -- Error if more than 2 pad trailing characters atom pad_length = count_at_end(str, '=') if pad_length > 2 then return -1 end if -- Remove trailing pad characters str = str[1..$-pad_length] -- Error if invalid Base64 characters if not t_base64(str) then return -1 end if -- Use Base64 decode library function return base64:decode(str) end function
If you're using encode() as well, you might want to also check that it's only receiving valid bytes:
function base64_decode(sequence str) -- Error of anything but valid bytes (0..255) if not types:string(str) then return -1 end if -- Use Base64 encode library function return base64:encode(str) end function
I'm willing to do a PR to fix this, but I didn't see any contributing guide or any documentation on how to test changes.
I don't think we have an official contributing guide and anything found in the wiki is likely outdated and still references Mercurial and SCM. If you're interested in contributing (which is great! we very much need some help around here) I can try to put something together.
But basically, once you make your changes, add or update the relevant unit tests (in this case t_base64.e) and make sure they pass with eutest. You can also include the -all parameter to see the tests' status. This helps to ensure your new test(s) are actually running.
$ eutest tests/t_base64.e interpreting tests/t_base64.e: Test results summary: Files (run: 1) (failed: 0) (100% success)
-Greg