1. Bug in std/base64.e
- Posted by rzuckerm 4 weeks ago
- 1719 views
I didn't see any way to log an issue in https://github.com/OpenEuphoria/euphoria, but there is a bug in std/base64.e. The decode function does not do any error checking except for making sure that the length of the input is a multiple of 4. Here are some checks that I think is needs to have:
- Make sure that there are no more than 2 pad characters at the end
- Once pads are removed from the end, make sure that all characters are valid Base64
Also, the documentation does not mention that -1 is output if the input has any errors.
Here is what I had to add to my code to do the necessary error checking (I'm sure there's a better way to do this):
constant re_not_base64_chars = re:new("[^A-Za-z0-9+/]") -- Base64 decode -- -- Although the built-in Base64 decode is available, it does not do a very good -- job of error checking the input other that making sure that the input length -- is a multiple of 4. The output of this function is -1 if the input string -- is invalid, the Base64 decoded string otherwise. function base64_decode(sequence str) -- Error if more than 2 pad trailing characters atom str_length = length(str) atom pad_length = str_length - length(trim_tail(str, "=")) if pad_length > 2 then return -1 end if -- Remove trailing pad characters str_length -= pad_length -- Error if invalid Base64 characters if re:has_match(re_not_base64_chars, str[1..str_length]) then return -1 end if -- Use Base64 decode library function return base64:decode(str) end function
I'm willing to do a PR to fix this, but I didn't see any contributing guide or any documentation on how to test changes.
2. Re: Bug in std/base64.e
- Posted by ghaberek (admin) 3 weeks ago
- 1692 views
I didn't see any way to log an issue in https://github.com/OpenEuphoria/euphoria, but there is a bug in std/base64.e.
We track tickets here: https://openeuphoria.org/ticket/ I would like to move them to GitHub but to my knowledge there's no good way to do that and preserve the ticket numbers and original user information. I'm open to suggestions on that if anyone knows how.
The decode function does not do any error checking except for making sure that the length of the input is a multiple of 4. Here are some checks that I think is needs to have:
- Make sure that there are no more than 2 pad characters at the end
- Once pads are removed from the end, make sure that all characters are valid Base64
Also, the documentation does not mention that -1 is output if the input has any errors.
I'm not seeing how this is a bug? Not trying to be pedantic; just want to make sure you didn't leave out some unexpected behavior you're experiencing when using the library. I agree it needs improvement. Looking at the code I see several ways we could make this better.
Here is what I had to add to my code to do the necessary error checking (I'm sure there's a better way to do this):
Calling trim_tail(str, '=') makes an unnecessary copy of the sequence that you're then discarding, just to get its length. Also I would use the types library instead of regex. I love regular expressions but they're a bit heavy for checks like this.
include std/base64.e include std/types.e -- Returns 1 if all characters are in the allow range(s) type t_base64(object test_data) return types:char_test(test_data, { {'0','9'}, -- [0-9] {'A','Z'}, -- [A-Z] {'a','z'}, -- [a-z] {'+','+'}, -- [+] {'/','/'} -- [/] }) end type -- Returns the number of times 'ch' appears at the end of 's' function count_at_end(sequence s, integer ch) integer n = length(s) while n != 0 and s[n] = ch do n -= 1 end while return length(s) - n end function -- Base64 decode -- ... function base64_decode(sequence str) -- Error if more than 2 pad trailing characters atom pad_length = count_at_end(str, '=') if pad_length > 2 then return -1 end if -- Remove trailing pad characters str = str[1..$-pad_length] -- Error if invalid Base64 characters if not t_base64(str) then return -1 end if -- Use Base64 decode library function return base64:decode(str) end function
If you're using encode() as well, you might want to also check that it's only receiving valid bytes:
function base64_decode(sequence str) -- Error of anything but valid bytes (0..255) if not types:string(str) then return -1 end if -- Use Base64 encode library function return base64:encode(str) end function
I'm willing to do a PR to fix this, but I didn't see any contributing guide or any documentation on how to test changes.
I don't think we have an official contributing guide and anything found in the wiki is likely outdated and still references Mercurial and SCM. If you're interested in contributing (which is great! we very much need some help around here) I can try to put something together.
But basically, once you make your changes, add or update the relevant unit tests (in this case t_base64.e) and make sure they pass with eutest. You can also include the -all parameter to see the tests' status. This helps to ensure your new test(s) are actually running.
$ eutest tests/t_base64.e interpreting tests/t_base64.e: Test results summary: Files (run: 1) (failed: 0) (100% success)
-Greg
3. Re: Bug in std/base64.e
- Posted by rzuckerm 3 weeks ago
- 1678 views
The decode function does not do any error checking except for making sure that the length of the input is a multiple of 4. Here are some checks that I think is needs to have:
- Make sure that there are no more than 2 pad characters at the end
- Once pads are removed from the end, make sure that all characters are valid Base64
Also, the documentation does not mention that -1 is output if the input has any errors.
I'm not seeing how this is a bug? Not trying to be pedantic; just want to make sure you didn't leave out some unexpected behavior you're experiencing when using the library. I agree it needs improvement. Looking at the code I see several ways we could make this better.
Sorry, but I think this is a bug. The current library code will attempt to do a conversion on something that is not valid Base64. Here's a simplified example:
include std/base64.e sequence x = "YX!q" sequence y = base64:decode(x) printf(1, "'%s'\n", {y})
This produces 'ap*'. I don't think the code should be interpreting the ! character. Instead, it should be returning an error result (-1). Also, more than 2 pad characters at the end is not valid Base64. For an input string length modulo 3, the number of pad characters should be the following:
Mod 3 Value | Pad Characters |
---|---|
0 | 0 |
1 | 2 |
2 | 1 |
4. Re: Bug in std/base64.e
- Posted by ghaberek (admin) 3 weeks ago
- 1631 views
Sorry, but I think this is a bug. The current library code will attempt to do a conversion on something that is not valid Base64.
Fair enough. It should indeed not be doing that.
-Greg
5. Re: Bug in std/base64.e
- Posted by rzuckerm 3 weeks ago
- 1622 views
Greg,
I'm trying to test this out in a dev container in VSCode. I put my code changes in, and modified my tests, but when I run eutest tests/t_base64.e, it doesn't seem to be picking up my code changes. I even tried putting in a deliberate bad statement to see if I'd at least get some type of error, but I'm not seeing any error. Is there something I need to do to force the code to recompile or to clear some cache?
I also need to mention that I am using WSL2 with a Ubuntu 22.04 installed. I am using Windows 11.
Thanks, Ron
6. Re: Bug in std/base64.e
- Posted by rzuckerm 3 weeks ago
- 1609 views
Never mind. I have to run the tests with -i $(pwd)/include.
7. Re: Bug in std/base64.e
- Posted by rzuckerm 3 weeks ago
- 1610 views
Pull request is ready for review: https://github.com/OpenEuphoria/euphoria/pull/16
8. Re: Bug in std/base64.e
- Posted by ghaberek (admin) 3 weeks ago
- 1570 views
I'm trying to test this out in a dev container in VSCode.
Full disclosure: I have not touched the vscode-euphoria extension in a while so I'm not sure what state it's in.
As I recall, any further improvements required writing a language server which hasn't been high on my priority list.
I put my code changes in, and modified my tests, but when I run eutest tests/t_base64.e, it doesn't seem to be picking up my code changes. I even tried putting in a deliberate bad statement to see if I'd at least get some type of error, but I'm not seeing any error. Is there something I need to do to force the code to recompile or to clear some cache?
Never mind. I have to run the tests with -i $(pwd)/include.
Yep. By default your interpreter is going to be using the include files it's pointing to in eu.cfg where it's installed.
So if your interpreter is /usr/local/euphoria/bin/eui there should be eu.cfg that contains -i /usr/local/euphoria/include
When you build Euphoria, the configure script creates a new eu.cfg in the build directory that points to the include directory next to source.
So if you're building in /home/rzuckerm/euphoria/source/build, that eu.cfg should contain -i /home/rzuckerm/euphoria/include
I also need to mention that I am using WSL2 with a Ubuntu 22.04 installed. I am using Windows 11.
Excellent! So am I. Actually it looks like I'm on Ubuntu 24.04. But I've been using WSL as my primary development environment for a while now. At least since WSL2 came out in 2019.
Pull request is ready for review: https://github.com/OpenEuphoria/euphoria/pull/16
All of the checks passed, so that's good. Although currently that just means the builds completed. I don't have the actions running unit tests because many of the tests need correcting to pass. Yet another thing that needs some attention around here.
I have merged your PR. Thanks!
-Greg
9. Re: Bug in std/base64.e
- Posted by rzuckerm 3 weeks ago
- 1551 views
Thanks, Greg, for your quick review. I have just one more issue dealing with the handling of the \0 character in decode. I suspect that it will crash due to an index of 0 on the decode table. I'll do a fix real quick and put up another PR: https://github.com/OpenEuphoria/euphoria/pull/17
10. Re: Bug in std/base64.e
- Posted by ghaberek (admin) 3 weeks ago
- 1531 views
Thanks, Greg, for your quick review. I have just one more issue dealing with the handling of the \0 character in decode. I suspect that it will crash due to an index of 0 on the decode table. I'll do a fix real quick and put up another PR: https://github.com/OpenEuphoria/euphoria/pull/17
Good catch! I've merged that PR now as well. Thanks again.
-Greg