OpenEuphoria: Forum: Bug in std/base64.e

1. Bug in std/base64.e

Posted by rzuckerm 1 month ago
1758 views

I didn't see any way to log an issue in https://github.com/OpenEuphoria/euphoria, but there is a bug in std/base64.e. The decode function does not do any error checking except for making sure that the length of the input is a multiple of 4. Here are some checks that I think is needs to have:

Make sure that there are no more than 2 pad characters at the end
Once pads are removed from the end, make sure that all characters are valid Base64

Also, the documentation does not mention that -1 is output if the input has any errors.

Here is what I had to add to my code to do the necessary error checking (I'm sure there's a better way to do this):

constant re_not_base64_chars = re:new("[^A-Za-z0-9+/]") 
 
-- Base64 decode 
-- 
-- Although the built-in Base64 decode is available, it does not do a very good 
-- job of error checking the input other that making sure that the input length 
-- is a multiple of 4. The output of this function is -1 if the input string 
-- is invalid, the Base64 decoded string otherwise. 
function base64_decode(sequence str) 
    -- Error if more than 2 pad trailing characters 
    atom str_length = length(str) 
    atom pad_length = str_length - length(trim_tail(str, "=")) 
    if pad_length > 2 
    then 
        return -1 
    end if 
 
    -- Remove trailing pad characters 
    str_length -= pad_length 
 
    -- Error if invalid Base64 characters 
    if re:has_match(re_not_base64_chars, str[1..str_length]) 
    then 
        return -1 
    end if 
 
    -- Use Base64 decode library function 
    return base64:decode(str) 
end function

I'm willing to do a PR to fix this, but I didn't see any contributing guide or any documentation on how to test changes.

new topic » topic index » view message » categorize

2. Re: Bug in std/base64.e

Posted by ghaberek (admin) 1 month ago
1731 views

rzuckerm said...

I didn't see any way to log an issue in https://github.com/OpenEuphoria/euphoria, but there is a bug in std/base64.e.

We track tickets here: https://openeuphoria.org/ticket/ I would like to move them to GitHub but to my knowledge there's no good way to do that and preserve the ticket numbers and original user information. I'm open to suggestions on that if anyone knows how.

rzuckerm said...

The decode function does not do any error checking except for making sure that the length of the input is a multiple of 4. Here are some checks that I think is needs to have:

Make sure that there are no more than 2 pad characters at the end
Once pads are removed from the end, make sure that all characters are valid Base64

Also, the documentation does not mention that -1 is output if the input has any errors.

I'm not seeing how this is a bug? Not trying to be pedantic; just want to make sure you didn't leave out some unexpected behavior you're experiencing when using the library. I agree it needs improvement. Looking at the code I see several ways we could make this better.

rzuckerm said...

Here is what I had to add to my code to do the necessary error checking (I'm sure there's a better way to do this):

Calling trim_tail(str, '=') makes an unnecessary copy of the sequence that you're then discarding, just to get its length. Also I would use the types library instead of regex. I love regular expressions but they're a bit heavy for checks like this.

include std/base64.e 
include std/types.e 
 
-- Returns 1 if all characters are in the allow range(s) 
type t_base64(object test_data) 
    return types:char_test(test_data, { 
        {'0','9'}, -- [0-9] 
        {'A','Z'}, -- [A-Z] 
        {'a','z'}, -- [a-z] 
        {'+','+'}, -- [+] 
        {'/','/'}  -- [/] 
    }) 
end type 
 
-- Returns the number of times 'ch' appears at the end of 's' 
function count_at_end(sequence s, integer ch) 
    integer n = length(s) 
    while n != 0 and s[n] = ch do 
        n -= 1 
    end while 
    return length(s) - n 
end function 
 
-- Base64 decode  
-- ... 
function base64_decode(sequence str) 
    -- Error if more than 2 pad trailing characters 
    atom pad_length = count_at_end(str, '=') 
    if pad_length > 2 then 
        return -1 
    end if 
 
    -- Remove trailing pad characters 
    str = str[1..$-pad_length] 
 
    -- Error if invalid Base64 characters 
    if not t_base64(str) then 
        return -1 
    end if 
 
    -- Use Base64 decode library function 
    return base64:decode(str) 
end function

If you're using encode() as well, you might want to also check that it's only receiving valid bytes:

function base64_decode(sequence str) 
    -- Error of anything but valid bytes (0..255) 
    if not types:string(str) then 
        return -1 
    end if 
 
    -- Use Base64 encode library function 
    return base64:encode(str) 
end function

rzuckerm said...

I'm willing to do a PR to fix this, but I didn't see any contributing guide or any documentation on how to test changes.

I don't think we have an official contributing guide and anything found in the wiki is likely outdated and still references Mercurial and SCM. If you're interested in contributing (which is great! we very much need some help around here) I can try to put something together.

But basically, once you make your changes, add or update the relevant unit tests (in this case t_base64.e) and make sure they pass with eutest. You can also include the -all parameter to see the tests' status. This helps to ensure your new test(s) are actually running.

$ eutest tests/t_base64.e 
 
interpreting tests/t_base64.e: 
 
Test results summary: 
Files (run: 1) (failed: 0) (100% success)

-Greg

new topic » goto parent » topic index » view message » categorize

3. Re: Bug in std/base64.e

Posted by rzuckerm 1 month ago
1719 views

ghaberek said...

rzuckerm said...

The decode function does not do any error checking except for making sure that the length of the input is a multiple of 4. Here are some checks that I think is needs to have:

Make sure that there are no more than 2 pad characters at the end
Once pads are removed from the end, make sure that all characters are valid Base64

Also, the documentation does not mention that -1 is output if the input has any errors.

I'm not seeing how this is a bug? Not trying to be pedantic; just want to make sure you didn't leave out some unexpected behavior you're experiencing when using the library. I agree it needs improvement. Looking at the code I see several ways we could make this better.

Sorry, but I think this is a bug. The current library code will attempt to do a conversion on something that is not valid Base64. Here's a simplified example:

include std/base64.e 
 
sequence x = "YX!q" 
sequence y = base64:decode(x) 
printf(1, "'%s'\n", {y})

This produces 'ap*'. I don't think the code should be interpreting the ! character. Instead, it should be returning an error result (-1). Also, more than 2 pad characters at the end is not valid Base64. For an input string length modulo 3, the number of pad characters should be the following:

Mod 3 Value	Pad Characters
0	0
1	2
2	1

new topic » goto parent » topic index » view message » categorize

4. Re: Bug in std/base64.e

Posted by ghaberek (admin) 1 month ago
1670 views

rzuckerm said...

Sorry, but I think this is a bug. The current library code will attempt to do a conversion on something that is not valid Base64.

Fair enough. It should indeed not be doing that.

-Greg

new topic » goto parent » topic index » view message » categorize

5. Re: Bug in std/base64.e

Posted by rzuckerm 1 month ago
1662 views

Greg,

I'm trying to test this out in a dev container in VSCode. I put my code changes in, and modified my tests, but when I run eutest tests/t_base64.e, it doesn't seem to be picking up my code changes. I even tried putting in a deliberate bad statement to see if I'd at least get some type of error, but I'm not seeing any error. Is there something I need to do to force the code to recompile or to clear some cache?

I also need to mention that I am using WSL2 with a Ubuntu 22.04 installed. I am using Windows 11.

Thanks, Ron

new topic » goto parent » topic index » view message » categorize

6. Re: Bug in std/base64.e

Posted by rzuckerm 1 month ago
1647 views

Never mind. I have to run the tests with -i $(pwd)/include.

new topic » goto parent » topic index » view message » categorize

7. Re: Bug in std/base64.e

Posted by rzuckerm 1 month ago
1649 views

Pull request is ready for review: https://github.com/OpenEuphoria/euphoria/pull/16

new topic » goto parent » topic index » view message » categorize

8. Re: Bug in std/base64.e

Posted by ghaberek (admin) 1 month ago
1609 views

rzuckerm said...

I'm trying to test this out in a dev container in VSCode.

Full disclosure: I have not touched the vscode-euphoria extension in a while so I'm not sure what state it's in.

As I recall, any further improvements required writing a language server which hasn't been high on my priority list.

rzuckerm said...

I put my code changes in, and modified my tests, but when I run eutest tests/t_base64.e, it doesn't seem to be picking up my code changes. I even tried putting in a deliberate bad statement to see if I'd at least get some type of error, but I'm not seeing any error. Is there something I need to do to force the code to recompile or to clear some cache?

Never mind. I have to run the tests with -i $(pwd)/include.

Yep. By default your interpreter is going to be using the include files it's pointing to in eu.cfg where it's installed.

So if your interpreter is /usr/local/euphoria/bin/eui there should be eu.cfg that contains -i /usr/local/euphoria/include

When you build Euphoria, the configure script creates a new eu.cfg in the build directory that points to the include directory next to source.

So if you're building in /home/rzuckerm/euphoria/source/build, that eu.cfg should contain -i /home/rzuckerm/euphoria/include

rzuckerm said...

I also need to mention that I am using WSL2 with a Ubuntu 22.04 installed. I am using Windows 11.

Excellent! So am I. Actually it looks like I'm on Ubuntu 24.04. But I've been using WSL as my primary development environment for a while now. At least since WSL2 came out in 2019.

rzuckerm said...

Pull request is ready for review: https://github.com/OpenEuphoria/euphoria/pull/16

All of the checks passed, so that's good. Although currently that just means the builds completed. I don't have the actions running unit tests because many of the tests need correcting to pass. Yet another thing that needs some attention around here.

I have merged your PR. Thanks!

-Greg

new topic » goto parent » topic index » view message » categorize

9. Re: Bug in std/base64.e

Posted by rzuckerm 1 month ago
1589 views

Thanks, Greg, for your quick review. I have just one more issue dealing with the handling of the \0 character in decode. I suspect that it will crash due to an index of 0 on the decode table. I'll do a fix real quick and put up another PR: https://github.com/OpenEuphoria/euphoria/pull/17

new topic » goto parent » topic index » view message » categorize

10. Re: Bug in std/base64.e

Posted by ghaberek (admin) 1 month ago
1570 views

rzuckerm said...

Thanks, Greg, for your quick review. I have just one more issue dealing with the handling of the \0 character in decode. I suspect that it will crash due to an index of 0 on the decode table. I'll do a fix real quick and put up another PR: https://github.com/OpenEuphoria/euphoria/pull/17

Good catch! I've merged that PR now as well. Thanks again.

-Greg

new topic » goto parent » topic index » view message » categorize

OpenEuphoria

1. Bug in std/base64.e

2. Re: Bug in std/base64.e

3. Re: Bug in std/base64.e

4. Re: Bug in std/base64.e

5. Re: Bug in std/base64.e

6. Re: Bug in std/base64.e

7. Re: Bug in std/base64.e

8. Re: Bug in std/base64.e

9. Re: Bug in std/base64.e

10. Re: Bug in std/base64.e

Search

Include:

Quick Links

User menu

Misc Menu