1. Checksums, hashes, and digests, oh my!
- Posted by ghaberek (admin) Jan 18, 2022
I'm currently updating the internal hashing function to support most (hopefully all, eventually) of the common cryptographic hash algorithms, such as MD5, SHA1, and SHA256. Here are the issues I'm facing and would like feedback on:
Issue 1
Issue 2
Issue 3
In summary:
- Should hash() be renamed to checksum()?
- Should hash() be a built-in or machine func?
- Should we have algorithm functions like sha256sum()?
- Should digest algorithms only operate on sequences of bytes?
- Should we use picohash, libtomcrypt, or something else?
- What digest algorithms should we bake into Euphoria?
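One data point on the bytes-only question: Python's hashlib accepts only bytes, because the same text hashes differently depending on how it is encoded. The sketch below is in Python rather than Euphoria, purely as an illustration; the sample string is arbitrary.

```python
import hashlib

text = "naïve"

# The same text gives different digests under different encodings,
# so the caller has to decide on the bytes before hashing.
utf8 = hashlib.sha256(text.encode("utf-8")).hexdigest()
latin1 = hashlib.sha256(text.encode("latin-1")).hexdigest()
print(utf8 == latin1)  # False

# Passing text directly is rejected outright.
try:
    hashlib.sha256(text)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```

A bytes-only digest API pushes the encoding decision onto the caller, which is stricter but avoids ambiguous results.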
I believe Derek wrote the original code for this, so if he's around I'd really appreciate his insight.
-Greg
2. Re: Checksums, hashes, and digests, oh my!
- Posted by ChrisB (moderator) Jan 18, 2022
Hi Greg
What are these for? How will they benefit an average Euser? Will a name change break a lot of stuff?
Starting questions.
Cheers
Chris
3. Re: Checksums, hashes, and digests, oh my!
- Posted by ghaberek (admin) Jan 18, 2022
What are these for?
Cryptographic hash functions are used to verify the integrity or content of messages and files. They're typically used to store a one-way hash for validating passwords, or to check the integrity of a downloaded file.
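As a rough illustration of the download-integrity case, here is a minimal sketch using Python's hashlib (Python rather than Euphoria, since the Euphoria API is still being discussed). The helper names `sha256_hex` and `verify_digest` are made up for this example.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_digest(data: bytes, expected_hex: str) -> bool:
    """Compare data against a published digest in constant time."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())

# Stands in for a digest published alongside a download.
published = sha256_hex(b"hello world\n")

print(verify_digest(b"hello world\n", published))  # True
print(verify_digest(b"hello w0rld\n", published))  # False
```

Password storage normally needs more than a bare digest (a salt and a deliberately slow function), so this sketch only covers the file-integrity use.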
How will they benefit an average Euser?
This is one of those features that nearly anyone can and should be using when needed. Providing a standard, base-line implementation ensures most users don't have to hunt for code and re-invent the wheel.
Will a name change break a lot of stuff?
I don't think so. We can accommodate the change inside the standard library so that std/map.e, etc. aren't affected, mark hash() as deprecated in std/hash.e, and then remove it entirely in a later version.
So std/hash.e would end up looking something like this:
constant M_CHECKSUM = 98, M_CALCHASH = 99

public enum ADLER32, FLETCHER32, ...

deprecate -- this will issue a warning if still used
public function hash( object data_in, integer algo )
    return machine_func( M_CHECKSUM, {data_in,algo} )
end function

public function checksum( object data_in, integer algo )
    return machine_func( M_CHECKSUM, {data_in,algo} )
end function

public function adler32( object data_in )
    return machine_func( M_CHECKSUM, {data_in,ADLER32} )
end function

public function fletcher32( object data_in )
    return machine_func( M_CHECKSUM, {data_in,FLETCHER32} )
end function

-- etc.

public enum MD5, SHA1, SHA256, ...

public function calc_hash( sequence data_in, integer algo )
    return machine_func( M_CALCHASH, {data_in,algo} )
end function

public function md5sum( sequence data_in )
    return machine_func( M_CALCHASH, {data_in,MD5} )
end function

public function sha1sum( sequence data_in )
    return machine_func( M_CALCHASH, {data_in,SHA1} )
end function

-- etc.
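For comparison, the same two-tier shape (a generic dispatcher plus thin per-algorithm wrappers) can be sketched with Python's standard library. This is only an analogy, not the proposed Euphoria implementation: the enum and function names mirror the outline above, and CRC32 stands in for FLETCHER32 because Python's zlib module has no Fletcher checksum.

```python
import hashlib
import zlib
from enum import Enum

class Checksum(Enum):
    ADLER32 = "adler32"
    CRC32 = "crc32"  # stand-in: zlib does not provide FLETCHER32

class Digest(Enum):
    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"

def checksum(data: bytes, algo: Checksum) -> int:
    """Fast, non-cryptographic checksum; returns a 32-bit integer."""
    if algo is Checksum.ADLER32:
        return zlib.adler32(data)
    return zlib.crc32(data)

def calc_hash(data: bytes, algo: Digest) -> bytes:
    """Message digest; returns the raw digest bytes."""
    return hashlib.new(algo.value, data).digest()

# Thin convenience wrappers, one per algorithm.
def adler32(data: bytes) -> int:
    return checksum(data, Checksum.ADLER32)

def md5sum(data: bytes) -> bytes:
    return calc_hash(data, Digest.MD5)

def sha1sum(data: bytes) -> bytes:
    return calc_hash(data, Digest.SHA1)

print(adler32(b""))        # 1
print(md5sum(b"").hex())   # d41d8cd98f00b204e9800998ecf8427e
```

The two tiers return different types here (an integer for checksums, bytes for digests), which is one reason to keep them as separate entry points.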
There are plenty more algorithms that we could implement, but I think when we go from "common message digests" to "actual data cryptography" we should focus on providing an external library.
-Greg
4. Re: Checksums, hashes, and digests, oh my!
- Posted by ChrisB (moderator) Jan 18, 2022
Hi Greg
TBH, you should probably go for whichever one is best for you, or quick and dirty to implement. Then, if anyone wants or needs a more specialist one, or if you want to spend more time on a superior one, do that as a secondary project. In my very humble opinion, I'm not sure this will add a lot to the Eu ecosystem at this point. Maybe even leave hooks for alternative methods, much like the database library that is around somewhere.
Cheers
Chris
5. Re: Checksums, hashes, and digests, oh my!
- Posted by petelomax Jan 18, 2022
Just so you know,
Phix begrudgingly supports hash(x,HSIEH30) in builtins\hash.e (not an exact match for OE), for reasons lost in the mists of time, and supports no other values for algo.
(There is also a half-baked and probably quite long dead builtins\phash.e...)
Otherwise it has the following (not as autoincludes):
builtins\sha256.e (and [uniquely in this list] a separate hand-crafted pwa\builtins\sha256.js)
builtins\sha512.e
builtins\sha1.e
builtins\hmac.e
builtins\md5.e
builtins\ripemd160.e
along with a few other crc32|md4|md5 bits 'n pieces in demo\rosetta.
It feels wrong to me to bundle all such hash algorithms in a single file, as most apps likely need only one. While the compiler can help a bit, it must be simpler to swap one out for another without breaking some other unrelated program (and likewise to ship sources) when they are all in separate files. To my mind a builtins\hash.e might, or perhaps should, contain some common helper routines, but nothing else.