I'm not a Python programmer, but I'm trying to translate some Python code to R. The piece of Python code I'm having trouble with is:
hashlib.sha256(x).hexdigest()
My interpretation of this code is that the function is going to calculate the hash of x using the sha256 algorithm and return the value in hex.
Given that interpretation, I am using the following R function:
digest(x, algo="sha256", raw=FALSE)
Based on my admittedly limited knowledge of R and what I have read online about Python's hashlib module, the two functions should produce identical results, but they do not.
Am I missing something, or am I using the wrong R function?
Yes, both the Python and the R sample code return a hexadecimal representation of the SHA-256 hash digest of the data passed in.
You do need to switch off serialisation in R, though; otherwise digest() first serialises the string and hashes that serialisation rather than the character data alone. Set serialize to FALSE:
> digest('', algo="sha256", serialize=FALSE)
[1] "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
> digest('hello world', algo="sha256", serialize=FALSE)
[1] "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"
These match their Python equivalents:
>>> import hashlib
>>> hashlib.sha256('').hexdigest()
'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
>>> hashlib.sha256('hello world').hexdigest()
'b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'
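The session above passes plain string literals, which works on Python 2. On Python 3, hashlib.sha256() accepts only bytes, so a str has to be encoded first. A minimal sketch, assuming the text is UTF-8 (the helper name sha256_hex is hypothetical, not part of hashlib):

```python
import hashlib


def sha256_hex(text, encoding="utf-8"):
    """Return the hex SHA-256 digest of text after encoding it to bytes."""
    # hashlib operates on bytes, so encode the str first (UTF-8 is an assumption).
    return hashlib.sha256(text.encode(encoding)).hexdigest()


print(sha256_hex(""))
print(sha256_hex("hello world"))
```

For non-ASCII text, the two sides only agree if both hash the same byte encoding.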
If your hashes still differ between R and Python, then your data is different. That could be as subtle as a newline at the end of the line, or a byte order mark at the start.
In Python, inspect the output of print(repr(x)) to see the data as a Python string literal; this shows non-printable characters as escape sequences. I'm sure R has similar debugging tools. Both R and Python echo string values as representations in their interactive modes.
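To illustrate how such invisible differences change the digest, here is a small sketch using made-up sample values (the variable names are illustrative only). A trailing newline and a UTF-8 byte order mark each produce a different hash, and repr() makes both visible:

```python
import hashlib

clean = b"hello world"
with_newline = b"hello world\n"          # e.g. a line read from a file
with_bom = b"\xef\xbb\xbfhello world"    # UTF-8 byte order mark prepended

digests = {}
for label, data in [("clean", clean), ("newline", with_newline), ("bom", with_bom)]:
    digests[label] = hashlib.sha256(data).hexdigest()
    # repr() shows the newline and the BOM bytes as escape sequences
    print(label, repr(data), digests[label])
```

If the stray bytes are not meant to be part of the data, removing them before hashing (for example with data.strip() for surrounding whitespace) is one way to get matching digests.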