Data corruption: Where's the bug‽

2024/10/11 14:20:49

Last edit: I've figured out what the problem was (see my own answer below) but I cannot mark the question as answered, it would seem. If someone can answer the questions I have in my answer below, namely, is this a bug in Cython or is this Cython's intended behavior, I will mark that answer as accepted, because that would be the most useful lesson to gain from this, IMHO.


Firstly, I have to start by saying that I have been trying to figure this out for three days, and I am just banging my head against the wall. As best as I can tell from the documentation, I am doing things correctly. Obviously, I can't be doing things correctly, though, because if I were, I wouldn't have a problem (right?).

In any event, I am working on a binding for mcrypt to Python. It should work with both Python 2 and Python 3 (though it's untested for Python 2). It's available on my site, linked because it is way too large to include in the post, and given that I don't know what I am doing wrong, I cannot even isolate what might be the problem code. The script that shows the problem is also on my site. The script just feeds 100 blocks of nothing but the letter "a" (in whatever block size the encryption algorithm/encryption mode uses), and of course should get a block of "a" as the result of roundtripping. But it does not (always). Here is output from a single run of it:

Wed Dec 15 10:35:44 EST 2010
test.py:5: McryptSecurityWarning: get_key() is not recommended
  return ''.join(['{:02x}'.format(x) for x in o.get_key()])
key: b'\x01ez\xd5\xa9\xf9\x1f)\xa0G\xd2\xf2Z\xfc{\x7fn\x02?,\x08\x1c\xc8\x03\x061X\xb5\xc9\x99\xd0\xca'
key: b'\x01ez\xd5\xa9\xf9\x1f)\xa0G\xd2\xf2Z\xfc{\x7fn\x02?,\x08\x1c\xc8\x03\x061X\xb5\xc9\x99\xd0\xca'
16
self test result: 0
enc parameters: {'salt': '6162636465666768', 'mode': 'cbc', 'algorithm': 'rijndael-128', 'iv': '61626364616263646162636461626364'}
dec parameters: {'salt': '6162636465666768', 'mode': 'cbc', 'algorithm': 'rijndael-128', 'iv': '61626364616263646162636461626364'}
enc key: 01657ad5a9f91f29a047d2f25afc7b7f6e023f2c081cc803063158b5c999d0ca
dec key: 01657ad5a9f91f29a047d2f25afc7b7f6e023f2c081cc803063158b5c999d0ca
Stats: 88 / 100 good packets (88.0%)
#5: b'aaaaaaaaaaaaaaaa' != b'\xa6\xb8\xf9\td\x8db\xf6\x00Y"ST\xc6\x9b\xe7'
#6: b'aaaaaaaaaaaaaaaa' != b'aaaaaaa1\xb3@\x8d\xff\xf9\xafpy'
#13: b'aaaaaaaaaaaaaaaa' != b'\xb9\xc8\xaf\x1f\xb8\x8c\x0b_\x15s\x9d\xecN,*w'
#14: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaaaa\xeb?\x13'
#49: b'aaaaaaaaaaaaaaaa' != b'_C\xf2\x15\xd5k\xe1XKIF5k\x82\xa4\xec'
#50: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaa+\xdf>\x01\xee'
#74: b'aaaaaaaaaaaaaaaa' != b'\x1c\xdf0\x05\xc7\x0b\xe9\x93H\xc5B\xd7\xcfj+\x03'
#75: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaaaw+\xed\x0f'
#79: b'aaaaaaaaaaaaaaaa' != b"\xf2\x89\x1ct\xe1\xeeBWo\xb4-\xb9\x085'\xef"
#80: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaa\xcc\x01n\xf0<'
#91: b'aaaaaaaaaaaaaaaa' != b'g\x02\x08\xbf\xa5\xd7\x90\xc1\x84D\xf3\x9d$a)\x06'
#92: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaaaaaa\x01'

The weird part is that it is exactly the same for a given (algorithm, mode) pair. I can change the algorithm and it will result in different round-trips, but always the same for every run when I don't change the algorithm. I'm absolutely stumped. Also, it's always two blocks in a row that are corrupt as you can see in the output above: blocks 5 and 6, 13 and 14, etc. So, there is a pattern but I am, for whatever reason, unable to figure out what that pattern is pointing to precisely.
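One observation on the pairing, offered as a hedged sketch rather than a diagnosis: in CBC mode, damaging a single ciphertext block garbles the whole matching plaintext block and only the damaged byte positions of the next one. That fits the output above, where the first bad block of each pair is entirely wrong and the second still starts with a run of "a". The block cipher below is a toy Feistel construction built on SHA-256 purely for illustration; it is not rijndael-128 and not mcrypt. The key and the choice of which bytes to zero are assumptions; the IV is the one from the parameters above.

```python
import hashlib

BLOCK = 16
HALF = BLOCK // 2


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Toy round function: NOT a real cipher, illustration only.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for plain in blocks:
        prev = encrypt_block(key, _xor(plain, prev))
        out.append(prev)
    return out


def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for cipher in blocks:
        out.append(_xor(decrypt_block(key, cipher), prev))
        prev = cipher
    return out


key = b"k" * BLOCK                      # hypothetical key
iv = b"abcdabcdabcdabcd"                # same IV as in the run above
plain = [b"a" * BLOCK for _ in range(4)]
cipher = cbc_encrypt(key, iv, plain)

# Damage only ciphertext block 1: keep its first 7 bytes, zero the rest.
damaged = list(cipher)
damaged[1] = cipher[1][:7] + bytes(BLOCK - 7)

result = cbc_decrypt(key, iv, damaged)
for index, block in enumerate(result):
    print(index, block == plain[index], block)
```

Blocks 0 and 3 round-trip cleanly, block 1 comes back as noise, and block 2 keeps its first 7 bytes. So a pattern of "fully bad block, then partially bad block" points at the ciphertext being altered between encryption and decryption, not at the cipher itself.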

I realize that I am probably asking a lot here: I can't isolate a small snippet of code, and familiarity with both mcrypt and Python is probably required. Alas, after three days of hitting my head on this, I need to step away from the problem for a little bit, so I am posting this here in the hope that while I am taking a break either (a) someone will see where I introduced a bug, (b) I will be able to see my bug when I get back to the problem later, or (c) someone (or I) will find that the problem is not a bug in my code but a bug in the binding process or the library itself.

One thing I haven't tried is another version of the mcrypt library. I'm doing my work with Cython 0.13, Python 3.1, and mcrypt 2.5.8, all as distributed by Ubuntu in Ubuntu 10.10 (except Cython, which I got from PyPI). But I manage systems with PHP applications that use mcrypt on Ubuntu 10.10 and function just fine, without data corruption, so I have no reason to believe that the build of mcrypt is at fault. That just leaves… well, something wrong on my part somewhere, I think.

In any case, I thank anyone profusely who can help. I'm starting to feel like I'm going crazy because I've been working on this problem pretty much non-stop for days and I get the feeling that the solution is probably right in front of me, but I cannot see it.

Edit: Someone pointed out that I should be using memcpy instead of strncpy. I did that, but now, the test script shows that every block is incorrect. Color me even more confused than previously... here's the new output on pastebin.
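For anyone wondering why strncpy is the wrong tool for ciphertext, here is a minimal pure-Python emulation of the two C functions. The helper names strncpy_like and memcpy_like are made up for this sketch, and the sample block is arbitrary; the point is only that strncpy stops at the first NUL byte and pads the rest of the destination with zeros, while memcpy copies every byte.

```python
def strncpy_like(src: bytes, n: int) -> bytes:
    """Emulate C strncpy: copy up to the first NUL, then pad with NULs to n bytes."""
    head = src[:n].split(b"\x00", 1)[0]
    return head + b"\x00" * (n - len(head))


def memcpy_like(src: bytes, n: int) -> bytes:
    """Emulate C memcpy: copy exactly n bytes, whatever their values."""
    return src[:n]


# Arbitrary 8-byte "ciphertext" with a NUL byte in the middle.
block = b"\x10\x20\x30\x00\x40\x50\x60\x70"

print(strncpy_like(block, len(block)))  # bytes after the NUL are lost
print(memcpy_like(block, len(block)))   # identical to the input
```

Encrypted data is effectively random, so a 16-byte block contains a NUL byte somewhere every so often, and only those blocks get mangled.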

Edit 2: I have come back to the computer and have been looking at it again, and I'm just adding print statements everywhere to find where things could be going wrong. The following code in the raw_encrypt.step(input) function:

    cdef char* buffer = <char*>malloc(in_len)
    print in_bin[:in_len]
    memcpy(buffer, <const_void *>in_bin, in_len)
    print "Before/after encryption"
    print buffer[:in_len]
    success = cmc.mcrypt_generic(self._mcStream, <void*>buffer, in_len)
    print buffer[:in_len]

The first print statement shows what I expect: the plaintext that was passed in. The second one, however, shows something completely different, when it should be identical. Something is going on in Cython that I don't completely understand.

Answer

Oy, I hate to do this (answer my own question), but I found the answer: It is a quirk of Cython which I am going to have to look into (I don't know if it is an intended quirk, or if it is a bug).

The problem comes with the memcpy line. I cast the second parameter to <const_void*>, which matches the Cython definition in the pxd file, but apparently that makes Cython compile the code differently than using <char*>, the latter forcing Cython to pass a pointer to the actual bytes instead of (I guess?) a pointer to the Python object/variable itself.

So, instead of this:

    cdef char* buffer = <char*>malloc(in_len)
    memcpy(buffer, <const_void *>in_bin, in_len)
    success = cmc.mcrypt_generic(self._mcStream, <void*>buffer, in_len)

It needs to be this:

    cdef char* buffer = <char*>malloc(in_len)
    memcpy(buffer, <char *>in_bin, in_len)
    success = cmc.mcrypt_generic(self._mcStream, <void*>buffer, in_len)

What a strange quirk. I would honestly expect any cast to point to the same location, but it seems that the cast can affect behavior as well.
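The difference between "a pointer to the bytes" and "a pointer to the Python object" can be seen from plain Python with ctypes. This is only a sketch of the idea, not of what Cython generates: it relies on the CPython detail that id() returns the object's memory address, and the variable names are mine.

```python
import ctypes

data = b"a" * 16

# Address of the Python object itself (CPython detail: id() is the address).
object_address = id(data)

# Address of the byte payload stored inside that object.
payload_address = ctypes.cast(ctypes.c_char_p(data), ctypes.c_void_p).value

# Read 16 bytes starting at each address.
from_object = ctypes.string_at(object_address, len(data))
from_payload = ctypes.string_at(payload_address, len(data))

print(from_object)   # object header (reference count, type pointer, ...)
print(from_payload)  # the actual bytes
```

Copying from the object's address yields header data instead of the plaintext, which would explain every block coming out wrong once memcpy was given the <const_void *> cast.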
