I am trying to erase a password string from memory, as suggested here.
I wrote this little snippet:
import ctypes, sys

def zerome(string):
    location = id(string) + 20
    size = sys.getsizeof(string) - 20
    #memset = ctypes.cdll.msvcrt.memset
    # For Linux, use the following. Change the 6 to whatever it is on your computer.
    print ctypes.string_at(location, size)
    memset = ctypes.CDLL("libc.so.6").memset
    memset(location, 0, size)
    print "Clearing 0x%08x size %i bytes" % (location, size)
    print ctypes.string_at(location, size)

a = "asdasd"
zerome(a)
Oddly enough, this code works fine with IPython:
[7] oz123@yenitiny:~ $ ipython a.py
Clearing 0x02275b84 size 23 bytes
But it crashes with plain Python:
[8] oz123@yenitiny:~ $ python a.py
Segmentation fault
[9] oz123@yenitiny:~ $
Any ideas why?
I tested on Debian Wheezy, with Python 2.7.3.
A little update:
The code works on CentOS 6.2 with Python 2.6.6. The code crashed on Debian with Python 2.6.8. I tried to work out why it works on CentOS and not on Debian. The only difference that came to mind immediately is that my Debian is multiarch, while CentOS is running on my older laptop with an i686 CPU.
Hence, I rebooted my CentOS laptop and loaded Debian Wheezy on it. The code works on Debian Wheezy when it is not multiarch. So I suspect my configuration on Debian is somewhat problematic ...
ctypes has a memset function already, so you don't have to make a function pointer for the libc/msvcrt function. Also, 20 bytes is for common 32-bit platforms. On 64-bit systems it's probably 36 bytes. Here's the layout of a PyStringObject:
typedef struct {
    Py_ssize_t ob_refcnt;         // 4|8 bytes
    struct _typeobject *ob_type;  // 4|8 bytes
    Py_ssize_t ob_size;           // 4|8 bytes
    long ob_shash;                // 4|8 bytes (4 on 64-bit Windows)
    int ob_sstate;                // 4 bytes
    char ob_sval[1];
} PyStringObject;
So it could be 5*4 = 20 bytes on a 32-bit system, 8*4 + 4 = 36 bytes on 64-bit Linux, or 8*3 + 4*2 = 32 bytes on 64-bit Windows. Since a string isn't tracked with a garbage collection header, you can use sys.getsizeof. In general, if you don't want the GC header size included (in memory it's actually before the object's base address you get from id), then use the object's __sizeof__ method. At least that's a general rule in my experience.
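The per-platform arithmetic above can be checked without touching any live object. The following is a minimal sketch, assuming the field order shown in the struct listing; `StringHeader` and `buffer_offset` are hypothetical names used only for illustration, and `c_void_p` stands in for the type pointer. It computes where `ob_sval` would begin by taking the end of the last fixed-size field.

```python
import ctypes

class StringHeader(ctypes.Structure):
    # Hypothetical mirror of the fixed-size fields of PyStringObject,
    # in the same order as the C struct listing.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        ("ob_shash", ctypes.c_long),
        ("ob_sstate", ctypes.c_int),
    ]

def buffer_offset():
    """Offset at which ob_sval would start: the end of ob_sstate."""
    last = StringHeader.ob_sstate
    return last.offset + last.size

# The field offset and the padded structure size can differ,
# because ctypes rounds sizeof() up to the structure's alignment.
print(buffer_offset())
print(ctypes.sizeof(StringHeader))
```

On a platform where the two printed numbers differ, the first one is the value that matches the offsets quoted above, since `ob_sval` is a `char` array and needs no alignment padding before it.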
What you want is to simply subtract the buffer size from the object size. The string in CPython is null-terminated, so simply add 1 to its length to get the buffer size. For example:
>>> import ctypes, sys
>>> a = 'abcdef'
>>> bufsize = len(a) + 1
>>> offset = sys.getsizeof(a) - bufsize
>>> ctypes.memset(id(a) + offset, 0, bufsize)
3074822964L
>>> a
'\x00\x00\x00\x00\x00\x00'
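To try `ctypes.memset` without writing into a live `str` object, it can be pointed at a buffer that ctypes itself owns. This is only a sketch of the call, not a replacement for the approach above; `zero_buffer` is a hypothetical helper name.

```python
import ctypes

def zero_buffer(buf):
    """Overwrite every byte of a ctypes buffer with zeros, in place."""
    ctypes.memset(buf, 0, ctypes.sizeof(buf))

# create_string_buffer copies the bytes and appends a NUL terminator,
# so this buffer is 7 bytes long.
secret = ctypes.create_string_buffer(b"secret")
zero_buffer(secret)
print(repr(secret.raw))
```

Because the buffer is mutable and owned by the caller, there is no header offset to get wrong and nothing else in the interpreter shares it.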
Edit
A better alternative is to define the PyStringObject structure. This makes it convenient to check ob_sstate. If it's greater than 0, that means the string is interned and the sane thing to do is raise an exception. Single-character strings are interned, along with string constants in code objects that consist of only ASCII letters, digits, and underscores, and also strings used internally by the interpreter for names (variable names, attributes).
from ctypes import *

class PyStringObject(Structure):
    _fields_ = [
        ('ob_refcnt', c_ssize_t),
        ('ob_type', py_object),
        ('ob_size', c_ssize_t),
        ('ob_shash', c_long),
        ('ob_sstate', c_int),
        # ob_sval varies in size
        # zero with memset is simpler
    ]

def zerostr(s):
    """zero a non-interned string"""
    if not isinstance(s, str):
        raise TypeError(
            "expected str object, not %s" % type(s).__name__)
    s_obj = PyStringObject.from_address(id(s))
    if s_obj.ob_sstate > 0:
        raise RuntimeError("cannot zero interned string")
    s_obj.ob_shash = -1  # not hashed yet
    # ob_sval starts right after ob_sstate. Don't use sizeof(PyStringObject):
    # ctypes pads it to the struct alignment (40 instead of 36 on 64-bit Linux).
    offset = PyStringObject.ob_sstate.offset + sizeof(c_int)
    memset(id(s) + offset, 0, len(s))
For example:
>>> s = 'abcd' # interned by code object
>>> zerostr(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 10, in zerostr
RuntimeError: cannot zero interned string
>>> s = raw_input() # not interned
abcd
>>> zerostr(s)
>>> s
'\x00\x00\x00\x00'
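The reason for refusing interned strings can be seen without touching memory: after interning, equal strings are one shared object, so zeroing it would change every user of that value. A small sketch, assuming CPython; `intern_str` and `shares_object_after_interning` are hypothetical names, and the `try`/`except` only papers over `intern` being a builtin in Python 2 and `sys.intern` in Python 3.

```python
import sys

# Python 2 exposes intern() as a builtin; Python 3 moved it to sys.intern.
try:
    intern_str = intern
except NameError:
    intern_str = sys.intern

def shares_object_after_interning(parts):
    """Build two equal strings at run time and intern both of them."""
    first = "".join(parts)
    second = "".join(parts)
    # Interning returns the single canonical object for equal strings.
    return intern_str(first) is intern_str(second)

print(shares_object_after_interning(["pass", "word"]))  # prints True
```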