Question 1

It is well known that small bytes-objects are automatically "interned" by CPython (similar to the intern-function for strings). Correction: As explained by @abarnert, it is more like the integer-pool than the interned strings.

Is it possible to restore the interned bytes-objects after they have been corrupted by, let's say, an "experimental" third-party library, or is restarting the kernel the only way?

The proof of concept can be done with Cython-functionality (Cython>=0.28):

%%cython
def do_bad_things():
    cdef bytes b=b'a'
    cdef const unsigned char[:] safe=b
    cdef char *unsafe=<char *> &safe[0]   #who needs const and type-safety anyway?
    unsafe[0]=98                          #replace through `b`

or as suggested by @jfs through ctypes:

import ctypes
import sys
def do_bad_things():
    b = b'a'
    (ctypes.c_ubyte * sys.getsizeof(b)).from_address(id(b))[-2] = 98

Obviously, by misusing C-functionality, do_bad_things changes the immutable (or so CPython thinks) object b'a' to b'b', and because this bytes-object is interned, we can see bad things happen afterwards:

>>> do_bad_things() #b'a' means now b'b'
>>> b'a'==b'b'  #wait for a surprise  
True
>>> print(b'a') #another one
b'b'

Is it possible to restore/clear the byte-object-pool, so that b'a' means b'a' once again?

A little side note: It seems as if not every bytes-creation process is using this pool. For example:

>>> do_bad_things()
>>> print(b'a')
b'b'
>>> print((97).to_bytes(1, byteorder='little')) #ord('a')=97
b'a'
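
The side note above can be probed without corrupting anything. The following is a minimal sketch, assuming CPython and ctypes; cached_byte and probe_shared are hypothetical helper names, not part of any library. It asks the C API for the shared one-byte object and reports which Python-level constructions hand back that very same object. The answers may differ between CPython versions, so no expected output is shown.

```python
import ctypes

# Bind PyBytes_FromStringAndSize with a raw-pointer argument, so the byte we
# ask for never travels through a Python bytes object on the way in.
_from_string_and_size = ctypes.PYFUNCTYPE(
    ctypes.py_object, ctypes.c_void_p, ctypes.c_ssize_t
)(("PyBytes_FromStringAndSize", ctypes.pythonapi))


def cached_byte(value):
    """Return the one-byte bytes object the C API hands out for `value`."""
    buf = (ctypes.c_uint8 * 1)(value)
    return _from_string_and_size(ctypes.addressof(buf), 1)


def probe_shared(value=97):
    """Map each construction method to whether it yields the shared object."""
    reference = cached_byte(value)
    candidates = {
        "bytes([value])": bytes([value]),
        "int.to_bytes": value.to_bytes(1, byteorder="little"),
        "bytearray -> bytes": bytes(bytearray([value])),
        "slice": bytes([value, value])[:1],
    }
    # Identity, not equality: equality is exactly what corruption breaks.
    return {name: obj is reference for name, obj in candidates.items()}


if __name__ == "__main__":
    for name, shared in probe_shared().items():
        print(f"{name}: {'shared' if shared else 'separate object'}")
```

Any construction reported as a separate object would be unaffected by do_bad_things, which matches the to_bytes observation above.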

Answer

Python 3 doesn't intern bytes objects the way it does str. Instead, it keeps a static array of them the way it does with int.

This is very different under the covers. On the down side, it means there's no table (with an API) to be manipulated. On the up side, it means that if you can find the static array, you can fix it, the same way you would for ints, because the array index and the character value of the string are supposed to be identical.

If you look in bytesobject.c, the array is declared at the top:

static PyBytesObject *characters[UCHAR_MAX + 1];

… and then, for example, within PyBytes_FromStringAndSize:

if (size == 1 && str != NULL &&
    (op = characters[*str & UCHAR_MAX]) != NULL)
{
#ifdef COUNT_ALLOCS
    one_strings++;
#endif
    Py_INCREF(op);
    return (PyObject *)op;
}

Notice that the array is static, so it's not accessible from outside this file, and that it's still refcounting the objects, so callers (even internal stuff in the interpreter, much less your C API extension) can't tell that there's anything special going on.

So, there's no "correct" way to clean this up.

But if you want to get hacky…

If you have a reference to any of the single-char bytes, and you know which character it was supposed to be, you can get to the start of the array and then clean up the whole thing.

Unless you've screwed up even more than you think, you can just construct a one-char bytes and subtract the character it was supposed to be. PyBytes_FromStringAndSize("a", 1) is going to return the object that's supposed to be 'a', even if it happens to actually hold 'b'. How do we know that? Because that's exactly the problem that you're trying to fix.

Actually, there are probably ways you could break things even worse… which all seem very unlikely, but to be safe, let's use a character you're less likely to have broken than a, like \x80:

PyBytesObject *byte80 = (PyBytesObject *)PyBytes_FromStringAndSize("\x80", 1);
PyBytesObject *characters = byte80 - 0x80;

The only other caveat is that if you try to do this from Python with ctypes instead of from C code, it would require some extra care,¹ but since you're not using ctypes, let's not worry about that.

So, now we have a pointer to characters, we can walk it. We can't just delete the objects to "unintern" them, because that will hose anyone who has a reference to any of them, and probably lead to a segfault. But we don't have to. Any object that's in the table, we know what it's supposed to be—characters[i] is supposed to be a one-char bytes whose one character is i. So just set it back to that, with a loop something like this:

for (size_t i = 0; i <= UCHAR_MAX; i++) {
    if (characters[i]) {
        // do the same hacky stuff you did to break the string in the first place
    }
}

That's all there is to it.

Well, except for compilation.²

Fortunately, at the interactive interpreter, each complete top-level statement is its own compilation unit, so… you should be OK with any new line you type after running the fix.

But a module you've imported, that had to be compiled, while you had the broken strings? You've probably screwed up its constants. And I can't think of a good way to clean this up except to forcibly recompile and reimport every module.
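
A minimal sketch of what "forcibly recompile and reimport" could look like for pure-Python modules, using only the standard library (py_compile and importlib). The helper name recompile_and_reload is hypothetical. It should only be run after the cached bytes objects themselves have been repaired, otherwise the fresh bytecode may be compiled against the same broken constants.

```python
import importlib
import py_compile
import sys


def recompile_and_reload(module_names):
    """Rewrite the cached bytecode of each named module, then reload it.

    Modules that are not loaded, or that have no .py source file (builtins,
    extension modules, namespace packages), are skipped.
    Returns the list of reloaded module objects.
    """
    reloaded = []
    for name in module_names:
        module = sys.modules.get(name)
        source = getattr(module, "__file__", None)
        if module is None or not source or not source.endswith(".py"):
            continue
        # Overwrite the cached .pyc so a stale one is not picked up again.
        py_compile.compile(source, doraise=True)
        reloaded.append(importlib.reload(module))
    return reloaded


if __name__ == "__main__":
    # Example: refresh every currently loaded module that has .py source.
    refreshed = recompile_and_reload(list(sys.modules))
    print(f"reloaded {len(refreshed)} modules")
```

Reloading re-executes each module in place, but names that other modules copied with from-imports still point at the old objects, so a fresh interpreter remains the only fully reliable reset.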

¹ The compiler might turn your b'\x80' argument into the wrong thing before it even gets to the C call. And you'd be surprised at all the places you think you're passing around a c_char_p and it's actually getting magically converted to and from bytes. Probably better to use a POINTER(c_uint8).

² If you compiled some code with b'a' in it, the consts array should have a reference to b'a', which will get fixed. But, since bytes are known immutable to the compiler, if it knows that b'a' == b'b', it may actually store the pointer to the b'b' singleton instead, for the same reason that 123456 is 123456 is true, in which case fixing b'a' may not actually solve the problem.

Is it possible to restore corrupted “interned” bytes-objects
