Question 1

After reading this question and its duplicate, one question still remained for me.

I understand what is and == do, and why, if I run

a = "ab"
b = "ab"
a == b

I get True. The question here would be WHY this happens:

a = "ab"
b = "ab"
a is b # Returns True

So I did some research and found this. The answer says that the Python interpreter uses string pooling: if it sees that two strings are the same, it gives the new one the same id as an optimization.

Up to this point everything is clear and answered. My real question is why this pooling only happens for some strings. Here is an example:

a = "ab"
b = "ab"
a is b # Returns True, as expected knowing Interpreter uses string pooling

a = "a_b"
b = "a_b"
a is b # Returns True, again, as expected knowing Interpreter uses string pooling

a = "a b"
b = "a b"
a is b # Returns False, why??

a = "a-b"
b = "a-b"
a is b # Returns False, WHY??

So it seems that string pooling doesn't work for some characters. I used Python 2.7.6 for these examples, so I thought this would be fixed in Python 3. But after trying the same examples in Python 3, I get the same results.

Question: Why isn't string pooling applied to these examples? Wouldn't it be better for Python to optimize these as well?

Edit: If I run "a b" is "a b", it returns True. The question is why, when using variables, it returns False for some characters but True for others.

Answer

Your question is a duplicate of a more general question, "When does python choose to intern a string", whose correct answer is that string interning is implementation specific.

Interning of strings in CPython 2.7.7 is described very well in this article: The internals of Python string interning. The information there is enough to explain your examples.

The reason that the strings "ab" and "a_b" are interned, whereas "a b" and "a-b" aren't, is that the former look like Python identifiers and the latter don't.

Naturally, interning every single string would incur a runtime cost. Therefore the interpreter must decide whether a given string is worth interning. Since the names of identifiers used in a Python program are embedded in the program's bytecode as strings, identifier-like strings have a higher chance of benefiting from interning.
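To see what interning itself does, separately from the interpreter's decision about when to apply it, here is a minimal sketch. It assumes Python 3, where the function is sys.intern (Python 2 has a built-in intern instead). The helper name build is made up for this illustration; it creates the strings at runtime so that they are not compile-time constants.

```python
import sys

def build(parts):
    # Join at runtime so the result is a fresh string, not a compile-time constant.
    return "".join(parts)

x = build(["a", " ", "b"])
y = build(["a", " ", "b"])

# == compares contents, so this is always True.
equal_by_value = (x == y)

# Whether `x is y` holds at this point is implementation specific,
# so this sketch does not rely on it.

# sys.intern returns one canonical object per string value.
ix = sys.intern(x)
iy = sys.intern(y)
same_object_after_intern = (ix is iy)

print(equal_by_value, same_object_after_intern)
```

Explicitly interned strings with equal contents are the same object, regardless of whether they look like identifiers.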

A short excerpt from the above article:

The function all_name_chars rules out strings that are not composed of ascii letters, digits or underscores, i.e. strings looking like identifiers:

#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}

With all these explanations in mind, we now understand why 'foo!' is 'foo!' evaluates to False whereas 'foo' is 'foo' evaluates to True.
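For readers more comfortable with Python than C, the check performed by the quoted C function can be sketched as below. This is only an illustrative port, not code from CPython; it reuses the names NAME_CHARS and all_name_chars from the excerpt and applies the test to the strings from the question.

```python
import string

# Same character set as NAME_CHARS in the C excerpt:
# ASCII digits, ASCII letters and the underscore.
NAME_CHARS = frozenset(string.digits + string.ascii_letters + "_")

def all_name_chars(s):
    """Return True iff every character of s is in NAME_CHARS.

    Like the C version, this is also True for the empty string.
    """
    return all(ch in NAME_CHARS for ch in s)

for s in ["ab", "a_b", "a b", "a-b", "foo!"]:
    print(repr(s), all_name_chars(s))
```

The strings "ab" and "a_b" pass the check, while "a b", "a-b" and "foo!" do not, which matches the split seen in the question.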

Python Interpreter String Pooling Optimization [duplicate]
