After reading this question and its duplicate, one question still remained for me.
I understand what is and == do, and why, if I run
a = "ab"
b = "ab"
a == b
I get True. My question is WHY this happens:
a = "ab"
b = "ab"
a is b # Returns True
So I did my research and found this. The answer says the Python interpreter uses string pooling: if it sees that two strings are the same, it gives the new one the same id as the existing one, as an optimization.
Up to here everything is clear and answered. My real question is why this pooling only happens for some strings. Here is an example:
a = "ab"
b = "ab"
a is b # Returns True, as expected knowing Interpreter uses string pooling

a = "a_b"
b = "a_b"
a is b # Returns True, again, as expected knowing Interpreter uses string pooling

a = "a b"
b = "a b"
a is b # Returns False, why??

a = "a-b"
b = "a-b"
a is b # Returns False, WHY??
So it seems that string pooling does not work for some characters. I used Python 2.7.6 for these examples, so I thought this would be fixed in Python 3. But the same examples give the same results in Python 3.
Question: Why isn't string pooling applied to these examples? Wouldn't it be better for Python to optimize these as well?
Edit: If I run "a b" is "a b", it returns True. The question is why, when using variables, it returns False for some characters but True for others.
Answer
Your question is a duplicate of a more general question, "When does python choose to intern a string", the correct answer to which is that string interning is implementation specific.
Interning of strings in CPython 2.7.7 is described very well in this article: The internals of Python string interning. The information in it explains your examples.
The reason that the strings "ab" and "a_b" are interned, whereas "a b" and "a-b" aren't, is that the former look like Python identifiers and the latter don't.
Naturally, interning every single string would incur a runtime cost. Therefore the interpreter must decide whether a given string is worth interning. Since the names of identifiers used in a Python program are embedded in the program's bytecode as strings, identifier-like strings have a higher chance of benefiting from interning.
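Because automatic interning is implementation specific, code that really needs identical objects can request interning explicitly. The following is a minimal sketch, assuming Python 3, where the function is sys.intern; the build helper is a hypothetical name used only to create the strings at runtime so that the compiler cannot merge the literals.

```python
import sys


def build(parts):
    """Assemble a string at runtime so the compiler cannot merge literals."""
    return "".join(parts)


a = build(["a", " ", "b"])
b = build(["a", " ", "b"])

# Equality compares contents and is the right test for strings in general.
equal = a == b

# sys.intern returns the canonical copy from the interpreter's intern table,
# so two interned strings with equal contents are the same object.
same_after_intern = sys.intern(a) is sys.intern(b)

print(equal, same_after_intern)
```

Whether a is b holds before the explicit sys.intern calls is exactly the implementation detail discussed here, so the sketch does not rely on it.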
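To see that identifier names really are stored as strings in compiled code, you can inspect a function's code object. This is only an illustrative sketch: greet, prefix, person and message are made-up names, and the function is never called, so prefix does not need to exist.

```python
def greet(person):
    # 'prefix' is a global name that is looked up only when the function runs.
    message = prefix + person
    return message


code = greet.__code__

# Names of globals and attributes referenced by the bytecode.
print(code.co_names)

# Names of arguments and local variables.
print(code.co_varnames)

# Every entry is an ordinary str object.
print(all(isinstance(name, str) for name in code.co_names + code.co_varnames))
```

Since the same names tend to recur across many functions and modules, sharing a single string object per name is where interning is most likely to pay off.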
A short excerpt from the above article:
The function all_name_chars rules out strings that are not composed of ascii letters, digits or underscores, i.e. strings looking like identifiers:

#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}

With all these explanations in mind, we now understand why 'foo!' is 'foo!' evaluates to False whereas 'foo' is 'foo' evaluates to True.
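The character test from the excerpt can be mirrored in Python and applied to the strings from the question. This is a rough port for illustration only: it reproduces the character check, not the interpreter's full decision about when to intern.

```python
import string

# Same character set as NAME_CHARS in the C excerpt.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def all_name_chars(s):
    """Return True if every character of s is an ASCII letter, digit or underscore."""
    return all(ch in NAME_CHARS for ch in s)


for candidate in ("ab", "a_b", "a b", "a-b", "foo!"):
    print(repr(candidate), all_name_chars(candidate))
```

The two strings that pass the check, "ab" and "a_b", are the ones for which a is b returned True in the question; "a b" and "a-b" fail because of the space and the hyphen.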