Python Interpreter String Pooling Optimization [duplicate]

2024/9/23 16:29:15

After seeing this question and its duplicate, one question still remained for me.

I get what is and == do and why if I run

a = "ab"
b = "ab"
a == b

I get True. The question here would be WHY this happens:

a = "ab"
b = "ab"
a is b # Returns True

So I did my research and I found this. The answer says the Python interpreter uses string pooling: if it sees that two strings are equal, it reuses the existing object (same id) for the new one as an optimization.

Up to here everything is clear and answered. My real question is why this pooling only happens for some strings. Here is an example:

a = "ab"
b = "ab"
a is b # Returns True, as expected knowing Interpreter uses string pooling

a = "a_b"
b = "a_b"
a is b # Returns True, again, as expected knowing Interpreter uses string pooling

a = "a b"
b = "a b"
a is b # Returns False, why??

a = "a-b"
b = "a-b"
a is b # Returns False, WHY??

So it seems that for some characters, string pooling isn't working. I used Python 2.7.6 for these examples, so I thought this might be fixed in Python 3. But after trying the same examples in Python 3, I get the same results.

Question: Why isn't string pooling applied in these examples? Wouldn't it be better for Python to optimize these as well?


Edit: If I run "a b" is "a b", it returns True. The question is why, when using variables, it returns False for some characters but True for others.

Answer

Your question is a duplicate of a more general question, "When does python choose to intern a string", the correct answer to which is that string interning is implementation-specific.

Interning of strings in CPython 2.7.7 is described very well in this article: The internals of Python string interning. The information in it is enough to explain your examples.

The reason that the strings "ab" and "a_b" are interned, whereas "a b" and "a-b" aren't, is that the former look like Python identifiers and the latter don't.

Naturally, interning every single string would incur a runtime cost. Therefore the interpreter must decide whether a given string is worth interning. Since the names of identifiers used in a Python program are embedded in the program's bytecode as strings, identifier-like strings have a higher chance of benefiting from interning.
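If you do want a particular string interned regardless of what the interpreter decides, you can ask for it explicitly. Below is a minimal sketch, assuming CPython 3, where sys.intern does this (in Python 2 the equivalent is the builtin intern). The strings are built at runtime with str.join so that they are not compile-time constants; the variable names are only illustrative.

```python
import sys

# Build two equal strings at runtime so they are not compile-time constants.
a = "-".join(["a", "b"])
b = "-".join(["a", "b"])

print(a == b)  # equal by value
print(a is b)  # identity is implementation-specific; typically False in CPython

# sys.intern returns the canonical copy from the table of interned strings,
# so two equal strings always map to the same object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)
```

The last comparison is the only one whose identity result is guaranteed by the language documentation; the plain a is b result depends on the implementation, which is exactly why is should not be used to compare strings by value.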

A short excerpt from the above article:

The function all_name_chars rules out strings that are not composed of ascii letters, digits or underscores, i.e. strings looking like identifiers:

#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */
static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}

With all these explanations in mind, we now understand why 'foo!' is 'foo!' evaluates to False whereas 'foo' is 'foo' evaluates to True.
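If C is not your thing, the check from the excerpt can be approximated in Python. This is only a sketch of the same logic applied to the strings from the question; the Python function below just borrows the name of the C one and is not an API that CPython exposes.

```python
import string

# Same character set as NAME_CHARS in the C excerpt:
# ASCII letters, digits and the underscore.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def all_name_chars(s):
    """Return True iff every character of s is in NAME_CHARS."""
    return all(ch in NAME_CHARS for ch in s)


# Strings from the question and from the article excerpt.
for candidate in ("ab", "a_b", "a b", "a-b", "foo", "foo!"):
    print(repr(candidate), all_name_chars(candidate))
```

Note that this is not the same test as str.isidentifier(), which rejects a leading digit and accepts non-ASCII letters; the C check only looks at whether each character belongs to the fixed ASCII set.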
