Slice endpoints invisibly truncated

2024/7/27 16:11:28
>>> import sys
>>> class Potato(object):
...     def __getslice__(self, start, stop):
...         print start, stop
...
>>> sys.maxint
9223372036854775807
>>> x = sys.maxint + 69
>>> print x
9223372036854775876
>>> Potato()[123:x]
123 9223372036854775807

Why doesn't the call to __getslice__ respect the stop I passed in, instead silently substituting 2^63 - 1? Does this mean that implementing __getslice__ for your own syntax is generally unsafe with longs?

I can do whatever I need with __getitem__ anyway, I'm just wondering why __getslice__ is apparently broken.

Edit: Where is the code in CPython that truncates the slice? Is this part of the Python (language) spec or just a "feature" of CPython (implementation)?

Answer

The Python C code that handles slicing for objects implementing the sq_slice slot cannot handle integers larger than PY_SSIZE_T_MAX, the largest value a Py_ssize_t can hold (== sys.maxsize). The sq_slice slot is the C-API equivalent of the __getslice__ special method.

For a two-element slice, Python 2 uses one of the SLICE+* opcodes, which are handled by the apply_slice() function. It uses the _PyEval_SliceIndex() function to convert the Python index objects (int, long, or anything implementing the __index__ method) to a Py_ssize_t integer. That function carries the following comment:

/* Extract a slice index from a PyInt or PyLong or an object with the
   nb_index slot defined, and store in *pi.
   Silently reduce values larger than PY_SSIZE_T_MAX to PY_SSIZE_T_MAX,
   and silently boost values less than -PY_SSIZE_T_MAX-1 to -PY_SSIZE_T_MAX-1.
   Return 0 on error, 1 on success.
*/
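
The clamping that comment describes can be modelled in a few lines of pure Python. This is only a sketch of the documented behaviour, not the C implementation; clamp_slice_index and the two module-level constants are names made up for illustration.

```python
import operator
import sys

# Stand-ins for the C constants; sys.maxsize equals PY_SSIZE_T_MAX.
PY_SSIZE_T_MAX = sys.maxsize
PY_SSIZE_T_MIN = -sys.maxsize - 1


def clamp_slice_index(value):
    """Hypothetical pure-Python model of the clamping in the comment above."""
    # operator.index() accepts integers and any object defining __index__.
    index = operator.index(value)
    # Out-of-range values are silently pinned to the nearest bound.
    return max(PY_SSIZE_T_MIN, min(PY_SSIZE_T_MAX, index))


# The stop value from the question is reduced to sys.maxsize.
stop = clamp_slice_index(sys.maxsize + 69)
```

Under this model, the stop value from the question comes back as sys.maxsize, which matches the output the question observed.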

This means that any slicing in Python 2 using the 2-value syntax is limited to values in the sys.maxsize range when a sq_slice slot is provided.

Slicing using the three-value form (item[start:stop:stride]) uses the BUILD_SLICE opcode (followed by BINARY_SUBSCR) instead, which creates a slice() object without clamping the values to sys.maxsize.

If the object doesn't implement the sq_slice slot (so no __getslice__ is present), the apply_slice() function also falls back to using a slice() object.
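
A minimal sketch of that fallback, assuming a class that defines only __getitem__ (Recorder is a hypothetical name, not part of the original question):

```python
import sys


class Recorder(object):
    """Hypothetical class that defines __getitem__ but no __getslice__."""

    def __getitem__(self, key):
        # With no __getslice__ defined, obj[i:j] arrives here as a slice object.
        return key


big = sys.maxsize + 69
key = Recorder()[123:big]
# key is a slice object whose stop is still sys.maxsize + 69
```

Because the bounds travel inside a slice object, the two-value syntax loses nothing here.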

As for this being an implementation detail or part of the language: the Slicings expression documentation distinguishes between simple_slicing and extended_slicing; the former only permits the short_slice form. For simple slicing the indices must be plain integers:

The lower and upper bound expressions, if present, must evaluate to plain integers; defaults are zero and the sys.maxint, respectively.

This suggests that Python 2, the language, limits the indices to sys.maxint values, disallowing long integers. In Python 3 simple slicing has been excised from the language altogether.

If your code has to support slicing with values beyond sys.maxsize and you have to inherit from a type that implements __getslice__ then your options are to:

  • use the three-value syntax, with None for the stride:

    Potato()[123:x:None]
    
  • create slice() objects explicitly:

    Potato()[slice(123, x)]
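
Both workarounds can be checked with a sketch like the following. It assumes a Potato variant that defines __getslice__ directly (rather than inheriting it) plus a __getitem__; the tuple return values are purely illustrative.

```python
import sys


class Potato(object):
    def __getslice__(self, start, stop):
        # Python 2 only, and only for obj[a:b]; bounds arrive already clamped.
        return ("getslice", start, stop)

    def __getitem__(self, key):
        # Both workaround spellings land here with an unclamped slice object.
        return ("getitem", key.start, key.stop, key.step)


x = sys.maxsize + 69
via_stride = Potato()[123:x:None]
via_slice = Potato()[slice(123, x)]
```

Either spelling bypasses __getslice__, so __getitem__ has to be prepared to handle slice objects itself.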
    

slice() objects can handle long integers just fine; however, the slice.indices() method still cannot handle lengths over sys.maxsize:

>>> import sys
>>> s = slice(0, sys.maxsize + 1)
>>> s
slice(0, 9223372036854775808L, None)
>>> s.stop
9223372036854775808L
>>> s.indices(sys.maxsize + 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot fit 'long' into an index-sized integer
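
If normalized indices are needed for such lengths, one option is to do the arithmetic in plain Python integers instead of calling slice.indices(). The helper below is a hypothetical sketch that follows the documented semantics of slice.indices(); big_slice_indices is not a standard-library function.

```python
import operator
import sys


def big_slice_indices(s, length):
    """Hypothetical helper: slice.indices() semantics with plain integers."""
    length = operator.index(length)
    step = 1 if s.step is None else operator.index(s.step)
    if length < 0:
        raise ValueError("length should not be negative")
    if step == 0:
        raise ValueError("slice step cannot be zero")

    # Bounds that start and stop are clipped to, depending on direction.
    lower = -1 if step < 0 else 0
    upper = length - 1 if step < 0 else length

    def resolve(bound, default):
        if bound is None:
            return default
        bound = operator.index(bound)
        # Negative bounds count from the end; everything is then clipped.
        return max(bound + length, lower) if bound < 0 else min(bound, upper)

    start = resolve(s.start, upper if step < 0 else lower)
    stop = resolve(s.stop, lower if step < 0 else upper)
    return start, stop, step


s = slice(0, sys.maxsize + 1)
result = big_slice_indices(s, sys.maxsize + 2)
```

Since nothing is converted to a Py_ssize_t, the length that triggered the OverflowError above poses no problem for this version.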