Question 1

Short story:

Is index lookup in a Python 3 Unicode string O(1) or O(n)?

Long story:

Index lookup of a character in a C char array is constant time, O(1), because the array is contiguous in memory and we can jump straight to the element's address:

const char* mystring = "abcdef";
char its_d = mystring[3];

It's the same as saying:

char its_d = *(mystring + 3);

This works because sizeof(char) is 1 by definition in C99, and because in ASCII one character fits in one byte.

In Python 3, where string literals are Unicode strings, we have the following:

>>> mystring = 'ab€cd'
>>> len(mystring)
5
>>> mybytes = mystring.encode('utf-8')
>>> len(mybytes)
7
>>> mybytes
b'ab\xe2\x82\xaccd'
>>> mystring[2]
'€'
>>> mybytes[2]
226
>>> ord(mystring[2])
8364

In the UTF-8 encoding, byte 2 is > 127, which means the third character uses a multibyte representation.
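To make the variable width concrete, here is a small sketch that encodes each character of the same string on its own and reports how many UTF-8 bytes it needs. The helper name utf8_widths is made up for illustration.

```python
def utf8_widths(text):
    """Return the number of UTF-8 bytes needed for each character of text."""
    return [len(ch.encode('utf-8')) for ch in text]


widths = utf8_widths('ab€cd')
print(widths)       # [1, 1, 3, 1, 1]
print(sum(widths))  # 7, the same as len('ab€cd'.encode('utf-8'))
```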

I can only conclude that an index lookup in a Python string CANNOT be O(1), because of the multibyte representation of characters. That would mean that mystring[2] is O(n), and that some on-the-fly interpretation of the memory array is performed in order to find the character at a given index. If that's the case, did I miss some relevant documentation stating this?
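For comparison, this is roughly what a lookup would have to do if a string really were stored as UTF-8 bytes. It is a hypothetical sketch, not a description of how CPython works, and the names utf8_char_at and sequence_length are invented here. The point is that the function must walk the buffer from the start, so its cost grows with the index.

```python
def sequence_length(lead):
    """Length in bytes of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError('not a UTF-8 lead byte')


def utf8_char_at(data, index):
    """Return the character at position `index` by scanning UTF-8 bytes.

    Assumes `data` is valid UTF-8 and `index` is non-negative.
    """
    if index < 0:
        raise IndexError('negative indexes are not supported in this sketch')
    pos = 0
    # Skip over `index` complete characters, one lead byte at a time.
    for _ in range(index):
        if pos >= len(data):
            raise IndexError('character index out of range')
        pos += sequence_length(data[pos])
    if pos >= len(data):
        raise IndexError('character index out of range')
    end = pos + sequence_length(data[pos])
    return data[pos:end].decode('utf-8')


print(utf8_char_at('ab€cd'.encode('utf-8'), 2))  # €
```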

I ran a very basic benchmark, but I cannot infer O(n) behaviour from it: https://gist.github.com/carlos-jenkins/e3084a07402ccc25dfd0038c9fe284b5

$ python3 lookups.py
Allocating memory...
Go!
String lookup: 0.513942 ms
Bytes lookup : 0.486462 ms
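The gist itself is not reproduced here. A minimal sketch of a comparable measurement, assuming only the standard timeit module and a made-up helper time_lookup, might compare a lookup at the start of a long string with one at the end:

```python
import timeit


def time_lookup(sequence, index, repeats=100_000):
    """Return the seconds taken by `repeats` lookups of sequence[index]."""
    return timeit.timeit(lambda: sequence[index], number=repeats)


text = 'ab€cd' * 200_000     # 1,000,000 characters
data = text.encode('utf-8')  # 1,400,000 bytes

for label, seq in (('String', text), ('Bytes ', data)):
    first = time_lookup(seq, 0)
    last = time_lookup(seq, len(seq) - 1)
    print(f'{label} lookup: first={first:.6f} s  last={last:.6f} s')
```

If string indexing were O(n), the lookup at the last index should be dramatically slower than the one at index 0. Timings vary by machine, so no expected output is shown.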

EDIT: Updated with better example.

Question 2

UTF-8 is the default source encoding for Python 3, but that says nothing about how strings are stored in memory. The internal representation uses fixed-size elements for every character within a given string, in both Python 2 and Python 3. One result is that accessing characters in Python (Unicode) string objects by index has O(1) cost.
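One way to observe the fixed-size storage is to measure how much a string grows when one more character is appended. The sketch below assumes CPython 3.3 or later, where PEP 393 stores each string with 1, 2, or 4 bytes per code point depending on its widest character. It relies on sys.getsizeof, so it shows a CPython implementation detail rather than a language guarantee, and the helper name bytes_per_char is made up for illustration.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the storage used per character for a string made of `ch`.

    Compares the sizes of two strings that differ by exactly one character.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


print(bytes_per_char('a'))           # ASCII character
print(bytes_per_char('€'))           # U+20AC, inside the Basic Multilingual Plane
print(bytes_per_char('\U0001F600'))  # code point outside the BMP
```

On CPython 3.3+ this prints 1, 2 and 4. Because every element within one string has the same width, the address of character i can be computed directly from i, which is what makes the lookup O(1).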

The code and results you presented do not demonstrate otherwise. You convert a string to a UTF-8-encoded byte sequence, and we all know that UTF-8 uses variable-length code sequences, but none of that says anything about the internal representation of the original string.

Python 3 string index lookup is O(1)?
