Question 1

EDIT: I have just found a line in my code that changes my df from a RangeIndex to a numeric Int64Index. How and why does this happen?

Before this line, all my DataFrames have a RangeIndex. After this line, df_new has an Int64Index, which is a numeric index rather than a range index.

# remove rows with DMT, no luminance data
df_new = df_new[df_new.Person != 'DMT']
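
A minimal sketch that reproduces the change, assuming a small made-up frame (the names and values below are hypothetical, not my real data). Boolean filtering keeps the labels of the surviving rows, and once those labels are no longer an evenly spaced run they cannot be stored as a range. On pandas 2.0 and later, where Int64Index was removed, the filtered index shows up as a plain Index with dtype int64 instead.

```python
import pandas as pd

# Hypothetical stand-in for df_new; only the Person column matters here.
df_new = pd.DataFrame({
    "Person": ["Ann", "Bob", "DMT", "Cy", "DMT"],
    "Luminance": [0.4, 0.7, None, 0.5, None],
})
print(type(df_new.index).__name__)   # RangeIndex

# Boolean filtering keeps the original labels of the surviving rows.
filtered = df_new[df_new.Person != "DMT"]
kept_labels = list(filtered.index)   # [0, 1, 3]
still_range = isinstance(filtered.index, pd.RangeIndex)

# reset_index(drop=True) discards the old labels and renumbers from 0.
renumbered = filtered.reset_index(drop=True)
range_again = isinstance(renumbered.index, pd.RangeIndex)

print(kept_labels, still_range, range_again)   # [0, 1, 3] False True
```

If the old labels are not needed afterwards, reset_index(drop=True) appears to be the usual way to get back to a RangeIndex.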

Can anyone explain the following?

Int64Index and RangeIndex

"Warning Indexing on an integer-based Index with floats has been clarified in 0.18.0, for a summary of the changes, see here. Int64Index is a fundamental basic index in pandas. This is an Immutable array implementing an ordered, sliceable set. Prior to 0.18.0, the Int64Index would provide the default index for all NDFrame objects. RangeIndex is a sub-class of Int64Index added in version 0.18.0, now providing the default index for all NDFrame objects. RangeIndex is an optimized version of Int64Index that can represent a monotonic ordered set. These are analogous to Python range types." [from https://pandas.pydata.org/pandas-docs/stable/advanced.html#int64index-and-rangeindex]

Why does the index type change from RangeIndex to Int64Index?
What are the key differences between working with DataFrames that have a RangeIndex and DataFrames that have an Int64Index?
type(df_val.index)
pandas.core.indexes.range.RangeIndex
type(df_new.index)
pandas.core.indexes.numeric.Int64Index

Question 2

As per the pandas documentation

RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.

Parameters:

start : int (default: 0), or other RangeIndex instance. If int and “stop” is not given, interpreted as “stop” instead.

stop : int (default: 0)

step : int (default: 1)

Int64Index is a special case of Index with purely integer labels.

Parameters: data : array-like (1-dimensional)
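
The memory-saving claim in the documentation excerpt can be checked with a rough sketch like the one below. The size n is arbitrary, and the explicit index is built from a NumPy array so that every label is stored; exact byte counts will vary by pandas version, so none are shown.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary size chosen for illustration

# A RangeIndex only needs to remember start, stop and step.
range_index = pd.RangeIndex(start=0, stop=n, step=1)
# Building from a NumPy array stores every label explicitly.
int_index = pd.Index(np.arange(n, dtype=np.int64))

range_bytes = range_index.memory_usage()
int_bytes = int_index.memory_usage()

print(range_bytes, int_bytes)
# Both indexes hold the same labels, so they compare as equal.
print(range_index.equals(int_index))
```

The two indexes carry identical labels, so label-based lookups behave the same; the difference is in how the labels are stored.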

Output of RangeIndex from my own code:

RangeIndex(start=0, stop=4622, step=1)

My program has 4622 observations.

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            934, 935, 936, 937, 938, 939, 940, 941, 942, 943],
           dtype='int64', name='user_id', length=943)

Number of observations: 943
