I have this DataFrame where the columns are coordinates (e.g. x1, y1, x2, y2, ...). The coordinate columns start at the 8th column (the earlier ones are irrelevant to the question).
I have a larger example here, but here's a small sample:
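import pandas as pd
start_column = 8
df = pd.DataFrame(columns = ['x1','y1','x2','y2'],data = [(0,0,1,0),(0,1,2,3),(-1,-2,None,None)])
# Prepend seven filler columns c1..c7 so the coordinates start at the 8th column
for i in range(7):
    df.insert(0,'c'+str(7-i),'x')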
df
I want to create a new column in the DataFrame as a list of xy pairs, as in: df["coordinates"]=[[x1,y1],[x2,y2],[x3,y3]....]
What I've tried so far:
for row in df.iterrows():
    for i in range(1,total_count_of_xy_rows):
        df["coordinates"]= df[["x{}".format(i),"y{}".format(i)]].values.tolist()
print(df)
Is there a better way to do this?
You can create the new column by .apply-ing a custom list comprehension across the rows:
start_column = 8
coordinates_list = list(zip(df.columns[(start_column-1):-1:2],df.columns[start_column::2]))
df['coordinates'] = df.apply(lambda row: [(row[x], row[y]) for x,y in coordinates_list if not any((pd.isna(row[x]), pd.isna(row[y])))], axis=1)
Using this example input, with the coordinate columns starting from the 8th column, as you stated in a comment:
df = pd.DataFrame(columns = ['x1','y1','x2','y2'],data = [(0,0,1,0),(0,1,2,3),(-1,-2,None,None)])
for i in range(start_column-1):
    df.insert(0,'c'+str(start_column-1-i),'x')
df
   c1 c2 c3 c4 c5 c6 c7  x1  y1   x2   y2
0   x  x  x  x  x  x  x   0   0  1.0  0.0
1   x  x  x  x  x  x  x   0   1  2.0  3.0
2   x  x  x  x  x  x  x  -1  -2  NaN  NaN
This will produce this output:
   c1 c2 c3 c4 c5 c6 c7  x1  y1   x2   y2           coordinates
0   x  x  x  x  x  x  x   0   0  1.0  0.0  [(0, 0), (1.0, 0.0)]
1   x  x  x  x  x  x  x   0   1  2.0  3.0  [(0, 1), (2.0, 3.0)]
2   x  x  x  x  x  x  x  -1  -2  NaN  NaN            [(-1, -2)]
Because the list comprehension skips any pair where x or y is NaN, this deals with the unequal number of coordinates in each row. Hope that helps!
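If it helps, here is the same idea as one self-contained sketch. The helper name add_coordinates is made up for illustration, and it assumes the x/y columns alternate from start_column (1-based) through the last column. It slices the coordinate columns once and pairs them up, so it does not depend on there being no extra column at the end of the original slice, and it returns a copy instead of modifying df in place.

```python
import pandas as pd


def add_coordinates(df, start_column):
    """Return a copy of df with a 'coordinates' column of (x, y) tuples.

    Hypothetical helper: assumes x and y columns alternate from the
    1-based position start_column through the last column.
    """
    coord_cols = df.columns[start_column - 1:]
    # Pair each x column with the y column that follows it.
    pairs = list(zip(coord_cols[::2], coord_cols[1::2]))
    out = df.copy()
    out["coordinates"] = out.apply(
        lambda row: [
            (row[x], row[y])
            for x, y in pairs
            # Skip pairs where either value is missing.
            if pd.notna(row[x]) and pd.notna(row[y])
        ],
        axis=1,
    )
    return out


# Rebuild the sample frame from the question.
start_column = 8
df = pd.DataFrame(
    columns=["x1", "y1", "x2", "y2"],
    data=[(0, 0, 1, 0), (0, 1, 2, 3), (-1, -2, None, None)],
)
for i in range(start_column - 1):
    df.insert(0, "c" + str(start_column - 1 - i), "x")

result = add_coordinates(df, start_column)
print(result["coordinates"].tolist())
```

Since the function works on a copy, you can call it again on the original df without the new coordinates column being picked up as a coordinate.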