Flattening an array in pandas

2024/9/23 16:24:01

One of the columns in my DataFrame holds an array (a list) in each row. How do I flatten it?

column1  column2  column3
var1     var11    [1, 2, 3, 4]
var2     var22    [1, 2, 3, 4, -2, 12]
var3     var33    [1, 2, 3, 4, 33, 544]

After flattening it should be:

column1  column2  column3
var1     var11    1
var1     var11    2
var1     var11    3
var1     var11    4
var2     var22    1
var2     var22    2
var2     var22    3
var2     var22    4
var2     var22    -2
......
var3     var33    544

It seemed that unstack could help me, but I couldn't work out exactly how.

Answer

Here is a one-liner approach, where df is your DataFrame:

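import pandas as pd

# df is your DataFrame; each cell of column3 holds a list
result = (df.join(df.column3.apply(pd.Series))   # spread each list over columns 0, 1, 2, ...
            .drop('column3', axis=1)             # remove the original list column
            .set_index(['column1', 'column2'])   # keep the identifier columns in the index
            .stack()                             # one row per (row, list position) pair
            .dropna()                            # drop NaN padding from shorter lists, in case stack() kept it
            .reset_index()
            .drop('level_2', axis=1)             # discard the list-position level
            .rename(columns={0: 'column3'}))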

yielding:

   column1 column2  column3
0     var1   var11        1
1     var1   var11        2
2     var1   var11        3
3     var1   var11        4
4     var2   var22        1
5     var2   var22        2
6     var2   var22        3
7     var2   var22        4
8     var2   var22       -2
9     var2   var22       12
10    var3   var33        1
11    var3   var33        2
12    var3   var33        3
13    var3   var33        4
14    var3   var33       33
15    var3   var33      544
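On pandas 0.25 and later, DataFrame.explode does this reshaping directly. The sketch below is an alternative to the one-liner, not part of the original answer; it rebuilds the sample frame with the values from the question and assumes pandas 1.1 or newer for the ignore_index argument.

```python
import pandas as pd

# Sample frame matching the question; column3 holds a Python list per row
df = pd.DataFrame({
    'column1': ['var1', 'var2', 'var3'],
    'column2': ['var11', 'var22', 'var33'],
    'column3': [[1, 2, 3, 4], [1, 2, 3, 4, -2, 12], [1, 2, 3, 4, 33, 544]],
})

# explode() emits one row per list element and repeats the other columns;
# ignore_index=True renumbers the result 0..n-1 instead of repeating row labels
flat = df.explode('column3', ignore_index=True)

# explode() leaves column3 with object dtype, so convert it back to integers
flat['column3'] = flat['column3'].astype(int)

print(flat)
```

This yields the same 16 rows as the table above. Unlike the stack-based version, it needs no temporary wide columns, so shorter lists never introduce NaN padding.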