How to create a DataFrame while preserving order of the columns?

2024/11/20 19:28:33

How can I create a DataFrame from multiple numpy arrays, Pandas Series, or Pandas DataFrames while preserving the order of the columns?

For example, I have these two numpy arrays and I want to combine them as a Pandas DataFrame.

foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )

If I do this, the bar column would come first because dict doesn't preserve order.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )

   bar  foo
0    4    1
1    5    2
2    6    3

I can do this, but it gets tedious when I need to combine many variables.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )

EDIT: Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error-prone).

EDIT2: One more point. If I want to add or remove one of the variables to be joined, I want to add or remove it in only one place.

Answer

Original Solution: Incorrect Usage of collections.OrderedDict

In my original solution, I proposed using OrderedDict from the collections module in Python's standard library.

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )
   foo  bar
0    1    4
1    2    5
2    3    6

Right Solution: Passing Key-Value Tuple Pairs for Order Preservation

However, as noted, if a normal dictionary is passed to OrderedDict, the order may still not be preserved, because the dictionary literal is built first and its order is already arbitrary by the time OrderedDict sees it. (This applies to older Python versions; from Python 3.7 on, plain dicts preserve insertion order.) A workaround is to build the OrderedDict from a sequence of key-value tuple pairs, as suggested in this SO post:

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )
   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6
>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
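
To address the "add/remove in one place" requirement from the question, one possible arrangement is to keep the name-array pairs in a single list and build the OrderedDict from that list. This is only a sketch: the list name `columns` and the third array `baz` are illustrative and not part of the original question.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])
baz = np.array([7, 8, 9])  # hypothetical extra variable, for illustration only

# The single place where variables are listed; the list order is the column order.
columns = [
    ("foo", foo),
    ("bar", bar),
    ("baz", baz),
]

# OrderedDict built from a list of pairs keeps the list order on any Python version.
df = pd.DataFrame(OrderedDict(columns))

print(df.columns.tolist())
```

Adding or removing a variable then means editing one entry in `columns`, and the column order follows the order of that list.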