Pandas data frame: convert Int column into binary in python

2024/7/6 21:48:47

I have dataframe eg. like below

Event['EVENT_ID'] = [ 4162, 4161, 4160, 4159,4158, 4157, 4156, 4155, 4154]

need to convert each row word to binary.

Event['b']=bin(Event['EVENT_ID']) doesn't work 
TypeError: cannot convert the series to <class 'int'>

expected new column with binary, remove 0b and split the column to 16 separate column

bin(4162) = '0b1000001000010'
Answer

You can also work with numpy, much faster than pandas.

Edit: faster numpy using view Couple of tricks here:

  • Work only with the column of interest
  • Convert the underlaying array to uint16, to ensure compatibility with any integer input
  • Swapbytes to have a proper H,L order (at least on my architecture)
  • Split H,L without actually moving any data with view
  • Run unpackbits and reshape accordingly

My machine requires a byteswap to have the bytes of the uint16 in the proper place. Note that this aproach requires to have the data as int16/uint16, while the other one would work for int64 as well.

import pandas as pd
import numpy as npdf = pd.DataFrame({'EVENT_ID': [ 4162, 4161, 4160, 4159,4158, 4157, 4156, 4155, 4154]}, dtype='uint16')zz=np.unpackbits(df.EVENT_ID.values.astype('uint16').byteswap().view('uint8')).reshape(-1,16)
df3 = pd.concat([df,pd.DataFrame(zz)],axis=1)print(f"{df3 =}")df3 =   EVENT_ID  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
0      4162  0  0  0  1  0  0  0  0  0  1   0   0   0   0   1   0
1      4161  0  0  0  1  0  0  0  0  0  1   0   0   0   0   0   1
2      4160  0  0  0  1  0  0  0  0  0  1   0   0   0   0   0   0
3      4159  0  0  0  1  0  0  0  0  0  0   1   1   1   1   1   1
4      4158  0  0  0  1  0  0  0  0  0  0   1   1   1   1   1   0
5      4157  0  0  0  1  0  0  0  0  0  0   1   1   1   1   0   1
6      4156  0  0  0  1  0  0  0  0  0  0   1   1   1   1   0   0
7      4155  0  0  0  1  0  0  0  0  0  0   1   1   1   0   1   1
8      4154  0  0  0  1  0  0  0  0  0  0   1   1   1   0   1   0

older proposed method:

lh = np.unpackbits((df.values & 0xFF).astype('uint8')).reshape(-1,8)
uh = np.unpackbits((df.values >> 8).astype('uint8')).reshape(-1,8)df2 = pd.concat([df, pd.DataFrame(np.concatenate([uh,lh],axis=1),index=df.index)],axis=1)

Benchmark: numpy is orders of magnitude faster than pandas" For 1M points:

  • numpy view: 35ms for 1million uint64 points
  • numpy low/high: 50ms
  • pandas list bin: 1.78s
  • pandas apply format + list: 1.97s
  • pandas apply lambda: 6.08s
df = pd.DataFrame({'EVENT_ID': (np.random.random(int(1e6))*65000).astype('uint16')})

pandas apply format list

In [13]: %timeit df2 = df.join(pd.DataFrame(df['EVENT_ID'].apply('{0:b}'.format).apply(list).tolist()))
1.97 s ± 42.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

pandas list bin

In [10]: %%timeit...: binary_values = pd.DataFrame([list(bin(x)[2:]) for x in df['EVENT_ID']])...: df2 = df.join(binary_values)...:...:
1.78 s ± 53.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

pandas3 apply lambda

In [5]: %%timeit...: for i in range(16):...:     df[f"bit{i}"] = df["EVENT_ID"].apply(lambda x: x & 1 << i).astype(bool).astype(int)...:
6.08 s ± 65.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

numpy

In [14]: %%timeit...: lh = np.unpackbits((df.values & 0xFF).astype('uint8')).reshape(-1,8)...: uh = np.unpackbits((df.values >> 8).astype('uint8')).reshape(-1,8)...: df3=pd.concat([df, pd.DataFrame(np.concatenate([uh,lh],axis=1),index=df.index)],axis=1)...:...:
49.9 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
https://en.xdnf.cn/q/119076.html

Related Q&A

I have an issue : Reading Multiple Text files using Multi-Threading by python

Hello Friends, I hope someone check my code and helping me on this issue. I want to read from multiple text files (at least 4) sequentially and print their content on the screenFirst time not using Thr…

How to print \ in python?

print "\\"It print me in console...But I want to get string \How to get string string \?

Replace word, but another word with same letter format got replaced

Im trying to replace a word in python, but another word with same letter format got replaced example : initial : bg bgt goal : bang banget current result : bang bangtheres what my code…

Python: Split timestamp by date and hour

I have a list of timestamps in the following format:1/1/2013 3:30I began to learn python some weeks ago and I have no idea how to split the date and time. Can anyone of you help me?Output should be on…

ModuleNotFoundError: when importing curses in IDE

I get the error ModuleNotFoundError: No module named _curses every time I try to uses curses in VS Code or PyCharm. But it works in the command prompt (Im on Windows BTW) Code is from Tech With Tim tut…

Add new column in a csv file and manipulate on the on records

I have 4 csv files named PV.csv, Dwel.csv, Sess.csv, and Elap.csv. I have 15 columns and arouind 2000 rows in each file. At first I would like to add a new column named Var in each file and fill up the…

Xpath returns null

I need to scrape the price of this page: https://www.asos.com/monki/monki-lisa-cropped-vest-top-with-ruched-side-in-black/prd/23590636?colourwayid=60495910&cid=2623 However it is always returning …

I am getting an Index error as list out of range. I have to scan through many lines

import nltk import random from nltk.tokenize import sent_tokenize, word_tokenizefile = open("sms.txt", "r") for line in file:#print linea=word_tokenize(line)if a[5] == SBI and a[6]=…

How can i append two classes in JQuery

I have implemented a chat box message (live chat) using django and now i want to add css, but i have problem on how to append multiple classes on messege sent. For example, i want to show other user me…

After installing PyBluez on Windows8.1 I get DLL %1 not valid win32 app

I have installed PyBluez-0.22.win32.exe on a 64bit machine with Python 2.7 (they didnt have a 64bit version). Then I get the following error: ImportError:DLL load failed:%1 is not valid Win32 applicati…