Python: string to integer as a key

2024/10/10 0:27:45

I'm trying to convert a string column in a dataframe to int. The strings should be replaced with an integer as a key value.

Data:

user_id site_id 
100     url1.com 
100     url2.com 
100     url1.com 
101     url2.com 
101     url2.com 
101     url2.com

Wanted output:

user_id site_id 
100     1 
100     2 
100     1 
101     2 
101     2 
101     2

I tried to get all unique urls with:

names = pd.unique(df.site_id.ravel()) 
urls = pd.Series(np.arange(len(names)), names) 

and then

df["site_id"] = df.applymapp(urls.get)
Answer

You want factorize to encode the values to ints:

In [52]:
df['site_id'] = pd.factorize(df['site_id'])[0] + 1
dfOut[52]:user_id  site_id
0      100        1
1      100        2
2      100        1
3      101        2
4      101        2
5      101        2

here factorize returns an array:

In [53]:
pd.factorize(df['site_id'])Out[53]:
(array([0, 1, 0, 1, 1, 1], dtype=int64), Int64Index([1, 2], dtype='int64'))

we want the encoded values in the tuple and add 1 to each:

pd.factorize(df['site_id'])[0] + 1
https://en.xdnf.cn/q/118519.html

Related Q&A

Data manipulation, kind of downsampling

I have a large csv file, example of the data below. I will use an example of eight teams to illustrate.home_team away_team home_score away_score year belgium france 2…

Chrome Native Messaging throwing error when sending a base64 string to client

Using Chrome Native Messaging sample app as a template am able make a system call to bashos.system("<bash command>")The requirement is to return a base64 string from the python scriptos…

Exporting DataFrame to Excel using pandas without subscribe

How can I export DataFrame to excel without subscribe? For exemple: Im doing webscraping and there is a table with pagination, so I take the page 1 save it in DataFrame, export to excel e do it again …

Fraction of a real number in python giving complicated answer

Importing Fraction from fractions to give a fractional representation of a real number, but giving responses quite complicated which seems very simple by the paper-pen method. Fractions(.2) giving answ…

Scrape latitude and longitude (Google Maps) inside Script type=text/javascript

Im beginner in Web Scrapping. Im trying to get latitude and longitude from this web: https://urbania.pe/inmueble/proyecto/ememhvin-proyecto-mariscal-castilla-lima-santiago-de-surco-tale-inmobiliaria-65…

How to delete a button that is made by a loop

from tkinter import *class Main:def __init__(self, root):for i in range(0, 9):for k in range(0, 9):Button(root, text=" ").grid(row=i, column=k)root.mainloop()root = Tk()x = Main(root)How do I…

Invalid array shape with neural network using Keras?

Currently studying the Deep Learning with Python book by Francios Chollet. I am very new to this and I am getting this error code despite following his code verbatim. Can anyone interpret the error mes…

How to download PDF files from a list of URLs in Python?

I have a big list of links to PDF files that I need to download (500+) and I was trying to make a program to download them all because I dont want to manually do them. This is what I have and when I tr…

Training on GPU much slower than on CPU - why and how to speed it up?

I am training a Convolutional Neural Network using Google Colabs CPU and GPU. This is the architecture of the network: Model: "sequential" ____________________________________________________…

Check list item is present in Dictionary

Im trying to extend Python - Iterate thru month dates and print a custom output and add an addtional functionality to check if a date in the given date range is national holiday, print "NH" a…