Number of occurrences of a pair of values in a dataframe

2024/10/12 14:21:50

I have a dataframe with the following columns:

Name, Surname, dateOfBirth, city, country

I am interested in finding the most common combination of name and surname, and how many times it occurs. It would also be nice to see a list of the top 10 combinations.

My idea for the top one was:

mostFreqComb= df.groupby(['Name','Surname'])['Name'].count().argmax()

But I don't think it gives me the correct answer. Any help would be much appreciated!

Thanks, Neb

Answer

For the performance implications of the solutions below, see the related question "Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series". The solutions are presented with the best-performing first.

GroupBy.size

You can create a series of counts with (Name, Surname) tuple indices using GroupBy.size:

res = df.groupby(['Name', 'Surname']).size().sort_values(ascending=False)

By sorting these values, we can easily extract the most common:

most_common = res.head(1)
most_common_dups = res[res == res.iloc[0]].index.tolist()  # handles duplicate top counts
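
As a minimal sketch of the above, here is the same approach on a small made-up dataframe (the sample names are hypothetical, not from the question). Since the series is already sorted, head(10) gives the top 10 combinations the question asks about:

```python
import pandas as pd

# Hypothetical sample data; only the Name and Surname columns matter here.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid", "Bob", "Ann"],
    "Surname": ["Lee", "Ray", "Lee", "Fox", "Ray", "Lee"],
})

# Count rows per (Name, Surname) pair, most frequent first.
res = df.groupby(["Name", "Surname"]).size().sort_values(ascending=False)

top_pair = res.index[0]        # label of the most common pair
top_count = int(res.iloc[0])   # how many times it occurs
top10 = res.head(10)           # up to ten most common pairs with their counts
```

If fewer than ten distinct pairs exist, head(10) simply returns all of them.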

value_counts

Another way is to construct a series of tuples, then apply pd.Series.value_counts:

res = pd.Series(list(zip(df.Name, df.Surname))).value_counts()

The result will be a series of counts indexed by Name-Surname combinations, sorted from most common to least.

name, surname = res.index[0]  # return most common
most_common_dups = res[res == res.max()].index.tolist()
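
A short sketch of this variant on made-up data follows (the sample names are hypothetical). It also shows DataFrame.value_counts, which I believe is available from pandas 1.1 onwards and counts rows over a subset of columns without building tuples first; treat that part as an alternative, not as part of the original answer:

```python
import pandas as pd

# Hypothetical sample data; only the Name and Surname columns matter here.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid", "Bob", "Ann"],
    "Surname": ["Lee", "Ray", "Lee", "Fox", "Ray", "Lee"],
})

# Series of (Name, Surname) tuples, counted and sorted most common first.
res = pd.Series(list(zip(df.Name, df.Surname))).value_counts()

name, surname = res.index[0]   # most common pair
count = int(res.iloc[0])       # how many times it occurs
top10 = res.head(10)           # up to ten most common pairs

# Alternative (assumes pandas >= 1.1): count rows over the two columns directly.
alt = df.value_counts(subset=["Name", "Surname"])
```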

collections.Counter

If you wish to create a dictionary of (name, surname): counts entries, you can do so via collections.Counter:

from collections import Counter
zipper = zip(df.Name, df.Surname)
c = Counter(zipper)

Counter has useful methods such as most_common, which you can use to extract your result.
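
For instance, most_common(n) returns the n most frequent entries as a list of (key, count) pairs. The sketch below uses two plain lists as hypothetical stand-ins for df.Name and df.Surname, so it runs without pandas:

```python
from collections import Counter

# Hypothetical lists standing in for df.Name and df.Surname.
names = ["Ann", "Bob", "Ann", "Cid", "Bob", "Ann"]
surnames = ["Lee", "Ray", "Lee", "Fox", "Ray", "Lee"]

# Count each (name, surname) pair.
c = Counter(zip(names, surnames))

# most_common(1) returns a one-element list of ((name, surname), count).
top_pair, top_count = c.most_common(1)[0]

# Up to ten most common pairs, most frequent first.
top10 = c.most_common(10)
```

With a dataframe, replacing the two lists with df.Name and df.Surname should give the same kind of result.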
