Comparing first element of the consecutive lists of tuples in Python

2024/10/9 0:44:26

I have a list of tuples, each containing two elements. Several of the tuples share the same first element. I want to compare the first elements and collect the second elements of all tuples with the same first element into one list. Here is my list:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

I would like to make a list of lists out of it which looks something like this:

NewList=[(2,3,4,5),(6,7,8),(9,10)]

Is there an efficient way to do this?

Answer

You can use an OrderedDict to group the elements by the first subelement of each tuple:

from collections import OrderedDict
myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]
od = OrderedDict()
for a, b in myList:
    od.setdefault(a, []).append(b)
print(list(od.values()))
[[2, 3, 4, 5], [6, 7, 8], [9, 10]]

If you really want tuples:

print(list(map(tuple,od.values())))
[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

If you do not care about the order in which the elements appeared and just want the most efficient way to group, you could use a collections.defaultdict:

from collections import defaultdict
od = defaultdict(list)
for a, b in myList:
    od[a].append(b)
print(list(od.values()))
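
On Python 3.7 and later, the built-in dict preserves insertion order as a language guarantee, so a plain dict (and the defaultdict above) also returns the groups in first-seen order. A minimal sketch, assuming Python 3.7+; the helper name group_by_first is hypothetical and not part of the original answer:

```python
def group_by_first(pairs):
    """Group the second items of (key, value) pairs by key, in first-seen key order."""
    groups = {}  # plain dict: insertion-ordered on Python 3.7+ (assumption stated above)
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return [tuple(values) for values in groups.values()]


myList = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (2, 8), (3, 9), (3, 10)]
print(group_by_first(myList))
# [(2, 3, 4, 5), (6, 7, 8), (9, 10)]
```

Unlike groupby, this does not require the input to be sorted.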

Lastly, if your data is sorted, as in your input example, you could simply use itertools.groupby to group by the first element of each tuple and extract the second element from the grouped tuples:

from itertools import groupby
from operator import itemgetter
print([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

Again, groupby will only work if your data is sorted by at least the first element.
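
To illustrate that caveat: groupby only merges consecutive items with equal keys, so unsorted input starts a new group every time the key changes. A minimal sketch, where unsorted_pairs is made-up example data and group_runs is a hypothetical helper; sorting on the first element only is a stable sort, so the second elements keep their original relative order:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: keys 1 and 2 each appear in two separate runs.
unsorted_pairs = [(1, 2), (2, 6), (1, 3), (3, 9), (2, 7)]


def group_runs(pairs):
    """Collect the second items of each run of consecutive equal first items."""
    return [tuple(t[1] for t in run) for _, run in groupby(pairs, key=itemgetter(0))]


# Without sorting, every change of key starts a new group.
print(group_runs(unsorted_pairs))
# [(2,), (6,), (3,), (9,), (7,)]

# Sorting on the first element only is stable, so values keep their input order.
print(group_runs(sorted(unsorted_pairs, key=itemgetter(0))))
# [(2, 3), (6, 7), (9,)]
```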

Some timings on a reasonably sized list:

In [33]: myList = [(randint(1,10000),randint(1,10000)) for _ in range(100000)]

In [34]: myList.sort()

In [35]: timeit ([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])
10 loops, best of 3: 44.5 ms per loop

In [36]: %%timeit
od = defaultdict(list)
for a,b in myList:
    od[a].append(b)
   ....:
10 loops, best of 3: 33.8 ms per loop

In [37]: %%timeit
dictionary = OrderedDict()
for x, y in myList:
    if x not in dictionary:
        dictionary[x] = [] # new empty list
    dictionary[x].append(y)
   ....:
10 loops, best of 3: 63.3 ms per loop

In [38]: %%timeit
od = OrderedDict()
for a,b in myList:
    od.setdefault(a,[]).append(b)
   ....:
10 loops, best of 3: 80.3 ms per loop

If order matters and the data is sorted, go with groupby. Its timing gets even closer to the defaultdict approach if you also need to convert every group in the defaultdict to a tuple.

If the data is not sorted or you don't care about any order, you won't find a faster way to group than using the defaultdict approach.
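
The timings above come from one machine and an older interpreter, so absolute numbers will differ elsewhere. A rough sketch for re-running the comparison with the standard timeit module follows. The function names and the fixed random seed are assumptions for illustration, and unlike the session above every variant converts its groups to tuples so the three results are directly comparable:

```python
import timeit
from collections import OrderedDict, defaultdict
from itertools import groupby
from operator import itemgetter
from random import randint, seed

seed(0)  # fixed seed so repeated runs use the same data (an assumption, not in the original)
pairs = sorted((randint(1, 10000), randint(1, 10000)) for _ in range(100000))


def with_groupby():
    # Requires pairs to be sorted by the first element.
    return [tuple(t[1] for t in v) for _, v in groupby(pairs, key=itemgetter(0))]


def with_defaultdict():
    d = defaultdict(list)
    for a, b in pairs:
        d[a].append(b)
    return [tuple(v) for v in d.values()]


def with_ordereddict():
    od = OrderedDict()
    for a, b in pairs:
        od.setdefault(a, []).append(b)
    return [tuple(v) for v in od.values()]


for func in (with_groupby, with_defaultdict, with_ordereddict):
    # Best of 3 repeats, 5 calls per repeat, reported per call.
    seconds = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per loop")
```

No expected output is shown because the figures depend on the hardware and Python version.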

