SQLite: return only top 2 results within each group

2024/9/16 23:27:11

I checked other solutions to similar problems, but sqlite does not support row_number() and rank() functions or there are no examples which involve joining multiple tables, grouping them by multiple columns and returning only top N results for each group at the same time.

Here's the code i run

db = sqlite3.connect('mydb')cursor = db.cursor()cursor.execute('''CREATE TABLE orders(id INTEGER PRIMARY KEY, product_id INTEGER,client_id INTEGER)'''
)cursor.execute('''CREATE TABLE clients(id INTEGER PRIMARY KEY, gender TEXT,city TEXT)'''
)cursor.execute('''CREATE TABLE products(id INTEGER PRIMARY KEY, category_name TEXT)'''
)orders = [(9, 6), (3, 10), (8, 6), (4, 8),(5, 6), (7, 4), (9, 2), (10, 8),(4, 6), (3, 1), (10, 2), (9, 8),(9, 7), (4, 9), (7, 10), (2, 7),(4, 7), (6, 2), (6, 2), (9, 3),(10, 6), (4, 4), (2, 6), (3, 8),(9, 2), (1, 9), (3, 9), (9, 4),(5, 5), (7, 1), (8, 7), (7, 8),(6, 3), (9, 6), (8, 3), (7, 1),(10, 5), (7, 10), (8, 1), (7, 9),(4, 4), (3, 8), (5, 2), (5, 8),(6, 10), (9, 7), (2, 2), (4, 10),(5, 10), (3, 9)
]clients = [('Male', 'NY'),('Female', 'NY'),('Male', 'London'),('Male', 'London'),('Male', 'NY'),('Female', 'NY'),('Female', 'London'),('Male', 'London'),('Male', 'NY'),('Female', 'London')
]products = [('Kitchen', ),('Sport', ),('Furniture', ),('Furniture', ),('Furniture', ),('Sport', ),('Sport', ),('Kitchen', ),('Kitchen', ),('Kitchen', )
]cursor.executemany("INSERT INTO orders(product_id, client_id) VALUES(?,?)", orders)
cursor.executemany("INSERT INTO clients(gender, city) VALUES(?,?)", clients)
cursor.executemany("INSERT INTO products(category_name) VALUES(?)", (products))db.commit()cursor.execute('''SELECTcategory_name,city, gender,product_id, COUNT(product_id)FROM ordersLEFT JOIN products ON product_id = products.idLEFT JOIN clients ON client_id = clients.idGROUP BY product_id, category_name, city, genderORDER BY category_name, city, gender, COUNT(product_id) DESC'''
)print('''category_name, city, gender, product_id, COUNT(product_id)''')all_rows = cursor.fetchall()
for a, b, c, d, e in all_rows:print(a, b, c, d, e)db.close()

Now the question is how do i get output like this in a single query? I don't need rows crossed with red lines since i only need top 2.

enter image description here

Answer

This can be achieved by embedding your existing query inside a CTE using WITH, then using it in a WHERE ... IN subquery. The subquery selects LIMIT 2 product IDs from the CTE that match category_name, city, and gender, ordered by product count.

WITH order_groups AS (SELECTcategory_name,city, gender,product_id,COUNT(product_id) AS product_countFROM orders OOLEFT JOIN products ON product_id = products.idLEFT JOIN clients ON client_id = clients.idGROUP BY product_id, category_name, city, genderORDER BY category_name, city, gender, COUNT(product_id) DESC
)
SELECT * FROM order_groups OG_outer
WHERE OG_outer.product_id IN (SELECT product_idFROM order_groups OG_innerWHERE OG_outer.category_name = OG_inner.category_name ANDOG_outer.city = OG_inner.city ANDOG_outer.gender = OG_inner.genderORDER BY OG_inner.product_count DESC LIMIT 2
)
ORDER BY category_name, city, gender, product_count DESC

This outputs the following rows as requested:

Furniture|London|Female|4|2
Furniture|London|Female|3|1
Furniture|London|Male|4|3
Furniture|London|Male|3|2
Furniture|NY|Female|5|2
Furniture|NY|Female|4|1
Furniture|NY|Male|3|3
Furniture|NY|Male|4|1
Kitchen|London|Female|9|2
Kitchen|London|Female|8|1
Kitchen|London|Male|9|3
Kitchen|London|Male|8|1
Kitchen|NY|Female|9|4
Kitchen|NY|Female|10|2
Kitchen|NY|Male|1|1
Kitchen|NY|Male|8|1
Sport|London|Female|7|2
Sport|London|Female|2|1
Sport|London|Male|7|2
Sport|London|Male|6|1
Sport|NY|Female|2|2
Sport|NY|Female|6|2
Sport|NY|Male|7|3
https://en.xdnf.cn/q/72895.html

Related Q&A

Python list.append if not in list vs set.add performance [duplicate]

This question already has answers here:Which is faster and why? Set or List?(3 answers)Closed 6 years ago.Which is more performant, and what is asymptotic complexity (or are they equivalent) in Pytho…

using the hardware rng from python

Are there any ready made libraries so that the intel hardware prng (rdrand) can be used by numpy programs to fill buffers of random numbers?Failing this can someone point me in the right direction for…

How do I revert sys.stdout.close()?

In the interactive console:>>> import sys >>> sys.stdout <open file <stdout>, mode w at 0xb7810078> >>> sys.stdout.close() >>> sys.stdout # confirming th…

Find a value from x axis that correspond to y axis in matplotlib python

I am trying to do simple task such as to read values of x axis that corresponds to value of y axis in matplotlib and I cannot see what is wrong. In this case I am interested for example to find which v…

Django accessing OneToOneField

Made a view that extended User:class Client(models.Model):user = models.OneToOneField(User, related_name=user)def __unicode__(self):return "%s" % (self.user) I am currently trying to access…

Pandas DataFrame: copy the contents of a column if it is empty

I have the following DataFrame with named columns and index:a a* b b* 1 5 NaN 9 NaN 2 NaN 3 3 NaN 3 4 NaN 1 NaN 4 NaN 9 NaN 7The data…

Solving the most profit algorithm [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 9…

Get combobox value in python

Im developing an easy program and I need to get the value from a Combobox. It is easy when the Combobox is in the first created window but for example if I have two windows and the Combobox is in the s…

PyAudio (PortAudio issue) Python

I installed pyaudio with anaconda python. Using conda install pyaudio on windows. It said it installed and it also installed PortAudio with it.However, when I create my file and run it now I get the fo…

Python multiprocessing with M1 Mac

I have a mac (Mac Os 11.1, Python Ver 3.8.2) and need to work in multiprocessing, but the procedures doesn’t work. import multiprocessingdef func(index: int):print(index)manager = multiprocessing.Mana…