I have a dataframe df
as:
Election Year Votes Vote % Party Region
0 2000 42289 29.40 Janata Dal (United) A
1 2000 27618 19.20 Rashtriya Janata Dal A
2 2000 20886 14.50 Bahujan Samaj Party B
3 2000 17747 12.40 Congress B
4 2000 14047 19.80 Independent C
5 2000 17047 10.80 JLS C
6 2005 8358 15.80 Janvadi Party A
7 2005 4428 13.10 Independent A
8 2005 1647 1.20 Independent B
9 2005 1610 11.10 Independent B
10 2005 1334 15.06 Nationalist C
11 2005 1834 18.06 NJM C
12 2010 21114 20.80 Independent A
13 2010 1042 10.5 Bharatiya Janta Dal A
14 2010 835 0.60 Independent B
15 2010 14305 15.50 Independent B
16 2010 22211 17.70 Congress C
16 2010 20011 14.70 INC C
How can I get the list of the regions that have two or more parties getting more than vote % greater than 10 every election year?
Desired output:
Election Year Region Vote %2000 A 29.402000 A 19.402000 C 19.802000 C 10.802005 A 15.802005 A 13.102005 C 15.062005 C 18.062010 A 20.802010 A 10.52010 C 17.702010 C 14.70
Output contains only regions having more than 10% vote every year and Election year and region name in sorted in ascending order. So, here only Region "A" and "C" will be there in the output.
I have used the following code to sort "Vote %" in descending order after grouping by "Election year" and "Region" and to then compare the top 2 Vote% every year, but it is giving an error.
df1 = df.groupby(['Election Year','Region'])sort_values('Vote %', ascending = False).reset_index()