Spark select top values in RDD

2024/10/13 9:18:11

The original dataset is:

# (numbersofrating,title,avg_rating)
newRDD =[(3,'monster',4),(4,'minions 3D',5),....] 

I want to select the top N rows by avg_rating from newRDD. I used the following code, but it raises an error:

selectnewRDD = (newRDD.map(x, key =lambda x: x[2]).sortBy(......))

TypeError: map() takes no keyword arguments

The expected data should be:

# (numbersofrating,title,avg_rating)
selectnewRDD =[(4,'minions 3D',5),(3,'monster',4)....] 

Answer

You can use either top or takeOrdered with a key argument:

newRDD.top(2, key=lambda x: x[2])

or

newRDD.takeOrdered(2, key=lambda x: -x[2])

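Note that top returns elements in descending order of the key, while takeOrdered returns them in ascending order, so the key function differs between the two: takeOrdered needs the rating negated to put the highest ratings first.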
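
The difference between the two key functions can be checked locally without a Spark cluster. The sketch below is a plain-Python analogue, not PySpark itself: it uses heapq.nlargest and heapq.nsmallest on an ordinary list to mirror the ordering that top and takeOrdered produce. The first two rows come from the question; the other two rows and the helper name rating are made up for illustration.

```python
import heapq

# First two rows come from the question; the last two are made-up sample rows.
# Each row is (number_of_ratings, title, avg_rating).
ratings = [
    (3, 'monster', 4),
    (4, 'minions 3D', 5),
    (10, 'hypothetical film A', 2),
    (7, 'hypothetical film B', 3),
]


def rating(row):
    """Return the avg_rating field (hypothetical helper for this sketch)."""
    return row[2]


# Analogue of newRDD.top(2, key=lambda x: x[2]):
# the rows with the largest keys, largest first.
top_two = heapq.nlargest(2, ratings, key=rating)

# Analogue of newRDD.takeOrdered(2, key=lambda x: -x[2]):
# the rows with the smallest keys, smallest first, so negating the
# rating makes the highest-rated rows come out first.
take_ordered_two = heapq.nsmallest(2, ratings, key=lambda row: -rating(row))

print(top_two)
print(take_ordered_two)
```

Both calls yield the same two rows in the same order, which is the result the question asks for. Passing the un-negated key to the ascending variant would instead return the two lowest-rated rows.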
