Spark: unique pairs in a Cartesian product

2024/10/10 8:19:29

I have this:

In [1]: a = sc.parallelize([a,b,c])
In [2]: a.cartesian(a).collect()
Out[2]: [(a, a), (a, b), (a, c), (b, a), (c, a), (b, b), (b, c), (c, b), (c, c)]

I want the following result:

In [1]: a = sc.parallelize([a,b,c])
In [2]: a.cartesianMoreInteligent(a).collect()
Out[2]: [(a, a), (a, b), (a, c), (b, b), (b, c), (c, c)]

My computation returns a symmetric matrix (a correlation matrix), so I only need each unordered pair once. What is the best way to achieve this without a loop? Note that a, b and c can be anything, even tuples.

Answer

I'm not sure about the Python syntax, but in Scala you could write:

a.cartesian(a).filter{ case (a,b) => a <= b }.collect()

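My guess is that in Python it would be something like this (the function passed to filter receives each pair as a single tuple, so it has to be indexed rather than unpacked into two arguments):

a.cartesian(a).filter(lambda pair: pair[0] <= pair[1]).collect()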
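The comparison above only works when the elements can be ordered with <=. Since the question says the elements can be anything, here is a rough sketch of a variant that compares positions instead of values by attaching an index with zipWithIndex first. The helper names (keep_pair, strip_indices, unique_pairs) are made up for illustration, and the Spark part assumes a regular PySpark RDD. The last few lines run the same logic on a plain Python list so it can be checked without a cluster.

```python
from itertools import product


def keep_pair(indexed_pair):
    """Keep a pair only when the left index is <= the right index."""
    (_, i), (_, j) = indexed_pair
    return i <= j


def strip_indices(indexed_pair):
    """Drop the helper indices, leaving the original (left, right) elements."""
    (left, _), (right, _) = indexed_pair
    return (left, right)


def unique_pairs(rdd):
    """Upper-triangle pairs of an RDD, compared by position rather than value.

    Assumption: `rdd` is a PySpark RDD. Only zipWithIndex, cartesian,
    filter and map are called on it.
    """
    indexed = rdd.zipWithIndex()  # yields (element, index) pairs
    return indexed.cartesian(indexed).filter(keep_pair).map(strip_indices)


# Local check of the same logic without Spark, using a plain Python list.
items = ["a", "b", "c"]
indexed_items = list(zip(items, range(len(items))))
local_result = [
    strip_indices(p) for p in product(indexed_items, repeat=2) if keep_pair(p)
]
print(local_result)
```

With a real RDD this would be called as unique_pairs(a).collect(). Keep in mind that the full Cartesian product is still generated before the filter runs, so this halves the output but not the work of producing the pairs.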

