a list of identical elements in the merge list

2024/11/19 19:28:17

I need to merge the list and have a function that can be implemented, but when the number of merges is very slow and unbearable, I wonder if there is a more efficient way

Consolidation conditions:Sub-lists contain identical numbers to each other Thank you

Simple Association:

[7,8,9] = [7,8]+[8,9]   #The same number 8 

Cascade contains:

[1,2,3]   = [1,2,3]+[3,4] #The same number 3 
[3,4,5,6] = [3,4],[4,5,6] #The same number 4 
[1,2,3,4,5,6] = [1,2,3]+[3,4,5,6] #The same number 3 

Function:

a =  [ [1,2,3],[4,5,6],[3,4],[7,8],[8,9],[6,12,13] ]
b = len(a)
for i in range(b):for j in range(b):x = list(set(a[i]+a[j]))y = len(a[j])+len(a[i])if i == j or a[i] == 0 or a[j] == 0:breakelif len(x) < y:a[i] = xa[j] = [0]print a
print [i for i in a if i!= [0]]

result:

[[8, 9, 7], [1, 2, 3, 4, 5, 6, 10, 11]]

Above is an example where each sub-list in the actual calculation has a length of only 2,

a =  [[1,3],[5,6],[3,4],[7,8],[8,9],[12,13]]

I want to miss out more data, here is a simulation data.

a = np.random.rand(150,150)>0.99 
a[np.tril_indices(a.shape[1], -1)] = 0     
a[np.diag_indices(a.shape[1])]     = 0     
a = [list(x) for x in np.c_[np.where(a)]]consolidate(a)
Answer

I think your algorithm is close to optimal, except that the inner loop can be shortened because the intersection operation is symmetric, i.e. if you check that (A, B) intersect, there is no need to check for (B, A). This way you would go from O(n²) to O(n * (n / 2)).

However, I would rewrite the piece of code more cleanly and I would also avoid modifying the input. Note also, that since sets do not guarantee ordering, it is a good idea to do some sorting before getting to list.

Here is my proposed code (EDITED to reduce the number of castings and sortings):

def consolidate(items):items = [set(item.copy()) for item in items]for i, x in enumerate(items):for j, y in enumerate(items[i + 1:]):if x & y:items[i + j + 1] = x | yitems[i] = Nonereturn [sorted(x) for x in items if x]

Encapsulating your code in a function, I would get:

def consolidate_orig(a):a = [x.copy() for x in a]b = len(a)for i in range(b):for j in range(b):x = list(set(a[i]+a[j]))y = len(a[j])+len(a[i])if i == j or a[i] == 0 or a[j] == 0:breakelif len(x) < y:a[i] = xa[j] = [0]return [i for i in a if i!= [0]]

This would allow us to do some clean micro-benchmarking (for completeness I have included also @zipa's merge()):


EDIT:

@zipa's code is not properly encapsulated, here is an equivalent version with proper encapsulation:

def merge(iterable, base=None):if base is None:base = iterablemerged = set([tuple(set(i).union(*[j for j in base if set(i).intersection(j)])) for i in iterable])if merged == iterable:return mergedelse:return merge(merged, base)

and updated timings:

in_list = [[1,2,3], [4,5,6], [3,4], [7,8], [8,9], [6,12,13]]
%timeit consolidate_orig(in_list)
# 17.9 µs ± 368 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit consolidate(in_list)
# 6.15 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit merge(in_list)
# 53.6 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)in_list = [[1, 3], [5, 6], [3, 4], [7, 8], [8, 9], [12, 13]]
%timeit consolidate_orig(in_list)
# 16.1 µs ± 159 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit consolidate(in_list)
# 5.87 µs ± 71.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit merge(in_list)
# 27 µs ± 701 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Showing that, at least for this input, the proposed solution is consistently faster. Since it is not too straightforward to generate large meaningful inputs, I'll leave to you to check that this is more efficient then your approach for the larger inputs you have in mind.


EDIT

With larger, but probably meaningless inputs, the timings are still favorable for the proposed version:

in_list = [[1,2,3], [4,5,6], [3,4], [7,8], [8,9], [6,12,13]] * 300
%timeit consolidate_orig(in_list)
# 1.04 s ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit consolidate(in_list)
# 724 ms ± 7.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit merge(in_list)
# 1.04 s ± 7.94 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)in_list = [[1, 3], [5, 6], [3, 4], [7, 8], [8, 9], [12, 13]] * 300
%timeit consolidate_orig(in_list)
# 1.03 s ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit consolidate(in_list)
# 354 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit merge(in_list)
# 967 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
https://en.xdnf.cn/q/118508.html

Related Q&A

How To Get A Contour Of More/Less Of The Expected Area In OpenCV Python

I doing some contour detection on a image and i want to find a contour based on a area that i will fix in this case i want the contour marked in red. So i want a bounding box around the red contour Fol…

Storing output of SQL Query in Python Variable

With reference to this, I tried modifying my SQL query as follows:query2 ="""insert into table xyz(select * from abc where date_time > %s and date_time <= ( %s + interval 1 hour))&…

file modification and creation

How would you scan a dir for a text file and read the text file by date modified, print it to screen having the script scan the directory every 5 seconds for a newer file creadted and prints it. Is it …

How to share a file between modules for logging in python

I wanted to log messages from different module in python to a file. Also I need to print some messages to console for debugging purpose. I used logger module for this purpose . But logger module will l…

Dont understand how this example one-hot code indexes a numpy array with [i,j] when j is a tuple?

I dont get how the line: results[i, sequence] = 1 works in the following.I am following along in the debugger with some sample code in a Manning book: "Deep Learning with Python" (Example 3.5…

convert nested list to normal list using list comprehension in python [duplicate]

This question already has answers here:How do I make a flat list out of a list of lists?(32 answers)Flatten an irregular (arbitrarily nested) list of lists(54 answers)Closed 6 years ago.How can I do t…

Transformation of pandas DataFrame adds a blank row

My original question was posted here. I have a dataframe as follows:ID START END SEQ 1 11 12 1 1 14 15 3 1 13 14 2 2 10 14 1 3 11 15 1 3 16 17 …

How to find a element after a search click checkbox in selenium Python

After I search for a user in the search field, I get the user I searched Now I need to select this user that shown in the list I tried with xpath and did not find the element ! could you help ? So aft…

Running parameterized queries

Quite new to this google bigquery sql thing so please bear with me. Im trying to build a google standardSQL parameterized query. The following sample was used and ran successfully on Google BigQuery We…

how to do this disparity map in OpenCV

paper "Fast Obstacle Detection Using U-Disparity Maps with Stereo Vision"reference paper: "Fast Obstacle Detection Using U-Disparity Maps with Stereo Vision"I want to ask can opencv…