Remove duplicates from json file matching against multiple keys

2024/11/15 17:59:53

Original Post = Remove duplicates from json data

This is only my second post. I didn't have enough points to ask my question as a comment on the original post, so here I am.

Andy Hayden makes a great point in a comment there: "Also, those aren't really duplicates..."

My question is about exactly that situation: how can you remove duplicates from a JSON file by matching against more than one key?

Here is the original example (it was pointed out that it is not valid JSON):

{{obj_id: 123,location: {x: 123,y: 323,},{obj_id: 13,location: {x: 23,y: 333,},{obj_id: 123,location: {x: 122,y: 133,},
}

My case is very similar to this example, except that in my case all of these entries would be kept, because the x and y values for each obj_id are unique. However, if x and y were also the same, then one of the entries would be removed from the JSON file.

All the examples I have found only remove entries based on a single key match.

I don't know if it matters, but the keys that I need to match against are "Company Name", "First Name", and "Last Name". It is a JSON file of 100k-plus lines of companies and contacts, and sometimes the same person is a contact for multiple companies, which is why I need to match against multiple keys.

Thanks.

Answer

I hope this does what you are looking for. (It only checks whether First and Last Name are different; the Company value is not compared.)

raw_data = [
    {"Company": 123, "Person": {"First Name": 123, "Last Name": 323}},
    {"Company": 13, "Person": {"First Name": 123, "Last Name": 323}},
    {"Company": 123, "Person": {"First Name": 122, "Last Name": 133}},
]

unique = []
for company in raw_data:
    # Keep the entry only if no entry kept so far has the same "Person" dict
    if all(unique_comp["Person"] != company["Person"] for unique_comp in unique):
        unique.append(company)

print(unique)
#>>> [{'Company': 123, 'Person': {'First Name': 123, 'Last Name': 323}}, {'Company': 123, 'Person': {'First Name': 122, 'Last Name': 133}}]
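To match against all three keys from the question, one possible approach is to build a tuple of the key values for each record and keep a set of the tuples already seen. This is only a sketch under assumptions: it assumes the file holds a list of flat objects with "Company Name", "First Name", and "Last Name" at the top level (the real layout was not shown), and the function name `dedupe_records` and the sample contacts are made up for illustration.

```python
import json

def dedupe_records(records, keys=("Company Name", "First Name", "Last Name")):
    """Return records with later duplicates dropped, comparing only `keys`."""
    seen = set()
    unique = []
    for record in records:
        # A tuple of the selected values is hashable, so it can go in a set;
        # a missing key is treated as None (an assumption of this sketch).
        fingerprint = tuple(record.get(key) for key in keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique

# Made-up sample data: the third entry repeats the first on all three keys.
contacts = [
    {"Company Name": "Acme", "First Name": "Ann", "Last Name": "Lee", "Phone": "1"},
    {"Company Name": "Globex", "First Name": "Ann", "Last Name": "Lee", "Phone": "2"},
    {"Company Name": "Acme", "First Name": "Ann", "Last Name": "Lee", "Phone": "3"},
]

deduped = dedupe_records(contacts)
print(json.dumps(deduped, indent=2))
```

The same person at two different companies is kept, and only the entry that repeats all three values is dropped. Because the set lookup avoids comparing each record against every record kept so far, this should scale better to a 100k-plus line file than the nested comparison above. For a real file, the list would come from json.load on the opened input file and be written back with json.dump.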
