Convert GeoJSON with nested lists to a pandas DataFrame

2024/10/8 20:32:13

I have a massive GeoJSON file in this form:

 {'features': [{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-74.264948, 42.419877, 0],[-74.262041, 42.425856, 0],[-74.261175, 42.427631, 0],[-74.260384, 42.429253, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-73.929627, 42.078788, 0],[-73.929114, 42.081658, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albuquerque','geometry': {'coordinates': [[[-74.769198, 43.114089, 0],[-74.76786, 43.114496, 0],[-74.766474, 43.114656, 0]]],'type': 'Polygon'}}}],'type': 'FeatureCollection'}

After reading the json:

import json
with open('x.json') as f:
    data = json.load(f)

I read the values into a list and then into a dataframe:

from itertools import chain
import pandas as pd
# set of all distinct markets
mkt = set(f['properties']['MARKET'] for f in data['features'])
# list of (market, flattened coordinates) pairs; in the sample above,
# 'geometry' is nested inside 'properties'
markets = [(market, list(chain.from_iterable(f['properties']['geometry']['coordinates'])))
           for f in data['features'] for market in mkt
           if f['properties']['MARKET'] == market]
df = pd.DataFrame(markets, columns=['a', 'b'])

First few rows of df are:

      a       b
0   Albany  [[-74.264948, 42.419877, 0], [-74.262041, 42.4...
1   Albany  [[-73.929627, 42.078788, 0], [-73.929114, 42.0...
2   Albany  [[-74.769198, 43.114089, 0], [-74.76786, 43.11...

Then to unnest the nested list in column b, I used pandas concat:

df1 = pd.concat([df.iloc[:,0:1], df['b'].apply(pd.Series)], axis=1)

But this creates 8070 columns with many NaNs. Is there a way to get every latitude/longitude pair as its own row, labelled with its market (column a)? The desired result is a long dataframe of roughly a million rows, with only the market and its latitude and longitude as columns.
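The column count is driven by the longest list: apply(pd.Series) gives every list position its own column and pads shorter rows with NaN. A minimal sketch of that effect, using made-up values and hypothetical names (toy, wide, long_form), with DataFrame.explode shown as one way to get one row per point instead:

```python
import pandas as pd

# Toy frame in the same shape as df above (values are made up).
toy = pd.DataFrame({
    "a": ["Albany", "Albuquerque"],
    "b": [
        [[-74.26, 42.41, 0], [-74.25, 42.42, 0], [-74.24, 42.43, 0]],
        [[-74.76, 43.11, 0]],
    ],
})

# One column per list position: the row with a single point is padded with NaN.
wide = pd.concat([toy[["a"]], toy["b"].apply(pd.Series)], axis=1)
print(wide.shape)

# One row per point: each list element becomes its own row.
long_form = toy.explode("b").reset_index(drop=True)
print(long_form.shape)
```

With real data the number of columns in the wide form equals the number of points in the largest polygon, which would explain a figure like 8070.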

Desired output:

mkt         lat         long 
Albany      42.419877   -74.264948
Albany      42.078788   -73.929627
..
Albuquerque  35.105361   -106.640342

Please note that the trailing zero in each list element (e.g. [-74.769198, 43.114089, 0]) needs to be ignored.

Answer

Something like this should work:

import pandas as pd

# pd.json_normalize is available from pandas 1.0;
# older versions import it from pandas.io.json
df = pd.json_normalize(geojson["features"])
coords = 'properties.geometry.coordinates'

df2 = (df[coords]
       .apply(lambda r: [(i[0], i[1]) for i in r[0]])  # keep the first two values, drop the trailing 0
       .apply(pd.Series)
       .stack()
       .dropna()  # discard the padding added for shorter polygons
       .reset_index(level=1)
       .rename(columns={0: coords, "level_1": "point"})
       .join(df.drop(columns=coords), how='left')
       ).reset_index(level=0)

# GeoJSON positions are ordered (longitude, latitude)
df2[['long', 'lat']] = df2[coords].apply(pd.Series)
df2

Outputs:

   index  point properties.geometry.coordinates properties.MARKET  \
0      0      0         (-74.264948, 42.419877)            Albany   
1      0      1         (-74.262041, 42.425856)            Albany   
2      0      2         (-74.261175, 42.427631)            Albany   
3      0      3         (-74.260384, 42.429253)            Albany   
4      1      0         (-73.929627, 42.078788)            Albany   
5      1      1         (-73.929114, 42.081658)            Albany   
6      2      0         (-74.769198, 43.114089)       Albuquerque   
7      2      1          (-74.76786, 43.114496)       Albuquerque   
8      2      2         (-74.766474, 43.114656)       Albuquerque   

  properties.geometry.type       long        lat  
0                  Polygon -74.264948  42.419877  
1                  Polygon -74.262041  42.425856  
2                  Polygon -74.261175  42.427631  
3                  Polygon -74.260384  42.429253  
4                  Polygon -73.929627  42.078788  
5                  Polygon -73.929114  42.081658  
6                  Polygon -74.769198  43.114089  
7                  Polygon -74.767860  43.114496  
8                  Polygon -74.766474  43.114656  

This assumes geojson is defined as:

geojson = {'features': [{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-74.264948, 42.419877, 0],[-74.262041, 42.425856, 0],[-74.261175, 42.427631, 0],[-74.260384, 42.429253, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-73.929627, 42.078788, 0],[-73.929114, 42.081658, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albuquerque','geometry': {'coordinates': [[[-74.769198, 43.114089, 0],[-74.76786, 43.114496, 0],[-74.766474, 43.114656, 0]]],'type': 'Polygon'}}}],'type': 'FeatureCollection'}
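If only the market, latitude and longitude are needed, it may be simpler to flatten the features in plain Python before building the dataframe. The following is a sketch under the assumption that the data has the shape shown above (geometry nested inside properties, each position ordered longitude, latitude, then a value to ignore); the names sample, rows and flat are illustrative only:

```python
import pandas as pd

# Shortened sample in the same shape as the geojson dict above.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"properties": {"MARKET": "Albany",
                        "geometry": {"type": "Polygon",
                                     "coordinates": [[[-74.264948, 42.419877, 0],
                                                      [-74.262041, 42.425856, 0]]]}}},
        {"properties": {"MARKET": "Albuquerque",
                        "geometry": {"type": "Polygon",
                                     "coordinates": [[[-74.769198, 43.114089, 0]]]}}},
    ],
}

# One (market, lat, long) tuple per point; anything after the
# first two values of a position is discarded.
rows = [
    (props["MARKET"], lat, lon)
    for props in (f["properties"] for f in sample["features"])
    for ring in props["geometry"]["coordinates"]
    for lon, lat, *_ in ring
]

flat = pd.DataFrame(rows, columns=["mkt", "lat", "long"])
print(flat)
```

This avoids the intermediate wide frame entirely, so memory use grows with the number of points rather than with the size of the largest polygon.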