Question 1

I've a massive geo json in this form:

 {'features': [{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-74.264948, 42.419877, 0],[-74.262041, 42.425856, 0],[-74.261175, 42.427631, 0],[-74.260384, 42.429253, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-73.929627, 42.078788, 0],[-73.929114, 42.081658, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albuquerque','geometry': {'coordinates': [[[-74.769198, 43.114089, 0],[-74.76786, 43.114496, 0],[-74.766474, 43.114656, 0]]],'type': 'Polygon'}}}],'type': 'FeatureCollection'}

After reading the json:

import json
with open('x.json') as f:data = json.load(f)

I read the values into a list and then into a dataframe:

#to get a list of all markets
mkt=set([f['properties']['MARKET'] for f in data['features']])#to create a list of market and associated lat long
markets=[(market,list(chain.from_iterable(f['geometry']['coordinates']))) for f in data['features'] for market in mkt if f['properties']['MARKET']==mkt]df = pd.DataFrame(markets[0:], columns=['a','b'])

First few rows of df are:

      a       b
0   Albany  [[-74.264948, 42.419877, 0], [-74.262041, 42.4...
1   Albany  [[-73.929627, 42.078788, 0], [-73.929114, 42.0...
2   Albany  [[-74.769198, 43.114089, 0], [-74.76786, 43.11...

Then to unnest the nested list in column b, I used pandas concat:

df1 = pd.concat([df.iloc[:,0:1], df['b'].apply(pd.Series)], axis=1)

But this is creating 8070 columns with many NaNs. Is there a way to group all the latitudes and longitudes by the Market (column a)? A million rows by two column dataframe is desired.

Desired op is:

mkt         lat         long 
Albany      42.419877   -74.264948
Albany      42.078788   -73.929627
..
Albuquerque  35.105361   -106.640342

Pls note that the zero in the list element ([-74.769198, 43.114089, 0]) needs to be ignored.

Question 2

Something like this??

from pandas.io.json import json_normalize
df = json_normalize(geojson["features"])coords = 'properties.geometry.coordinates'df2 = (df[coords].apply(lambda r: [(i[0],i[1]) for i in r[0]]).apply(pd.Series).stack().reset_index(level=1).rename(columns={0:coords,"level_1":"point"}).join(df.drop(coords,1), how='left')).reset_index(level=0)df2[['lat','long']] = df2[coords].apply(pd.Series)df2

Outputs:

   index  point properties.geometry.coordinates properties.MARKET  \
0      0      0         (-74.264948, 42.419877)            Albany   
1      0      1         (-74.262041, 42.425856)            Albany   
2      0      2         (-74.261175, 42.427631)            Albany   
3      0      3         (-74.260384, 42.429253)            Albany   
4      1      0         (-73.929627, 42.078788)            Albany   
5      1      1         (-73.929114, 42.081658)            Albany   
6      2      0         (-74.769198, 43.114089)       Albuquerque   
7      2      1          (-74.76786, 43.114496)       Albuquerque   
8      2      2         (-74.766474, 43.114656)       Albuquerque   properties.geometry.type        lat       long  
0                  Polygon -74.264948  42.419877  
1                  Polygon -74.262041  42.425856  
2                  Polygon -74.261175  42.427631  
3                  Polygon -74.260384  42.429253  
4                  Polygon -73.929627  42.078788  
5                  Polygon -73.929114  42.081658  
6                  Polygon -74.769198  43.114089  
7                  Polygon -74.767860  43.114496  
8                  Polygon -74.766474  43.114656

If:

geojson = {'features': [{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-74.264948, 42.419877, 0],[-74.262041, 42.425856, 0],[-74.261175, 42.427631, 0],[-74.260384, 42.429253, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albany','geometry': {'coordinates': [[[-73.929627, 42.078788, 0],[-73.929114, 42.081658, 0]]],'type': 'Polygon'}}},{'properties': {'MARKET': 'Albuquerque','geometry': {'coordinates': [[[-74.769198, 43.114089, 0],[-74.76786, 43.114496, 0],[-74.766474, 43.114656, 0]]],'type': 'Polygon'}}}],'type': 'FeatureCollection'}

Convert Geo json with nested lists to pandas dataframe

Related Q&A

Pymongo - ValueError: NaTType does not support utcoffset when using insert_many

python numpy argmax to max in multidimensional array

Can Keras model.predict return a dictionary?

Flask OIDC: oauth2client.client.FlowExchangeError

Cumulative count at a group level Python

Easiest ways to generate graphs from Python? [closed]

Stripping python namespace attributes from an lxml.objectify.ObjectifiedElement [duplicate]

matplotlib xkcd and black figure background

Python: Whats the difference between set.difference and set.difference_update?

python telebot got unexpected response