Expected String or Unicode when reading JSON with Pandas

2024/9/24 9:21:20

I am trying to read the JSON output of an OpenStreetMap (Overpass API) query. The JSON string itself is valid.

I am using the following code:

import pandas as pd
import requests

# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osmdata = osm.json()
osmdataframe = pd.read_json(osmdata)

which throws the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-304b7fbfb645> in <module>()
----> 1 osmdataframe = pd.read_json(osmdata)

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    196         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    197                           keep_default_dates, numpy, precise_float,
--> 198                           date_unit).parse()
    199 
    200     if typ == 'series' or obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    264 
    265         else:
--> 266             self._parse_no_numpy()
    267 
    268         if self.obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    481         if orient == "columns":
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":
    485             decoded = dict((str(k), v)

TypeError: Expected String or Unicode

How can I modify the request or the call to Pandas read_json to avoid this error? And what is the underlying problem?

Answer

If you write the raw JSON string to a file,

content = osm.content  # raw response body as bytes (a requests Response has no .read() method)
with open('/tmp/out', 'wb') as f:
    f.write(content)

you'll see something like this:

{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2014-07-20T07:52:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [
    {
      "type": "node",
      "id": 536694,
      "lat": 50.9849256,
      "lon": 13.6821776,
      "tags": {
        "highway": "bus_stop",
        "name": "Niederhäslich Bergmannsweg"
      }
    },
    ...
  ]
}
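To get the same indented view without writing a temporary file, you can pretty-print the parsed response with the standard json module. This is a minimal sketch; the small dict below is a hand-trimmed stand-in for osm.json() (an assumption, so that it runs offline).

```python
import json

# Hand-trimmed stand-in for osm.json(); the real response has many more elements
osmdata = {
    "version": 0.6,
    "generator": "Overpass API",
    "elements": [
        {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
         "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
    ],
}

# indent=2 puts one key per line; ensure_ascii=False keeps "ä" readable instead of "\u00e4"
pretty = json.dumps(osmdata, indent=2, ensure_ascii=False)
print(pretty)
```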

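When the JSON string is converted to a Python object, the result is a dict whose elements key holds a list of dicts, and the vast majority of the data is inside that list. Note that osm.json() has already done this conversion, so osmdata is a dict, not a string. pd.read_json expects JSON text (or a path or file-like object), which is why passing it the already-parsed dict raises TypeError: Expected String or Unicode.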
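The distinction can be checked offline. In the sketch below a short JSON string stands in for the raw response body osm.text (an assumption, since the real request needs network access); json.loads does the same parsing that osm.json() does for you.

```python
import json

# Stand-in for osm.text (the raw response body), trimmed to a single element
raw = ('{"version": 0.6, "generator": "Overpass API", '
       '"elements": [{"type": "node", "id": 536694, "lat": 50.9849256, '
       '"lon": 13.6821776, "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}}]}')

osmdata = json.loads(raw)  # the same kind of object that osm.json() returns

print(type(raw).__name__)      # str: JSON text, the kind of input pd.read_json parses
print(type(osmdata).__name__)  # dict: what the question actually passed to pd.read_json
print(sorted(osmdata))         # ['elements', 'generator', 'version']
print(type(osmdata["elements"]).__name__, type(osmdata["elements"][0]).__name__)  # list dict
```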

This JSON string is not directly convertible to a Pandas object. What would be the index, and what would be the columns? Surely you don't want [u'elements', u'version', u'osm3s', u'generator'] to be the columns, since almost all the information is in the elements list-of-dicts.

If you want the DataFrame to contain only the data in the elements list-of-dicts, you have to say so explicitly, since Pandas cannot make that assumption for you.
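Saying so takes one line: build the DataFrame from osmdata['elements'] directly. If you would rather go through pd.read_json, serialize just that list back to JSON text; wrapping it in io.StringIO avoids the FutureWarning that pandas 2.1+ emits when given a literal JSON string. A sketch with a hand-made two-element sample (assumed data, not real API output); note that the nested tags still come out as a column of dicts.

```python
import io
import json

import pandas as pd

# Hand-made sample shaped like the Overpass response (not real API output)
osmdata = {
    "version": 0.6,
    "elements": [
        {"type": "node", "id": 1, "lat": 50.98, "lon": 13.68,
         "tags": {"highway": "bus_stop", "name": "Stop A"}},
        {"type": "node", "id": 2, "lat": 51.12, "lon": 13.78,
         "tags": {"highway": "bus_stop", "name": "Stop B"}},
    ],
}

# Option 1: the data is already parsed, so hand the list of records to DataFrame
df = pd.DataFrame(osmdata["elements"])

# Option 2: if you want pd.read_json, give it JSON *text* for the records list only
df2 = pd.read_json(io.StringIO(json.dumps(osmdata["elements"])))

print(df.columns.tolist())  # ['type', 'id', 'lat', 'lon', 'tags']
print(df.loc[0, "tags"])    # {'highway': 'bus_stop', 'name': 'Stop A'}
```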

A further complication is that each dict in elements itself contains a nested dict. Consider the first dict in elements:

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
}

Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns? That seems plausible, except that the tags column would then be a column of dicts, which is usually not very useful. It would probably be nicer to turn the keys inside the tags dict into columns of their own. We can do that, but again we have to code it ourselves, since Pandas has no way of knowing that is what we want.
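You still have to ask for it, but pandas does ship a generic helper for this kind of flattening: pd.json_normalize (top-level since pandas 1.0; older versions expose it as pandas.io.json.json_normalize). It is a different technique from the manual loop that follows. A sketch on hand-made sample records (assumed data): nested keys become dotted column names such as tags.name, so they cannot clash with top-level fields like id or type, and tags missing from an element become NaN.

```python
import pandas as pd

# Hand-made records shaped like osmdata['elements'] (assumed sample data)
elements = [
    {"type": "node", "id": 1, "lat": 50.98, "lon": 13.68,
     "tags": {"highway": "bus_stop", "name": "Stop A"}},
    {"type": "node", "id": 2, "lat": 51.12, "lon": 13.78,
     "tags": {"highway": "bus_stop", "name": "Stop B", "shelter": "yes"}},
]

# Each key inside 'tags' becomes its own column, named 'tags.<key>' by default
# (pass sep="_" to get names like 'tags_name' instead)
flat = pd.json_normalize(elements)

print(flat[["lat", "lon", "tags.name"]])
```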


import pandas as pd
import requests

# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osmdata = osm.json()  # already a Python dict, so pd.read_json is not needed

# Keep only the list of element dicts
osmdata = osmdata['elements']

# Move every key of the nested 'tags' dict up into the element dict itself
# (.items() works on Python 2 and 3; pop with a default tolerates elements without tags)
for dct in osmdata:
    for key, val in dct.pop('tags', {}).items():
        dct[key] = val

osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())

yields

         lat        lon                        name
0  50.984926  13.682178  Niederhäslich Bergmannsweg
1  51.123623  13.782789                Sagarder Weg
2  51.065752  13.895734     Weißig, Einkaufszentrum
3  51.007140  13.698498          Stuttgarter Straße
4  51.010199  13.701411          Heilbronner Straße
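One caveat with copying tag keys straight into each element dict: OSM tag keys are free-form, so a tag that happens to be named type or id would silently overwrite the element field of the same name. The variant of the loop above sketched here prefixes tag keys and leaves the input list unmodified; the function name flatten_tags and the tag_ prefix are illustrative choices, not part of the original answer, and the sample data is hand-made.

```python
import pandas as pd


def flatten_tags(elements, prefix="tag_"):
    """Hypothetical helper: return new dicts with each element's tags merged in under prefixed keys."""
    rows = []
    for element in elements:
        # Copy the top-level fields so the caller's dicts are not mutated
        row = {k: v for k, v in element.items() if k != "tags"}
        # Elements without a 'tags' dict simply contribute no tag columns
        for key, val in element.get("tags", {}).items():
            row[prefix + key] = val
        rows.append(row)
    return rows


# Hand-made sample (not real API output)
elements = [
    {"type": "node", "id": 1, "lat": 50.98, "lon": 13.68,
     "tags": {"highway": "bus_stop", "name": "Stop A"}},
    {"type": "node", "id": 2, "lat": 51.12, "lon": 13.78,
     # hypothetical tag whose key clashes with the element's own 'type' field
     "tags": {"highway": "bus_stop", "name": "Stop B", "type": "example"}},
]

df = pd.DataFrame(flatten_tags(elements))
print(df[["type", "tag_type", "tag_name"]])
```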