Recover all lines containing a given attribute from a JSON database

2024/11/16 13:55:46

To simplify my problem: I have a database in JSON, and I read every JSON line to load its information into another database. It looks easy at first, but the problem is that my JSON is not written correctly.

So I wrote code to read all my JSON lines, but it does not work on every line, for example on lines that contain a biography ("bio").

Let me show you:

{"name": "Nazamiu0304 Rau0304majiu0304", "personal_name": "Nazamiu0304 Rau0304majiu0304", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T18:00:41.270799"}, "key": "/authors/OL1001461A", "type": {"key": "/type/author"}, "revision": 2}
{"name": "Nazamiu0304 Rau0304majiu0304", "personal_name": "Nazamiu0304 Rau0304majiu0304", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T18:00:41.270799"}, "key": "/authors/OL1001461A", "type": {"key": "/type/author"}, "revision": 2}

As you can see, there are fields such as name, personal_name, and so on.

Sometimes there is additional information:

{"bio": {"type": "/type/text", "value": "> "Eversley, William Pinder, B.C.L. Queen's Coll., Oxon, M.A., a member of the South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench Walk, Temple, E.C." rn> ...[in Foster's _Men at the Bar_][1]rnrnrn  rnrn[1]: https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar""}, "name": "William Pinder Eversley", "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "death_date": "1918", "photos": [6897255, 6897254], "last_modified": {"type": "/type/datetime", "value": "2018-07-31T15:39:07.982159"}, "latest_revision": 6, "key": "/authors/OL1003081A", "birth_date": "1851", "personal_name": "William Pinder Eversley", "type": {"key": "/type/author"}, "revision": 6}{"name": "Valerie Meyer", "personal_name": "Valerie Meyer", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T18:22:33.63997"}, "key": "/authors/OL1004062A", "type": {"key": "/type/author"}, "revision": 2}

As you can see, I have a lot of problems with the "bio" element because it is not written correctly at all: the quotes are not interpreted correctly, and neither is ">". So I got this code, which changes the structure of bio so that I can use it.

Here is my code to change the structure of bio:
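To make the quoting problem concrete, here is a minimal sketch. The two records are shortened, hypothetical versions of the Eversley line, and try_parse is a helper name introduced only for this illustration. It shows that json.loads rejects a value whose inner quotes are not escaped and accepts the same value once they are.

```python
import json

# Hypothetical, shortened record: the inner quotes of "value" are NOT escaped.
broken = '{"bio": {"type": "/type/text", "value": "> "Eversley, William Pinder""}, "name": "William Pinder Eversley"}'

# Same record with the inner quotes escaped, which is what a JSON parser expects.
fixed = '{"bio": {"type": "/type/text", "value": "> \\"Eversley, William Pinder\\""}, "name": "William Pinder Eversley"}'


def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(try_parse(broken))         # None
print(try_parse(fixed)["name"])  # William Pinder Eversley
```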

import re
import json
import pprint

bio_regex = re.compile(r"""
("bio":\s*{)   # bio field start
(.*?)          # content
(},)           # bio field end
(?=\s*(?:"\w+"|}))  # followed by another one or the json end
""", flags=re.VERBOSE | re.DOTALL)

value_regex = re.compile(r"""
("value":\s*")   # value field start
(.*?)            # content
("\s*\Z)         # value field end + end of string
""", flags=re.VERBOSE | re.DOTALL)


def normalize_value(mo):
    start, content, end = mo.group(1, 2, 3)
    content = content.replace('"', '\\"')
    return start + content + end


def normalize_bio(mo):
    start, content, end = mo.group(1, 2, 3)
    content = value_regex.sub(normalize_value, content)
    return start + content + end


messy_json = """
{ "bio":{ "type":"/type/text","value":"> "Eversley, William Pinder, B.C.L. Queen's Coll., Oxon, M.A., a member of the South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench Walk, Temple, E.C." rn> ...[in Foster's Men at the Bar][1]rnrnrn rnrn[1]: https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar""},"name":"William Pinder Eversley","created":{ "type":"/type/datetime","value":"2008-04-01T03:28:50.625462"},"death_date":"1918","photos":[ 6897255,6897254],"last_modified":{ "type":"/type/datetime","value":"2018-07-31T15:39:07.982159"},"latest_revision":6,"key":"/authors/OL1003081A","birth_date":"1851","personal_name":"William Pinder Eversley","type":{ "key":"/type/author"},"revision":6
}"""

result = bio_regex.sub(normalize_bio, messy_json)
obj = json.loads(result)
pprint.pprint(obj)

Here is the result:


{'bio': {'type': '/type/text','value': '> "Eversley, William Pinder, B.C.L. Queen\'s Coll., Oxon, M.A., a member of the '"South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of "'the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest '"son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench "'Walk, Temple, E.C." rn> ...[in Foster\'s Men at the Bar][1]rnrnrn rnrn[1]: ''https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar"'},'birth_date': '1851','created': {'type': '/type/datetime', 'value': '2008-04-01T03:28:50.625462'},'death_date': '1918','key': '/authors/OL1003081A','last_modified': {'type': '/type/datetime', 'value': '2018-07-31T15:39:07.982159'},'latest_revision': 6,'name': 'William Pinder Eversley','personal_name': 'William Pinder Eversley','photos': [6897255, 6897254],'revision': 6,'type': {'key': '/type/author'}}

The problem is that this script works when I paste one entire line into my code, but I would like to recover my 1,000,000 bio lines with the correct structure, and I can't do that one by one. I tried many things with a loop to process them one at a time, but I always get an error. I need to know how to do it with a loop: how can I upgrade my code so that it handles every bio line in the database, and not only one line pasted by hand?

Thanks in advance, and thanks for reading!

Answer

For example, this is what I meant: I have a file, openlibraryjson.json,

with these lines:

{"name": "Ismail Ibrahim Dr.", "title": "Dr.", "personal_name": "Ismail Ibrahim", "last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "key": "/authors/OL100304A", "type": {"key": "/type/author"}, "revision": 1} {"bio": {"type": "/type/text", "value": "> "Eversley, William Pinder, B.C.L. Queen's Coll., Oxon, M.A., a member of the South-eastern circuit, reporter for Law Times in Queen's Bench division, a student of the Inner Temple 14 April, 1874 (then aged 23), called to the bar 25 April, 1877 (eldest son of William Eversley, Esq., of London); born u2060, 1851. rn> rn> 7, King's Bench Walk, Temple, E.C." rn> ...[in Foster's Men at the Bar][1]rnrnrn rnrn[1]: https://en.wikisource.org/wiki/Men-at-the-Bar/Eversley,_William_Pinder "Men at the Bar""}, "name": "William Pinder Eversley", "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "death_date": "1918", "photos": [6897255, 6897254], "last_modified": {"type": "/type/datetime", "value": "2018-07-31T15:39:07.982159"}, "latest_revision": 6, "key": "/authors/OL1003081A", "birth_date": "1851", "personal_name": "William Pinder Eversley", "type": {"key": "/type/author"}, "revision": 6} {"name": "Valerie Meyer", "personal_name": "Valerie Meyer", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T18:22:33.63997"}, "key": "/authors/OL1004062A", "type": {"key": "/type/author"}, "revision": 2} {"bio": {"type": "/type/text", "value": "[Deutsch] Deutscher Orientalist und Theologe.rn[English] German orientalist and biblical scholar."}, "name": "August Dillmann", "links": [{"url": "http://de.wikipedia.org/wiki/August_Dillmann", "type": {"key": "/type/link"}, "title": "Wikipedia (Deutsch)"}, {"url": "http://en.wikipedia.org/wiki/August_Dillmann", "type": {"key": "/type/link"}, "title": "Wikipedia (English)"}], "personal_name": "August Dillmann", "death_date": "4 July 1894.", "alternate_names": ["Christian Friedrich August Dillmann", "Ch. F. A. 
Dillmann", "Friedrich August Dillmann", "F. A. Dillmann", "Augustus Dillmann", "August Dillmann", "A. Dillmann"], "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "photos": [6676274], "last_modified": {"type": "/type/datetime", "value": "2017-03-31T12:45:57.925108"}, "latest_revision": 8, "key": "/authors/OL1179559A", "birth_date": "25 April 1823", "revision": 8, "type": {"key": "/type/author"}, "remote_ids": {"viaf": "45046685", "wikidata": "Q75216"}} {"last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "type": {"key": "/type/author"}, "name": "Physikertagung (1966 Munich, Germany)", "key": "/authors/OL1179696A", "revision": 1}

I would like to take only the bio lines and fix them so that they work. To do that, I tried opening my file and processing name, personal_name, and so on with a loop. That works, but not for bio, because it is not written correctly, so for the moment I skip bio in my script. Now I would like to stop skipping bio and handle it the same way as name, personal_name, and so on.

Like this:

import json

ligne = []          # values collected by the first block
personal_nom = []   # values collected by the second block

with open('openlibrary(3).json') as file:
    for i in range(101):
        line = file.readline()
        if "bio" in line:
            # bio lines are not valid JSON yet, so they are skipped for now
            line = line.replace("\'", "’")
            continue
        content_json = json.loads(line)
        if "personal_name" not in line:
            # print('NULL')
            ligne.append("NULL")
            continue
        try:
            # print(content_json['name'])
            ligne.append(content_json['personal_name'])
        except KeyError:  # a missing dict key raises KeyError, not IndexError
            print('NULL')
        if "personal_name" not in line:
            # print('NULL')
            personal_nom.append("NULL")
            continue
        try:
            # print(content_json['name'])
            personal_nom.append(content_json['personal_name'])
        except KeyError:
            print('NULL')

I only included part of the code here, to show roughly what I did for name, personal_name, and so on.

Thank you again for reading and answering!
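As a side note on the missing-field handling in the loop above, dict.get with a default may be a shorter alternative to checking the raw line and catching an exception. This is only a sketch: the two records are hypothetical stand-ins for lines of the real file, and the list names names and personal_names are chosen for the illustration.

```python
import json

# Hypothetical records standing in for lines of the real file.
lines = [
    '{"name": "Valerie Meyer", "personal_name": "Valerie Meyer"}',
    '{"name": "Physikertagung (1966 Munich, Germany)"}',
]

names = []
personal_names = []
for line in lines:
    record = json.loads(line)
    # dict.get returns the default instead of raising KeyError
    # when the key is absent from the record.
    names.append(record.get("name", "NULL"))
    personal_names.append(record.get("personal_name", "NULL"))

print(personal_names)  # ['Valerie Meyer', 'NULL']
```

Because both lists receive exactly one value per record, they stay aligned even when a field is missing.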
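One possible way to apply the bio fix to every line is sketched below. It reuses the two regular expressions from the question's script. The helper names parse_line and read_records and the sample list are assumptions made for this illustration, and the sketch assumes one JSON object per line. Each line is first parsed as-is, and the bio repair is attempted only when that fails, so records that are already valid are never escaped twice.

```python
import json
import re

# Same patterns as in the question's script.
bio_regex = re.compile(r"""
("bio":\s*{)        # bio field start
(.*?)               # content
(},)                # bio field end
(?=\s*(?:"\w+"|}))  # followed by another key or the end of the object
""", flags=re.VERBOSE | re.DOTALL)

value_regex = re.compile(r"""
("value":\s*")   # value field start
(.*?)            # content
("\s*\Z)         # value field end + end of string
""", flags=re.VERBOSE | re.DOTALL)


def normalize_value(mo):
    start, content, end = mo.group(1, 2, 3)
    return start + content.replace('"', '\\"') + end


def normalize_bio(mo):
    start, content, end = mo.group(1, 2, 3)
    return start + value_regex.sub(normalize_value, content) + end


def parse_line(line):
    """Parse one record; repair the bio field only if plain parsing fails."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        pass
    try:
        return json.loads(bio_regex.sub(normalize_bio, line))
    except json.JSONDecodeError:
        return None  # still broken: the caller decides what to do


def read_records(lines):
    """Yield (line_number, record) for every non-empty line."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if line:
            yield number, parse_line(line)


# Hypothetical sample data standing in for the real file.
sample = [
    '{"name": "Valerie Meyer", "personal_name": "Valerie Meyer", "revision": 2}',
    '{"bio": {"type": "/type/text", "value": "> "Eversley" said "hi""}, "name": "William Pinder Eversley"}',
]
records = dict(read_records(sample))
print(records[2]["bio"]["value"])  # > "Eversley" said "hi"
```

Since an open file is itself an iterable of lines, read_records can be given the file object directly instead of the sample list, which avoids loading a million lines into memory at once. Lines that come back as None can be logged with their line number and inspected by hand.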
