Creating an RDF file using a CSV file as input

2024/9/22 10:33:29

I need to convert a CSV file to RDF with rdflib. I already have the code that reads the CSV, but I do not know how to convert it to RDF.

I have the following code:

import csv
from rdflib.graph import Graph

# Open the input file
with open('data.csv', 'rb') as fcsv:
    g = Graph()
    csvreader = csv.reader(fcsv)
    y = True
    for row in csvreader:
        if y:
            names = row
            y = False
        else:
            for i in range(len(row)):
                continue
    print(g.serialize(format='xml'))
    fcsv.close()

Can someone explain and give me an example?

Answer

Example csv file

Courtesy of KRontheWeb, I use the following example CSV file to answer your question: https://github.com/KRontheWeb/csv2rdf-tutorial/blob/master/example.csv

"Name";"Address";"Place";"Country";"Age";"Hobby";"Favourite Colour" 
"John";"Dam 52";"Amsterdam";"The Netherlands";"32";"Fishing";"Blue"
"Jenny";"Leidseplein 2";"Amsterdam";"The Netherlands";"12";"Dancing";"Mauve"
"Jill";"52W Street 5";"Amsterdam";"United States of America";"28";"Carpentry";"Cyan"
"Jake";"12E Street 98";"Amsterdam";"United States of America";"42";"Ballet";"Purple"

Import Libraries

import pandas as pd  # for handling csv and csv contents
from rdflib import Graph, Literal, RDF, URIRef, Namespace  # basic RDF handling
from rdflib.namespace import FOAF, XSD  # most common namespaces
import urllib.parse  # for parsing strings to URIs

Read in the csv file

url='https://raw.githubusercontent.com/KRontheWeb/csv2rdf-tutorial/master/example.csv'
df=pd.read_csv(url,sep=";",quotechar='"')
# df # uncomment to check for contents

Define a graph 'g' and namespaces

g = Graph()
ppl = Namespace('http://example.org/people/')
loc = Namespace('http://mylocations.org/addresses/')
schema = Namespace('http://schema.org/')

Create the triples and add them to graph 'g'

It's a bit dense, but each g.add() consists of three parts: subject, predicate, object. For more info, check the really friendly rdflib documentation, section 1.1.3 onwards at https://buildmedia.readthedocs.org/media/pdf/rdflib/latest/rdflib.pdf

for index, row in df.iterrows():
    g.add((URIRef(ppl+row['Name']), RDF.type, FOAF.Person))
    g.add((URIRef(ppl+row['Name']), URIRef(schema+'name'), Literal(row['Name'], datatype=XSD.string) ))
    g.add((URIRef(ppl+row['Name']), FOAF.age, Literal(row['Age'], datatype=XSD.integer) ))
    g.add((URIRef(ppl+row['Name']), URIRef(schema+'address'), Literal(row['Address'], datatype=XSD.string) ))
    g.add((URIRef(loc+urllib.parse.quote(row['Address'])), URIRef(schema+'name'), Literal(row['Address'], datatype=XSD.string) ))

Note that:

  • I borrow namespaces from rdflib and create some myself;
  • It is good practice to define the datatype whenever you can;
  • I create URIs from the addresses (an example of string handling).
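
To illustrate the last point, here is a small standard-library sketch of what urllib.parse.quote does to an address before it is appended to a base URI. The helper name make_uri is hypothetical, and passing safe='' (so that slashes are escaped too) is my own choice rather than something the code above does.

```python
import urllib.parse

BASE = 'http://mylocations.org/addresses/'


def make_uri(base, label):
    """Percent-encode a free-text label and append it to a base URI (hypothetical helper)."""
    # safe='' also escapes '/', so a label can never add extra path segments.
    return base + urllib.parse.quote(label, safe='')


print(urllib.parse.quote('52W Street 5'))   # 52W%20Street%205
print(make_uri(BASE, 'Dam 52'))             # http://mylocations.org/addresses/Dam%2052
print(make_uri(BASE, 'A/B'))                # http://mylocations.org/addresses/A%2FB
```

The same treatment would be sensible for the person URIs if names can contain spaces or other characters that are not valid in a URI.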

Check the results

# rdflib 6.0 and later return a str here; on older versions append .decode('UTF-8')
print(g.serialize(format='turtle'))

A snippet of the output:

<http://example.org/people/Jake> a ns2:Person ;
    ns1:address "12E Street 98"^^xsd:string ;
    ns1:name "Jake"^^xsd:string ;
    ns2:age 42 .

Save the results to disk

g.serialize('mycsv2rdf.ttl',format='turtle')