Question 1

I need to convert a CSV file to RDF with rdflib. I already have the code that reads the CSV, but I do not know how to convert it to RDF.

I have the following code:

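import csv
from rdflib.graph import Graph

# Open the input file (text mode with newline='' is what the csv module expects in Python 3)
with open('data.csv', newline='') as fcsv:
    g = Graph()
    csvreader = csv.reader(fcsv)
    y = True
    for row in csvreader:
        if y:
            names = row  # the first row holds the column names
            y = False
        else:
            for i in range(len(row)):
                continue  # placeholder: no triples are added to g yet
    print(g.serialize(format='xml'))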

Can someone explain and give me an example?

Answer

Example csv file

Courtesy of KRontheWeb, I use the following example CSV file to answer your question: https://github.com/KRontheWeb/csv2rdf-tutorial/blob/master/example.csv

"Name";"Address";"Place";"Country";"Age";"Hobby";"Favourite Colour" 
"John";"Dam 52";"Amsterdam";"The Netherlands";"32";"Fishing";"Blue"
"Jenny";"Leidseplein 2";"Amsterdam";"The Netherlands";"12";"Dancing";"Mauve"
"Jill";"52W Street 5";"Amsterdam";"United States of America";"28";"Carpentry";"Cyan"
"Jake";"12E Street 98";"Amsterdam";"United States of America";"42";"Ballet";"Purple"

Import Libraries

import pandas as pd  # for handling the CSV file and its contents
from rdflib import Graph, Literal, RDF, URIRef, Namespace  # basic RDF handling
from rdflib.namespace import FOAF, XSD  # most common namespaces
import urllib.parse  # for turning strings into URI-safe text

Read in the csv file

url='https://raw.githubusercontent.com/KRontheWeb/csv2rdf-tutorial/master/example.csv'
df=pd.read_csv(url,sep=";",quotechar='"')
# df # uncomment to check for contents
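If you want to try the read step without network access, here is a minimal sketch that parses two rows copied from the example file out of an in-memory string. The names sample and df_sample are mine, not part of the original code, and the header clean-up is only a precaution that does nothing if the names are already clean.

```python
import io

import pandas as pd

# Two rows copied from the example file, inlined so no download is needed.
sample = (
    '"Name";"Address";"Place";"Country";"Age";"Hobby";"Favourite Colour"\n'
    '"John";"Dam 52";"Amsterdam";"The Netherlands";"32";"Fishing";"Blue"\n'
    '"Jenny";"Leidseplein 2";"Amsterdam";"The Netherlands";"12";"Dancing";"Mauve"\n'
)

# Same separator and quote character as in the read_csv call above.
df_sample = pd.read_csv(io.StringIO(sample), sep=";", quotechar='"')

# Precaution: strip stray whitespace from the column names.
df_sample.columns = df_sample.columns.str.strip()

print(df_sample.columns.tolist())
print(df_sample.shape)
```

Checking the column names before building triples is worthwhile, because row['Name'] and friends raise a KeyError if a header is spelled differently than expected.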

Define a graph 'g' and namespaces

g = Graph()
ppl = Namespace('http://example.org/people/')
loc = Namespace('http://mylocations.org/addresses/')
schema = Namespace('http://schema.org/')

Create the triples and add them to graph 'g'

It's a bit dense, but each g.add() call takes one triple made of three parts: subject, predicate, object. For more information, check the really friendly rdflib documentation, section 1.1.3 onwards, at https://buildmedia.readthedocs.org/media/pdf/rdflib/latest/rdflib.pdf

for index, row in df.iterrows():
    # the person, typed as a foaf:Person
    g.add((URIRef(ppl+row['Name']), RDF.type, FOAF.Person))
    # name, age and address of the person as typed literals
    g.add((URIRef(ppl+row['Name']), URIRef(schema+'name'), Literal(row['Name'], datatype=XSD.string) ))
    g.add((URIRef(ppl+row['Name']), FOAF.age, Literal(row['Age'], datatype=XSD.integer) ))
    g.add((URIRef(ppl+row['Name']), URIRef(schema+'address'), Literal(row['Address'], datatype=XSD.string) ))
    # the address as a resource of its own, with a URI built from the quoted address string
    g.add((URIRef(loc+urllib.parse.quote(row['Address'])), URIRef(schema+'name'), Literal(row['Address'], datatype=XSD.string) ))

Note that:

I borrow namespaces from rdflib and create some myself;
It is good practice to define the datatype whenever you can;
I create URIs from the addresses (an example of string handling).
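To show what the string handling for the address URIs does, here is a small sketch using only the standard library. The helper address_uri is hypothetical and exists only for this illustration; the base URL is the loc namespace from above.

```python
import urllib.parse

LOC_BASE = 'http://mylocations.org/addresses/'


def address_uri(address):
    """Hypothetical helper: percent-encode an address and append it to the base."""
    return LOC_BASE + urllib.parse.quote(address)


# Spaces are not allowed in a URI, so quote() replaces them with %20.
for address in ['Dam 52', 'Leidseplein 2', '52W Street 5']:
    print(address_uri(address))
```

The names in the example file contain no spaces, which is why the loop above gets away with using them unquoted. For real data I would probably quote the names as well.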

Check the results

# rdflib 6 and later return a str here; on older versions, append .decode('UTF-8')
print(g.serialize(format='turtle'))

A snippet of the output:

<http://example.org/people/Jake> a ns2:Person ;
    ns1:address "12E Street 98"^^xsd:string ;
    ns1:name "Jake"^^xsd:string ;
    ns2:age 42 .

Save the results to disk

g.serialize('mycsv2rdf.ttl',format='turtle')

Creating RDF file using csv file as input
