Convert CSV to YAML, with Unicode?

2024/9/29 11:40:09

I'm trying to convert a CSV file, containing Unicode strings, to a YAML file using Python 3.4.

Currently, the YAML parser escapes my Unicode text into an ASCII string. I want the YAML parser to export the Unicode string as a Unicode string, without the escape characters. I'm misunderstanding something here, of course, and I'd appreciate any assistance.

Bonus points: how might this be done with Python 2.7?

CSV input

id, title_english, title_russian
1, A Title in English, Название на русском
2, Another Title, Другой Название

current YAML output

- id: 1
  title_english: A Title in English
  title_russian: "\u041D\u0430\u0437\u0432\u0430\u043D\u0438\u0435 \u043D\u0430\
    \ \u0440\u0443\u0441\u0441\u043A\u043E\u043C"
- id: 2
  title_english: Another Title
  title_russian: "\u0414\u0440\u0443\u0433\u043E\u0439 \u041D\u0430\u0437\u0432\u0430\
    \u043D\u0438\u0435"

desired YAML output

- id: 1
  title_english: A Title in English
  title_russian: Название на русском
- id: 2
  title_english: Another Title
  title_russian: Другой Название

Python conversion code

import csv
import yaml
in_file  = open('csv_file.csv', "r")
out_file = open('yaml_file.yaml', "w")
items = []

def convert_to_yaml(line, counter):
    item = {
        'id': counter,
        'title_english': line[0],
        'title_russian': line[1]
    }
    items.append(item)

try:
    reader = csv.reader(in_file)
    next(reader) # skip headers
    for counter, line in enumerate(reader):
        convert_to_yaml(line, counter)
    out_file.write( yaml.dump(items, default_flow_style=False) )
finally:
    in_file.close()
    out_file.close()

Thanks!

Answer

I ran into the same issue. Based on your example above, this is how I resolved it:

out_file.write(yaml.dump(items, default_flow_style=False, allow_unicode=True))

Including allow_unicode=True fixes the issue: by default, yaml.dump escapes non-ASCII characters in its output.
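To see the difference in isolation, here is a minimal sketch that dumps the same data with and without allow_unicode=True. The sample record and the variable names `escaped` and `readable` are made up for illustration; it assumes Python 3 with PyYAML installed.

```python
import yaml

# Hypothetical sample record, modeled on the CSV rows in the question.
items = [{'id': 1, 'title_russian': 'Название на русском'}]

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = yaml.dump(items, default_flow_style=False)

# With allow_unicode=True the Cyrillic text is written as-is.
readable = yaml.dump(items, default_flow_style=False, allow_unicode=True)

print(escaped)
print(readable)
```

Both strings load back to the same data, so the option only changes how the text is presented in the file, not what it means.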

Also, specifically for Python 2, use safe_dump instead of dump to prevent the !!python/unicode tag from appearing alongside the Unicode text:

out_file.write(yaml.safe_dump(items, default_flow_style=False, allow_unicode=True))
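Putting the pieces together, a Python 3 version of the whole conversion might look like the sketch below. The function name `csv_to_yaml` is hypothetical, and the sketch makes a few choices that are not in the original code: it opens both files explicitly as UTF-8 (so the result does not depend on the platform's default encoding), reads rows by header name with csv.DictReader, and uses skipinitialspace=True because the sample CSV has a space after each comma.

```python
import csv
import yaml


def csv_to_yaml(csv_path, yaml_path):
    """Convert a CSV file with a header row to a YAML list of mappings.

    Hypothetical helper; the column names follow the sample CSV in the question.
    """
    items = []
    # newline='' is what the csv module recommends for files it reads.
    with open(csv_path, 'r', encoding='utf-8', newline='') as in_file:
        reader = csv.DictReader(in_file, skipinitialspace=True)
        for row in reader:
            items.append({
                'id': int(row['id']),
                'title_english': row['title_english'],
                'title_russian': row['title_russian'],
            })

    with open(yaml_path, 'w', encoding='utf-8') as out_file:
        # Passing the open file as the stream lets PyYAML write to it directly.
        yaml.safe_dump(items, out_file,
                       default_flow_style=False, allow_unicode=True)
    return items
```

Calling csv_to_yaml('csv_file.csv', 'yaml_file.yaml') with the file names from the question should then produce the unescaped output.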