Does pybtex support accented/special characters in .bib files?

2024/9/17 4:05:02
from pybtex.database.input import bibtex
parser = bibtex.Parser()
bibdata = parser.parse_file("sample.bib")

The code above parses a .bib file well, but it does not seem to support accented characters written the LaTeX way, such as {\"u} or \"{u}. I would like to confirm whether pybtex supports them or not.

For example, according to "LaTeX/Special Characters" and "How to write “ä” and other umlauts and accented letters in bibliography?", \"{o} should convert to ö, and so should {\"o}.
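To make the expected conversion concrete, here is a minimal sketch using only the standard library. The function name decode_umlauts and the regular expression are hypothetical illustrations, not part of pybtex or latexcodec, and the sketch handles only the \" accent on a single ASCII letter.

```python
import re
import unicodedata

# Combining diaeresis (U+0308); NFC normalization composes it with the base letter.
DIAERESIS = "\u0308"

# Hypothetical pattern: matches {\"o}, \"{o} and \"o with one ASCII base letter.
UMLAUT = re.compile(r'\{\\"([A-Za-z])\}|\\"\{([A-Za-z])\}|\\"([A-Za-z])')


def decode_umlauts(text):
    """Replace LaTeX umlaut commands with the corresponding Unicode character."""
    def repl(match):
        base = match.group(1) or match.group(2) or match.group(3)
        return unicodedata.normalize("NFC", base + DIAERESIS)
    return UMLAUT.sub(repl, text)


print(decode_umlauts(r'\"{o} and {\"o}'))
```

Note that this sketch drops the braces entirely, which is what the question asks for but is not necessarily safe for BibTEX titles, where braces also protect case.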

Answer

Update: pybtex has supported this feature since version 0.20.

Earlier versions do not. With those, you can read the .bib file through a LaTeX codec before you process it with pybtex, e.g. with latexcodec (https://pypi.python.org/pypi/latexcodec/). This codec converts a wide range of LaTeX commands to Unicode for you.

However, you will have to remove the braces in a post-processing stage. Why? To handle BibTeX code gracefully, \"{U} has to be converted into {Ü} rather than into Ü, so that it is not lowercased in titles. The following example demonstrates this behaviour:

import codecs

import latexcodec  # importing this registers the "latex" codec
import pybtex.database.input.bibtex
import pybtex.plugin

style = pybtex.plugin.find_plugin('pybtex.style.formatting', 'plain')()
backend = pybtex.plugin.find_plugin('pybtex.backends', 'latex')()
parser = pybtex.database.input.bibtex.Parser()

# this shows what the latexcodec does to the source
with codecs.open("test.bib", encoding="latex") as stream:
    print(stream.read())

# parse the decoded stream and render each entry with the LaTeX backend
with codecs.open("test.bib", encoding="latex") as stream:
    data = parser.parse_stream(stream)
for entry in style.format_entries(data.entries.values()):
    print(entry.text.render(backend))

where test.bib is

@Article{test,
  author =       {John Doe},
  title =        {Testing \"UTEST \"{U}TEST},
  journal =      {Journal of Test},
  year =         {2000},
}

This will print how the latexcodec converted test.bib into unicode (edited for readability):

@Article{test,
  author = {John Doe},
  title = {Testing ÜTEST {Ü}TEST},
  journal = {Journal of Test},
  year = {2000},
}

followed by the pybtex rendered entry (in this case, producing latex code):

John Doe.
\newblock Testing ütest {Ü}test.
\newblock \emph{Journal of Test}, 2000.

If the codec stripped the braces, pybtex would convert the case wrongly: {Ü} would be lowercased to ü. Further, in (pathological) cases like journal = {\"u}, the braces clearly cannot be removed either, because they delimit the field value itself.
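The effect of the braces on case conversion can be illustrated with a small sketch. The function lower_title below is a hypothetical, simplified stand-in for BibTeX-style title lowercasing (keep the first character, lowercase the rest, leave braced text alone). It is not pybtex's actual implementation.

```python
def lower_title(title):
    """Lowercase everything after the first character, except text inside braces."""
    out = []
    depth = 0
    for i, ch in enumerate(title):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        # Only characters outside any brace group (and after the first) are lowercased.
        if depth == 0 and i > 0:
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)


print(lower_title("Testing ÜTEST {Ü}TEST"))  # braces kept: Ü is protected
print(lower_title("Testing ÜTEST ÜTEST"))    # braces stripped: Ü is lowercased
```

With the braces in place the second Ü survives, matching the rendered entry above. Once they are stripped, both are lowercased.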

An obvious downside is that if you render to a non-LaTeX backend, you have to remove the braces in a post-processing stage. But you may want a post-processing stage anyway, to handle any special LaTeX commands (such as \url). It would be nice if pybtex could somehow do that for you, but it doesn't at the moment.
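A minimal sketch of such a post-processing step, assuming the rendered text contains only case-protecting braces and no remaining LaTeX commands (the helper name strip_braces is hypothetical):

```python
import re

# Matches { or } unless preceded by a backslash, so escaped \{ and \} survive.
UNESCAPED_BRACE = re.compile(r"(?<!\\)[{}]")


def strip_braces(text):
    """Remove unescaped braces from already-rendered text."""
    return UNESCAPED_BRACE.sub("", text)


print(strip_braces("Testing ütest {Ü}test."))
```

This would not be enough for output that still contains commands such as \emph{...} or \url{...}. Those need their own handling before the braces are dropped.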
