namespace error lxml xpath python

2024/9/23 1:28:45

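I am converting Word documents to XML so that I can compare them, using the following code:

import win32com.client
from xml.dom.minidom import parse  # presumably where parse() comes from

word = win32com.client.Dispatch('Word.Application')
wd = word.Documents.Open(inFile)
# Save the Word file inFile as XML to outFile (11 = wdFormatXML, Word 2003 XML)
wd.SaveAs(outFile, 11)
wd.Close()
dom = parse(outFile)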

The XML file I get looks like this:

<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument w:embeddedObjPresent="no" w:macrosPresent="no" w:ocxPresent="no" xml:space="preserve" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"><w:ignoreSubtree w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"/><w:shapeDefaults><o:shapedefaults spidmax="1027" v:ext="edit"/><o:shapelayout v:ext="edit"><o:idmap data="1" v:ext="edit"/></o:shapelayout></w:shapeDefaults><w:body><wx:sect><w:tbl><w:tblGrid><w:gridCol w:w="200"/>...</w:tblGrid><w:pict><v:shapetype coordsize="21600,21600" filled="f" id="_x0000_t75" o:preferrelative="t" o:spt="75" path="m@4@5l@4@11@9@11@9@5xe" stroked="f"><v:stroke joinstyle="miter"/><v:formulas><v:f eqn="if lineDrawn pixelLineWidth 0"/>...</v:formulas><v:path gradientshapeok="t" o:connecttype="rect" o:extrusionok="f"/><o:lock aspectratio="t" v:ext="edit"/></v:shapetype><v:shape id="Picture" o:spid="_x0000_s1026" style="position:absolute;left:0;text-align:left;margin-left:0;margin-top:0;width:400pt;height:40pt;z-index:1;visibility:visible;mso-wrap-style:square;mso-wrap-distance-left:0;mso-wrap-distance-top:0;mso-wrap-distance-right:0;mso-wrap-distance-bottom:0;mso-position-horizontal:left;mso-position-horizontal-relative:text;mso-position-vertical:absolute;mso-position-vertical-relative:line" type="#_x0000_t75"><v:imagedata o:title="" 
src="wordml://03000001.png"/><w10:wrap anchory="line"/><w10:anchorlock/></v:shape></w:pict> ...

I can't use the xpath function (lxml library). For example, when I try:

import lxml.etree as et

tree = et.parse(xmlFile)
for elt in tree.xpath("//w:gridCol"):
    elt.getparent().remove(elt)

I get the following error:

  for elt in tree.xpath("//w:gridCol"):
  File "lxml.etree.pyx", line 2029, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:45934)
  File "xpath.pxi", line 379, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:114389)
  File "xpath.pxi", line 242, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:113063)
  File "xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:112894)
XPathEvalError: Undefined namespace prefix

I did some research and I guess it's a namespace issue, but I don't know how to fix it.
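For reference, the error can be reproduced without Word or win32com. The sketch below uses a tiny hand-written fragment (hypothetical, not the real saved file) that declares the same w namespace as the Word output; even though the prefix is declared in the XML itself, the XPath call without a namespaces mapping still fails.

```python
import lxml.etree as et

# Hypothetical minimal WordML fragment (not the real saved file):
# the w: prefix is declared in the XML itself, just as in the Word output.
XML = b"""<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body><w:tbl><w:tblGrid><w:gridCol w:w="200"/><w:gridCol w:w="300"/></w:tblGrid></w:tbl></w:body>
</w:wordDocument>"""

tree = et.fromstring(XML).getroottree()

error_message = None
try:
    # No namespaces= mapping is passed, so XPath has no binding for "w".
    tree.xpath("//w:gridCol")
except et.XPathEvalError as exc:
    error_message = str(exc)

print(error_message)
```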

Answer

In this code:

for elt in tree.xpath("//w:gridCol"):

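w: isn't a namespace; it's a namespace prefix, which is effectively shorthand for the actual namespace, http://schemas.microsoft.com/office/word/2003/wordml, bound by the xmlns:w declaration in the document. lxml's xpath method does not pick up that binding from the document, which is why the prefix is reported as undefined. If you want to search for elements in this namespace using the xpath method, you need to provide it with a mapping of namespace prefixes to namespaces: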

tree.xpath("//w:gridCol", namespaces={'w': 'http://schemas.microsoft.com/office/word/2003/wordml',})
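Applied to the loop from the question, the fix might look like the sketch below. To keep it runnable on its own, a small hypothetical fragment stands in for et.parse(xmlFile); with the real file you would parse it as in the question. The compiled et.XPath object at the end is an optional variation, useful when the same query runs over many documents.

```python
import lxml.etree as et

# Prefix -> namespace URI mapping for XPath; the URI is the one bound to w: in the saved file.
NS = {"w": "http://schemas.microsoft.com/office/word/2003/wordml"}

# Hypothetical stand-in for et.parse(xmlFile) so the sketch runs on its own.
XML = b"""<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body><w:tbl><w:tblGrid><w:gridCol w:w="200"/><w:gridCol w:w="300"/></w:tblGrid></w:tbl></w:body>
</w:wordDocument>"""
tree = et.fromstring(XML).getroottree()

# xpath() returns a plain Python list, so removing nodes inside the loop is safe.
for elt in tree.xpath("//w:gridCol", namespaces=NS):
    elt.getparent().remove(elt)

# Compile the expression once (with the same mapping) and reuse it.
find_grid_cols = et.XPath("//w:gridCol", namespaces=NS)
remaining = find_grid_cols(tree)
print(len(remaining))  # 0
```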

Also, note that there is no requirement that you use the same namespace prefix as the document; only the namespace it maps to has to match. The following would find the same elements:

tree.xpath("//bob:gridCol", namespaces={'bob': 'http://schemas.microsoft.com/office/word/2003/wordml'})
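The sketch below, again on a hypothetical small fragment, checks that the w and bob prefixes return the same elements, and shows two other ways lxml can locate them: building the mapping from the root element's nsmap, and iterating with Clark-notation tag names ({namespace}local-name), which needs no prefixes at all. The nsmap approach assumes, as in the Word output above, that the needed namespaces are declared on the root element.

```python
import lxml.etree as et

WORDML = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical minimal fragment; the real file declares many more namespaces on the root.
XML = b"""<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body><w:tbl><w:tblGrid><w:gridCol w:w="200"/><w:gridCol w:w="300"/></w:tblGrid></w:tbl></w:body>
</w:wordDocument>"""
tree = et.fromstring(XML).getroottree()
root = tree.getroot()

def widths(elements):
    # Read the w:w attribute using Clark notation: {namespace-uri}local-name
    return [e.get("{%s}w" % WORDML) for e in elements]

# 1. The prefix in the query is arbitrary; only the URI it maps to matters.
w_widths = widths(tree.xpath("//w:gridCol", namespaces={"w": WORDML}))
bob_widths = widths(tree.xpath("//bob:gridCol", namespaces={"bob": WORDML}))

# 2. Reuse the root element's own declarations. A default namespace would appear
#    under the key None, which XPath cannot use, so such entries are filtered out.
doc_ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}
via_nsmap = tree.xpath("//w:gridCol", namespaces=doc_ns)

# 3. Skip XPath entirely and iterate by Clark-notation tag name.
via_iter = list(root.iter("{%s}gridCol" % WORDML))

print(w_widths, bob_widths, len(via_nsmap), len(via_iter))
# ['200', '300'] ['200', '300'] 2 2
```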
https://en.xdnf.cn/q/71879.html
