Question 1

There are numerous questions about how to stop Excel from interpreting text as a number, or how to output number formats with openpyxl, but I haven't seen any solutions to this problem:

I have an Excel spreadsheet given to me by someone else, so I did not create it. When I open the file with Excel, I have certain values like "5E12" (clone numbers, if anyone cares) that appear to display correctly, but there's a little green arrow next to each one warning me that "This appears to be a number stored as text". Excel then asks me if I would like to convert it to a number, and if I say yes, I get 5000000000000, which then converts automatically to scientific notation and displays 5E12 again, only this time a text output would show the full number with zeroes. Note that before the conversion, this really is text, even to Excel, and I'm only being warned about it and offered the conversion.

So, when reading this file in with openpyxl (from openpyxl.reader.excel import load_workbook), the 5E12 is getting converted automatically to 5000000000000. I assume that openpyxl is making the same assumption that Excel made, only the conversion happens without a prompt or input on my part.

How can I prevent this from happening? I do not want text that looks like a "number stored as text" to be converted to a number. It is text unless I say so.

So far, the only solution I have found is to add single quotes to the front of each cell, but this is not an ideal solution, as it's manual labor rather than a programmatic solution. Also, the solution needs to be general, since I don't always know where this problem might occur (I'm reading millions of lines per day, so I don't want to have to do anything by hand).

I think this is a problem with openpyxl. There is a Google Groups discussion from the beginning of 2011 that mentions this problem, but it assumes the case is too rare to matter. https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/HZfpShMp8Tk

So, any suggestions?

Answer

If you want to use openpyxl again (for whatever reason), the following changes to the worksheet reader routine do the trick of keeping the strings as strings:

diff --git a/openpyxl/reader/worksheet.py b/openpyxl/reader/worksheet.py
--- a/openpyxl/reader/worksheet.py
+++ b/openpyxl/reader/worksheet.py
@@ -134,8 +134,10 @@
             data_type = element.get('t', 'n')
             if data_type == Cell.TYPE_STRING:
                 value = string_table.get(int(value))
-
-            ws.cell(coordinate).value = value
+                ws.cell(coordinate).set_value_explicit(value=value,
+                                                data_type=Cell.TYPE_STRING)
+            else:
+                ws.cell(coordinate).value = value
 
             # to avoid memory exhaustion, clear the item after use
             element.clear()

Cell.value is a property; assigning to it calls Cell._set_value, which in turn calls Cell.bind_value. According to that method's docstring, it will: "Given a value, infer type and display options". Since the types of the values are already recorded in the XML file, those should be used (here I only do that for strings) instead of doing something 'smart'.

As you can see from the code, the check for whether the cell is a string was already there.
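To make the difference concrete, here is a minimal standalone sketch of the two behaviours. It uses only the standard library and is not openpyxl's actual code: SHEET_XML, SHARED_STRINGS, infer and read_cells are hypothetical names made up for illustration, and the regular expression is only an assumed stand-in for whatever "looks like a number" test the library applies. The only thing taken from the patch above is the idea that a cell's 't' attribute ('s' for a shared string, numeric by default) already says what the value is.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Hypothetical, hand-written worksheet fragment: A1 is a shared string
# (t="s", index 0 into the string table), B1 is a plain number.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1"><v>42</v></c>
    </row>
  </sheetData>
</worksheet>"""

# Hypothetical shared string table for the fragment above.
SHARED_STRINGS = ["5E12"]

# Assumed stand-in for a "looks numeric" check, not the library's own pattern.
NUMERIC = re.compile(r"^[+-]?\d+(\.\d+)?([eE][+-]?\d+)?$")


def infer(value):
    """Guess the type from the content alone (the 'smart' behaviour)."""
    if NUMERIC.match(value):
        return float(value)
    return value


def read_cells(xml_text, shared_strings, trust_declared_type):
    """Return {coordinate: value} for every cell in the fragment."""
    cells = {}
    root = ET.fromstring(xml_text)
    for cell in root.iter(NS + "c"):
        raw = cell.find(NS + "v").text
        data_type = cell.get("t", "n")  # numeric unless declared otherwise
        if data_type == "s":
            value = shared_strings[int(raw)]
            if not trust_declared_type:
                # Unpatched behaviour: the string is re-inspected and guessed at.
                value = infer(value)
        else:
            value = infer(raw)
        cells[cell.get("r")] = value
    return cells


guessed = read_cells(SHEET_XML, SHARED_STRINGS, trust_declared_type=False)
declared = read_cells(SHEET_XML, SHARED_STRINGS, trust_declared_type=True)
print(repr(guessed["A1"]), repr(declared["A1"]))
```

With trust_declared_type=False the string in A1 is turned into a float, which is the behaviour the question complains about; with trust_declared_type=True it stays the text "5E12", while the genuinely numeric cell B1 is read as a number either way.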

openpyxl please do not assume text as a number when importing
