I use to_gbq in pandas to update Google BigQuery and get GenericGBQException

2024/11/15 7:04:35

While trying to use to_gbq to update a Google BigQuery table, I get this response:

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

My code:

gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)

and my dataframe of mini_df looks like:

date        request_number  name  feature_name  value_name  value
2018-01-10  1               1     "a"           "b"         0.309457
2018-01-10  1               1     "c"           "d"         0.273748

When I run to_gbq and the table doesn't yet exist in BigQuery, I can see that the table is created with the following schema:

date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE

What am I doing wrong? How can I solve this?
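For reference, here is how the inferred column types can be inspected before calling to_gbq. This is a sketch with placeholder data standing in for mini_df, copied from the table above:

```python
import pandas as pd

# Placeholder data mirroring the mini_df shown above
mini_df = pd.DataFrame({
    "date": ["2018-01-10", "2018-01-10"],
    "request_number": [1, 1],
    "name": [1, 1],
    "feature_name": ["a", "c"],
    "value_name": ["b", "d"],
    "value": [0.309457, 0.273748],
})

# Show which dtypes pandas inferred for each column;
# these are what to_gbq uses to build the BigQuery schema.
print(mini_df.dtypes)
```

Comparing this output to the schema BigQuery created can show where the two disagree.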

P.S. The rest of the exception:

BadRequest                                Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    589                         destination_table,
--> 590                         job_config=job_config).result()
    591                 except self.http_error as ex:

~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
    527         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528         return super(_AsyncJob, self).result(timeout=timeout)
    529

~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    110             # Pylint doesn't recognize that this is valid in this case.
--> 111             raise self._exception
    112

BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

During handling of the above exception, another exception occurred:

GenericGBQException                       Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
    106                       chunksize=chunksize,
    107                       verbose=verbose, reauth=reauth,
--> 108                       if_exists=if_exists, private_key=private_key)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
    987         table.create(table_id, table_schema)
    988
--> 989     connector.load_data(dataframe, dataset_id, table_id, chunksize)
    990
    991

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    590                         job_config=job_config).result()
    591                 except self.http_error as ex:
--> 592                     self.process_http_error(ex)
    593
    594                 rows = []

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
    454         # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
    455
--> 456         raise GenericGBQException("Reason: {0}".format(ex))
    457
    458     def run_query(self, query, **kwargs):

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
Answer

I've had the very same problem.

In my case it came down to the object data type of the DataFrame columns.

I had three columns: externalId, mappingId, info. I didn't set a data type for any of those fields and let pandas do its magic.

It decided to set all three column dtypes to object. The problem is that, internally, to_gbq uses to_json, and for some reason that output omits the quotes around a field whose dtype is object but which holds only numerical values.

So Google BigQuery needed this:

{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}

but got this:

{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}

And because the field was mapped as STRING in Google BigQuery, the import process failed.
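The behavior can be reproduced without BigQuery at all, since the quoting comes from to_json. A minimal sketch, reusing the column names from the example above:

```python
import pandas as pd

# An object-dtype column that happens to hold only numbers,
# next to two genuinely string-valued columns.
df = pd.DataFrame({
    "externalId": pd.Series([12345], dtype=object),
    "mappingId": ["abc123"],
    "info": ["blerb"],
})

out = df.to_json(orient="records")
# externalId is emitted as a bare JSON number, not a quoted string,
# even though its dtype is object.
print(out)
```

If the target column is STRING, this unquoted number is exactly what trips the load job.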

Two solutions came up.

Solution 1 - Change the data type of the column

A simple type conversion fixed the issue. I also had to change the column's data type in BigQuery to INTEGER.

df['externalId'] = df['externalId'].astype('int')

In that case, BigQuery can consume the unquoted field, since JSON numbers are valid per the standard.
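Assuming externalId really holds only digits, the full round trip of this solution can be sketched like so:

```python
import pandas as pd

# An object column that holds digit strings
df = pd.DataFrame({"externalId": pd.Series(["12345"], dtype=object)})

# Convert to an integer dtype so to_json emits a bare number,
# matching an INTEGER column in BigQuery
df["externalId"] = df["externalId"].astype("int")

print(df["externalId"].dtype)        # an integer dtype (int64 on most platforms)
print(df.to_json(orient="records"))  # the value is serialized unquoted
```

Note that astype("int") will raise if any value is non-numeric or missing, which is a useful early signal that the column isn't really integer data.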

Solution 2 - Make sure the string field is a string

Again, this is about setting the data type explicitly. Since we set it to str, to_json prints out a quoted field and everything works fine.

df['externalId'] = df['externalId'].astype('str')
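A small sketch of this second solution, again with the assumed externalId column:

```python
import pandas as pd

# The same problematic object column holding a bare number
df = pd.DataFrame({"externalId": pd.Series([12345], dtype=object)})

# Force it to str so to_json quotes the value,
# matching a STRING column in BigQuery
df["externalId"] = df["externalId"].astype(str)

print(df.to_json(orient="records"))  # the value now serializes as a quoted string
```

Which solution to pick depends on what the column actually is: if it's an identifier that may grow leading zeros or letters, keep it a STRING; if it's a true number, use Solution 1.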