Is there a way to load a JSON file from local file system to BigQuery using Google BigQuery Client API?
All the options I found are:
1- Streaming the records one by one.
2- Loading JSON data from GCS.
3- Using raw POST requests to load the JSON (i.e. not through Google Client API).
I'm assuming from the python tag that you want to do this from Python. There is a load example here that loads data from a local file (it uses CSV, but it is easy to adapt to JSON, and there is another JSON example in the same directory).
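Before the flow below you need an authorized BigQuery service object and its jobs() collection. Here is a minimal sketch, assuming application default credentials and the google-api-python-client / oauth2client libraries; your auth setup may differ:

import httplib2
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials

# Assumption: application default credentials are available in the environment.
credentials = GoogleCredentials.get_application_default()
http = credentials.authorize(httplib2.Http())
bigquery_service = build('bigquery', 'v2', http=http)
jobs = bigquery_service.jobs()  # used as `jobs` below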
The basic flow is:
# Assumption: `jobs` is the jobs() collection of an authorized BigQuery
# service built with google-api-python-client, and MediaFileUpload comes
# from that same library.
import time

from googleapiclient.http import MediaFileUpload

# Load configuration with the destination table specified.
load_config = {
    'destinationTable': {
        'projectId': PROJECT_ID,
        'datasetId': DATASET_ID,
        'tableId': TABLE_ID
    }
}

load_config['schema'] = {
    'fields': [
        {'name': 'string_f', 'type': 'STRING'},
        {'name': 'boolean_f', 'type': 'BOOLEAN'},
        {'name': 'integer_f', 'type': 'INTEGER'},
        {'name': 'float_f', 'type': 'FLOAT'},
        {'name': 'timestamp_f', 'type': 'TIMESTAMP'}
    ]
}

load_config['sourceFormat'] = 'NEWLINE_DELIMITED_JSON'

# This tells it to perform a resumable upload of a local file called
# 'foo.json', which holds one JSON record per line, e.g.
# {"string_f": "a", "boolean_f": true, "integer_f": 1, "float_f": 1.5,
#  "timestamp_f": "2014-08-01 12:00:00"}
upload = MediaFileUpload('foo.json',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)

start = time.time()
job_id = 'job_%d' % start

# Create the load job.
result = jobs.insert(
    projectId=PROJECT_ID,
    body={
        'jobReference': {'jobId': job_id},
        'configuration': {'load': load_config}
    },
    media_body=upload).execute()

# Then you'd also want to wait for the result and check the status. (Check out
# the example at the link for more info.)
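To wait for the result, you can poll jobs.get() until the job state is DONE and then look for an error result. A minimal sketch (the two-second sleep and the exception are just illustrative):

import time

# Poll the job until BigQuery reports it as DONE, then check for errors.
while True:
    status = jobs.get(projectId=PROJECT_ID, jobId=job_id).execute()['status']
    if status['state'] == 'DONE':
        if 'errorResult' in status:
            raise RuntimeError('Load failed: %s' % status['errorResult'])
        break
    time.sleep(2)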