Google Cloud Storage: __init__() got an unexpected keyword argument total_size

2024/10/6 22:26:45

I am developing a tool to transcribe interviews for a contract I have. The code I wrote for it follows this flow:

  1. After input validation, the audio file (in m4a) is converted to wav and stored locally.
  2. Then, the WAV file is sent to my Google Cloud Storage bucket.
  3. Finally, the WAV file inside GCS is transcribed.

For the moment, step 1 works fine and step 3 is yet to be verified. I am currently having problems with the second part of the flow, as apparently google.resumable_media doesn't recognize some of the arguments I am giving it.

main.py logging:

subprocess.call(['ffmpeg', '-y', '-f', 'mp4', '-i', 'audio\\*******.m4a', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
Upload url: https://storage.googleapis.com/upload/storage/v1/b/****/o?uploadType=resumable
Total size: 113932332
Making request: POST https://oauth2.googleapis.com/token
Starting new HTTPS connection (1): oauth2.googleapis.com:443
https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
Starting new HTTPS connection (1): storage.googleapis.com:443
https://storage.googleapis.com:443 "POST /upload/storage/v1/b/******/o?uploadType=resumable HTTP/1.1" 200 0
https://storage.googleapis.com:443 "POST /upload/storage/v1/b/******/o?uploadType=resumable HTTP/1.1" 400 239
File uploaded to gs://*****/audio.wav
Error during Google Cloud Speech-to-Text API request: 404 No such object: *****/audio.wav
Full traceback:
Traceback (most recent call last):
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\grpc_helpers.py", line 79, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\grpc\_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\grpc\_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.NOT_FOUND
    details = "No such object: *****/audio.wav"
    debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.13.138:443 {grpc_message:"No such object: *****/audio.wav", grpc_status:5, created_time:"2024-01-16T00:11:43.0873289+00:00"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\*****\Documents\Work\IRSST Transcriptions 2024\transcriptor.py", line 57, in transcribe
    operation = client.long_running_recognize(config=config, audio=audio)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\cloud\speech_v1\services\speech\client.py", line 708, in long_running_recognize
    response = rpc(
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\gapic_v1\method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\grpc_helpers.py", line 81, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.NotFound: 404 No such object: nicarg/audio.wav

Problematic snippet:

    @staticmethod
    def send_to_gcs(audio_file, destination_uri):
        """
        Send the converted WAV file to the GCS bucket.

        Parameters:
        - audio_file: Audio file.
        - destination_uri: Destination URI.
        """
        try:
            # Create a storage client
            print("Accessing client credentials...")
            storage_client = storage.Client.from_service_account_json(GOOGLE_CREDENTIALS_PATH)

            # Configure resumable upload with a timeout
            try:
                print("Configuring resumable upload with a timeout...")
                upload_url = f"https://storage.googleapis.com/upload/storage/v1/b/{BUCKET_NAME}/o?uploadType=resumable"
                chunk_size = 10 * 1024 * 1024  # Adjust as needed
                total_size = os.path.getsize(audio_file)
                timeout_seconds = 5 * 60  # Adjust as needed
                upload = google.resumable_media.requests.upload.ResumableUpload(
                    upload_url,
                    chunk_size=chunk_size,
                )
                transport = AuthorizedSession(credentials=storage_client._credentials)
                logging.debug(f"Upload url: {upload_url}")
                logging.debug(f"Total size: {total_size}")
            except Exception as e:
                logging.error(f"Error configuring resumable upload: {e}")

            # Open the audio file and perform the upload
            print("Open the audio file and performing the upload...")
            try:
                with open(audio_file, "rb") as audio_data:
                    upload.initiate(
                        transport=transport,
                        stream=audio_data,
                        metadata={"name": f"{destination_uri}", "Content-Type": "audio/wav"},
                        total_bytes=total_size,
                        timeout=timeout_seconds,
                        content_type="audio/wav",
                    )
                    # Use the http property for the request
                    transport.request(
                        method="POST",
                        url=upload_url,
                        headers=upload._headers,
                        data=upload.bytes_uploaded,
                    )
            except Exception as e:
                logging.error(f"Error performing the upload: {e}")

            logging.info(f"File uploaded to {destination_uri}")
            return destination_uri
        except Exception as e:
            logging.error(f"Error during cloud upload: {e}")
            return None

I was thinking of using an older version of google.resumable_media, but I am afraid of breaking everything. I have been modifying this code for days and I want to start transcribing as soon as possible.

EDIT 1: I made some modifications to the code following your suggestions. The code now seems to run without errors, but the audio file still does not get uploaded to my GCS bucket.

Answer

As deceze commented, ResumableUpload doesn't support the total_size and timeout parameters. The API documentation gives the following signature:

ResumableUpload(upload_url, chunk_size, checksum=None, headers=None)

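Pass total_bytes and timeout to the initiate method instead. Note that initiate only opens the upload session; the file contents are sent by calling transmit_next_chunk repeatedly until the upload reports that it is finished.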
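A minimal sketch of that sequence follows. The helper names upload_in_chunks and make_upload are hypothetical, and the sketch assumes a recent google-resumable-media release in which both initiate and transmit_next_chunk accept a timeout argument. It is an illustration of the call order, not a drop-in replacement for the code in the question.

```python
import os


def upload_in_chunks(upload, transport, path, object_name,
                     content_type="audio/wav", timeout=300):
    """Drive a resumable upload to completion, one chunk at a time.

    `upload` is assumed to behave like
    google.resumable_media.requests.ResumableUpload, and `transport`
    like google.auth.transport.requests.AuthorizedSession.
    """
    total_bytes = os.path.getsize(path)
    with open(path, "rb") as stream:
        # initiate() only opens the upload session; no file data is sent yet.
        upload.initiate(
            transport,
            stream,
            {"name": object_name},
            content_type,
            total_bytes=total_bytes,
            timeout=timeout,
        )
        # Each call sends up to chunk_size bytes. Keep going until the
        # upload object reports that the final chunk was accepted.
        while not upload.finished:
            upload.transmit_next_chunk(transport, timeout=timeout)
    return upload.bytes_uploaded


def make_upload(bucket_name, chunk_size=10 * 1024 * 1024):
    """Build a ResumableUpload for a bucket (hypothetical helper)."""
    # Imported lazily so upload_in_chunks can be used with a stand-in object.
    from google.resumable_media.requests import ResumableUpload

    upload_url = (
        "https://storage.googleapis.com/upload/storage/v1/b/"
        f"{bucket_name}/o?uploadType=resumable"
    )
    # Only the URL and chunk size go to the constructor; the chunk size
    # must be a multiple of 256 KiB.
    return ResumableUpload(upload_url, chunk_size)
```

With the names from the question, this would be called roughly as upload_in_chunks(make_upload(BUCKET_NAME), transport, audio_file, "audio.wav"). The object name passed here is the path inside the bucket, such as audio.wav, rather than a full gs:// URI. Errors are left to propagate so that a failed request is not followed by a "File uploaded" log line.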

Though a bit outdated, here is a good example of using ResumableUpload.

