Question 1

I am developing a tool to transcribe interviews for a contract. The code I wrote follows this flow:

1. After input validation, the audio file (in m4a) is converted to WAV and stored locally.
2. Then, the WAV file is sent to my Google Cloud Storage bucket.
3. Finally, the WAV file inside GCS is transcribed.
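
For reference, step 1 of the flow above could look roughly like the sketch below. The function names `build_ffmpeg_command` and `convert_to_wav` are hypothetical and not taken from the original code. The sketch assumes `ffmpeg` is on the PATH and writes to a file path, so that the WAV really is stored locally.

```python
import subprocess


def build_ffmpeg_command(src_path, dst_path):
    """Return the ffmpeg argument list for an m4a -> 16-bit PCM WAV conversion."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", src_path,          # input file
        "-vn",                   # drop any video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM audio
        "-f", "wav",             # force the WAV container
        dst_path,                # a file path, so the result is stored on disk
    ]


def convert_to_wav(src_path, dst_path):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src_path, dst_path), check=True)
    return dst_path
```

Keeping the command builder separate from the call makes it easy to log or inspect the exact arguments before running them.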

For the moment, step 1 works fine and step 3 is yet to be verified. I am currently having problems with step 2, because google.resumable_media apparently doesn't recognize some of the arguments I am passing to it.

main.py logging:

subprocess.call(['ffmpeg', '-y', '-f', 'mp4', '-i', 'audio\\*******.m4a', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
Upload url: https://storage.googleapis.com/upload/storage/v1/b/****/o?uploadType=resumable
Total size: 113932332
Making request: POST https://oauth2.googleapis.com/token
Starting new HTTPS connection (1): oauth2.googleapis.com:443
https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
Starting new HTTPS connection (1): storage.googleapis.com:443
https://storage.googleapis.com:443 "POST /upload/storage/v1/b/******/o?uploadType=resumable HTTP/1.1" 200 0
https://storage.googleapis.com:443 "POST /upload/storage/v1/b/******/o?uploadType=resumable HTTP/1.1" 400 239
File uploaded to gs://*****/audio.wav
Error during Google Cloud Speech-to-Text API request: 404 No such object: *****/audio.wav
Full traceback:
Traceback (most recent call last):
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\grpc_helpers.py", line 79, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\grpc\_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\grpc\_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.NOT_FOUND
    details = "No such object: *****/audio.wav"
    debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.13.138:443 {grpc_message:"No such object: *****/audio.wav", grpc_status:5, created_time:"2024-01-16T00:11:43.0873289+00:00"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\*****\Documents\Work\IRSST Transcriptions 2024\transcriptor.py", line 57, in transcribe
    operation = client.long_running_recognize(config=config, audio=audio)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\cloud\speech_v1\services\speech\client.py", line 708, in long_running_recognize
    response = rpc(
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\gapic_v1\method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "c:\Users\*****\Documents\Work\IRSST Transcriptions 2024\venv\lib\site-packages\google\api_core\grpc_helpers.py", line 81, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.NotFound: 404 No such object: nicarg/audio.wav

Problematic snippet:

    @staticmethod
    def send_to_gcs(audio_file, destination_uri):
        """
        Send the converted WAV file to the GCS bucket.

        Parameters:
        - audio_file: Audio file.
        - destination_uri: Destination URI.
        """
        try:
            # Create a storage client
            print("Accessing client credentials...")
            storage_client = storage.Client.from_service_account_json(GOOGLE_CREDENTIALS_PATH)

            # Configure resumable upload with a timeout
            try:
                print("Configuring resumable upload with a timeout...")
                upload_url = f"https://storage.googleapis.com/upload/storage/v1/b/{BUCKET_NAME}/o?uploadType=resumable"
                chunk_size = 10 * 1024 * 1024  # Adjust as needed
                total_size = os.path.getsize(audio_file)
                timeout_seconds = 5 * 60  # Adjust as needed

                upload = google.resumable_media.requests.upload.ResumableUpload(
                    upload_url,
                    chunk_size=chunk_size,
                )
                transport = AuthorizedSession(credentials=storage_client._credentials)
                logging.debug(f"Upload url: {upload_url}")
                logging.debug(f"Total size: {total_size}")
            except Exception as e:
                logging.error(f"Error configuring resumable upload: {e}")

            # Open the audio file and perform the upload
            print("Open the audio file and performing the upload...")
            try:
                with open(audio_file, "rb") as audio_data:
                    upload.initiate(
                        transport=transport,
                        stream=audio_data,
                        metadata={"name": f"{destination_uri}", "Content-Type": "audio/wav"},
                        total_bytes=total_size,
                        timeout=timeout_seconds,
                        content_type="audio/wav",
                    )
                    # Use the http property for the request
                    transport.request(
                        method="POST",
                        url=upload_url,
                        headers=upload._headers,
                        data=upload.bytes_uploaded,
                    )
            except Exception as e:
                logging.error(f"Error performing the upload: {e}")

            logging.info(f"File uploaded to {destination_uri}")
            return destination_uri
        except Exception as e:
            logging.error(f"Error during cloud upload: {e}")
            return None

I was thinking of using an older version of google.resumable_media, but I am afraid of breaking everything. I have been modifying this code for days and I want to start transcribing as soon as possible.

EDIT 1: I made some modifications to the code according to your suggestions. The code now seems to run without errors, but it doesn't upload the audio file to my GCS bucket.

Answer

As deceze commented, ResumableUpload doesn't support the total_size and timeout parameters. The API documentation gives the following signature:

ResumableUpload(upload_url, chunk_size, checksum=None, headers=None)

Instead, you might want to pass total_bytes and timeout to the initiate method.
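
As a rough illustration of that split, the sketch below passes only the URL and chunk size to the constructor and gives total_bytes and timeout to initiate. Note that initiate only opens the upload session; the file data is sent afterwards, one chunk per transmit_next_chunk call, until the upload reports that it is finished. The helper names `upload_in_chunks` and `make_gcs_upload` are hypothetical, and reading `client._credentials` (a private attribute) is carried over from the question's code rather than being a recommended pattern.

```python
import os


def upload_in_chunks(upload, transport, path, object_name,
                     content_type="audio/wav", timeout=300):
    """Drive a resumable upload to completion and return the bytes sent."""
    total_bytes = os.path.getsize(path)
    with open(path, "rb") as stream:
        # initiate() only opens the session; no file data is sent yet.
        upload.initiate(
            transport,
            stream,
            {"name": object_name},  # object name inside the bucket
            content_type,
            total_bytes=total_bytes,
            timeout=timeout,
        )
        # Each call sends one chunk; loop until the upload is complete.
        while not upload.finished:
            upload.transmit_next_chunk(transport, timeout=timeout)
    return upload.bytes_uploaded


def make_gcs_upload(bucket_name, credentials_path, chunk_size=10 * 1024 * 1024):
    """Build the upload/transport pair (requires google-cloud-storage)."""
    from google.auth.transport.requests import AuthorizedSession
    from google.cloud import storage
    from google.resumable_media.requests import ResumableUpload

    client = storage.Client.from_service_account_json(credentials_path)
    url = (
        "https://storage.googleapis.com/upload/storage/v1/b/"
        f"{bucket_name}/o?uploadType=resumable"
    )
    # chunk_size must be a multiple of 256 KiB; 10 MiB satisfies that.
    return ResumableUpload(url, chunk_size), AuthorizedSession(client._credentials)
```

Usage would be along the lines of `upload, transport = make_gcs_upload(BUCKET_NAME, GOOGLE_CREDENTIALS_PATH)` followed by `upload_in_chunks(upload, transport, "audio.wav", "audio.wav")`.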

Although a bit outdated, here is a good example of using ResumableUpload.
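
Regarding the "404 No such object" in the question's log, one possible check is sketched below. It assumes that destination_uri is a full gs:// URI, as the "File uploaded to gs://..." log line suggests, whereas the name field in the upload metadata is the object name within the bucket. The helpers `split_gs_uri` and `object_exists` are hypothetical names introduced for illustration.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI needs both a bucket and an object name: {uri!r}")
    return bucket, object_name


def object_exists(storage_client, uri):
    """Return True if the object behind a gs:// URI is present in the bucket."""
    bucket, object_name = split_gs_uri(uri)
    # storage_client is expected to be a google.cloud.storage.Client.
    return storage_client.bucket(bucket).blob(object_name).exists()
```

Calling `object_exists(storage_client, destination_uri)` before starting the transcription would surface a missing upload earlier than the Speech-to-Text request does.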

Google Cloud Storage: __init__() got an unexpected keyword argument total_size
