Downloading files from public Google Drive in python: scoping issues?

2024/9/20 21:39:32

Using my answer to my own question on how to download files from a public Google Drive, I previously managed to download images by their IDs from a Python script, using the Google Drive API v3, with the following block of code:

from google_auth_oauthlib.flow import Flow, InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
from google.auth.transport.requests import Request
import io
import re
SCOPES = ['https://www.googleapis.com/auth/drive']
CLIENT_SECRET_FILE = "myjson.json"
authorized_port = 6006 # authorize URI redirect on the console
flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
cred = flow.run_local_server(port=authorized_port)
drive_service = build("drive", "v3", credentials=cred)
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
for i, l in enumerate(links_to_download):
    url = l
    file_id = re.search(regex, url)[0]
    request = drive_service.files().get_media(fileId=file_id)
    fh = io.FileIO(f"file_{i}", mode='wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))

In the meantime I discovered pydrive and pydrive2, two wrappers around the Google API v2 that allow very useful things such as listing the files in a folder, and that basically do the same thing with a lighter syntax:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import io
import re
CLIENT_SECRET_FILE = "client_secrets.json"
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
for i, l in enumerate(links_to_download):
    url = l
    file_id = re.search(regex, url)[0]
    file_handle = drive.CreateFile({'id': file_id})
    file_handle.GetContentFile(f"file_{i}")

However, whether I use pydrive or the raw API, I can no longer download the same files and am instead met with:

googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v3/files/fileID?alt=media returned "File not found: fileID.". Details: "[{'domain': 'global', 'reason': 'notFound', 'message': 'File not found: fileID.', 'locationType': 'parameter', 'location': 'fileId'}]">

I tried everything I could think of and registered 3 different apps in the Google console. It seems it might (or might not) be a question of scoping (see for instance this answer, where apps only have access to files in my own Google Drive or to files created by the app). However, I did not have this issue before (last year).

In the Google console, explicitly granting https://www.googleapis.com/auth/drive as a scope to the API requires filling in a ton of fields: the application's website, terms of use, privacy policy, authorized domains, and YouTube videos explaining the app. However, I will be the sole user of this script, so I could only explicitly grant the following scopes:

/auth/drive.appdata
/auth/drive.file
/auth/drive.install

Is it because of scoping? Is there a solution that doesn't require creating a homepage and a YouTube video?

EDIT 1: Here is an example of links_to_download:

links_to_download = ["https://drive.google.com/file/d/fileID/view?usp=drivesdk&resourcekey=0-resourceKeyValue"]

EDIT 2: It is very unstable: sometimes it works without breaking a sweat, sometimes it doesn't. When I relaunch the script multiple times I get different results. Retry policies work to a certain extent, but sometimes it fails repeatedly for hours.

Answer

This is caused by a security update that Google released a few months ago. It makes link sharing stricter: in addition to the fileId, you now need a resource key to access the file.

As per the documentation, for newer links you also need to provide the resource key, passed in the X-Goog-Drive-Resource-Keys header as fileId1/resourceKey1.
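To make the header format concrete, here is a minimal sketch that builds the header value from file ID and resource key pairs. The helper name resource_keys_header is made up for illustration, and joining several pairs with commas follows my reading of the Drive documentation, so treat that part as an assumption to verify:

```python
def resource_keys_header(pairs):
    """Build an X-Goog-Drive-Resource-Keys value from (file_id, resource_key) pairs.

    Hypothetical helper, not part of any Google library. Each pair becomes
    "fileId/resourceKey"; several pairs are joined with commas (assumed
    from the Drive documentation).
    """
    return ",".join(f"{file_id}/{resource_key}" for file_id, resource_key in pairs)


# Placeholder IDs and keys, matching the naming used in the text above.
headers = {
    "X-Goog-Drive-Resource-Keys": resource_keys_header([("fileId1", "resourceKey1")])
}
```

With a single file this gives exactly the fileId1/resourceKey1 form mentioned above.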

If you apply this change in your code, it will work as normal. Example edit below:

regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
regex_rkey = "(?<=resourcekey=)[a-zA-Z0-9-]+"
for i, l in enumerate(links_to_download):
    url = l
    file_id = re.search(regex, url)[0]
    resource_key = re.search(regex_rkey, url)[0]
    request = drive_service.files().get_media(fileId=file_id)
    request.headers["X-Goog-Drive-Resource-Keys"] = f"{file_id}/{resource_key}"
    fh = io.FileIO(f"file_{i}", mode='wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))

The regex for the resource key is something I put together quickly, so I cannot be sure it covers every case, but it shows the solution. You may also have to handle both old links (without a resource key) and new links, and adjust the code accordingly.
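As a rough sketch of handling both kinds of links, the helpers below parse the URL with urllib.parse instead of a second regex and only produce the header when a resource key is present. The names parse_drive_link and resource_key_header are hypothetical, and the file ID pattern assumes IDs consist of letters, digits, "_" and "-" (slightly wider than the regex above):

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: Drive file IDs use only letters, digits, "_" and "-".
FILE_ID_RE = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def parse_drive_link(url):
    """Return (file_id, resource_key); resource_key is None for older links."""
    parsed = urlparse(url)
    match = FILE_ID_RE.search(parsed.path)
    if match is None:
        raise ValueError(f"No file ID found in {url!r}")
    # parse_qs maps each query parameter to a list of values.
    resource_key = parse_qs(parsed.query).get("resourcekey", [None])[0]
    return match.group(1), resource_key


def resource_key_header(file_id, resource_key):
    """Return the extra header as a dict, or an empty dict for older links."""
    if not resource_key:
        return {}
    return {"X-Goog-Drive-Resource-Keys": f"{file_id}/{resource_key}"}
```

Inside the download loop this would be used as file_id, resource_key = parse_drive_link(url), followed by request.headers.update(resource_key_header(file_id, resource_key)) before creating the MediaIoBaseDownload, so older links without a resourcekey parameter no longer fail on the second regex.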
