Our application has been running on App Engine using the Blobstore for years. We would like to move our video files over to Google Cloud Storage. What is the best practice for migrating large blobs from Blobstore over to GCS?
Is it just a matter of using BlobReader and writing bytes to GCS one at a time? Or are there other shortcuts/tools available?
As for writing to GCS from App Engine, there is no shortage of libraries to choose from:
- Blobstore API (can generate a BlobKey for GCS objects; the Blobstore Files API is deprecated, but would be valid for one-time use)
- Google Cloud Storage API (again deprecated, but would be valid for one-time use)
- Google Cloud Storage Client Library
- Google Cloud Storage Python Library
- Google Cloud Storage JSON API Library
- boto
Any reason to use one over the other?
I haven't had to do this myself, but as far as I know there are no automatic migration tools, so yes, you'll have to roll your own.
My approach would be to batch migrations using cron.yaml, keeping track of which files have been migrated so you can serve them differently (as per this page). Cloud Storage provides you with an MD5 hash after an object is created, which you can use to validate that the migration of each file was error-free before deleting the Blobstore copy (you can compute the same hash while the file is in transit during the migration); see the sketch below.
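Here is a minimal sketch of that approach, assuming the GAE Python runtime with the GCS Client Library (cloudstorage) installed; the bucket name, batch size, and handler URL are illustrative. A cron entry repeatedly hits a migration handler that streams each blob to GCS, hashing it in transit:

```yaml
# cron.yaml -- re-run the batch job until everything is migrated
cron:
- description: migrate a batch of Blobstore blobs to GCS
  url: /migrate
  schedule: every 5 minutes
```

```python
import hashlib

import cloudstorage as gcs
import webapp2
from google.appengine.ext import blobstore

BUCKET = '/my-bucket'  # hypothetical bucket
BATCH_SIZE = 20        # blobs migrated per cron run
CHUNK = 1024 * 1024    # read/write 1 MB at a time

class MigrateHandler(webapp2.RequestHandler):
    def get(self):
        # Migrated blobs are deleted below, so each run picks up the
        # next batch; a real job might track progress in the datastore.
        for info in blobstore.BlobInfo.all().fetch(BATCH_SIZE):
            gcs_name = '%s/%s' % (BUCKET, info.filename or str(info.key()))
            md5 = hashlib.md5()
            reader = blobstore.BlobReader(info.key(), buffer_size=CHUNK)
            # gcs.open streams the write as a resumable upload, so the
            # whole file is never held in memory at once.
            with gcs.open(gcs_name, 'w', content_type=info.content_type) as f:
                while True:
                    chunk = reader.read(CHUNK)
                    if not chunk:
                        break
                    md5.update(chunk)  # hash computed in transit
                    f.write(chunk)
            # For non-composite GCS objects the ETag is the MD5 hex digest;
            # only delete the Blobstore copy once the hashes match.
            if gcs.stat(gcs_name).etag == md5.hexdigest():
                blobstore.delete(info.key())

app = webapp2.WSGIApplication([('/migrate', MigrateHandler)])
```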
As for libraries:
- the Google Cloud Storage Client Library uses the resumable upload functionality to "stream" the file, which makes things smoother memory-wise (this is what the migration sketch above uses). I have found it to be quite reliable (as opposed to the deprecated Google Cloud Storage API/Files API).
- the JSON API Client is lower-level.
- boto isn't optimized for use in GAE, but rather on the desktop, and you don't want to be leaving the Google Cloud to do the migration ($$$).
- as far as I know, the Blobstore API lets you serve files from GCS (see the sketch below) and have users upload files to GCS, but not write files from your application per se.
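On that serving point, a minimal sketch of the pattern, assuming the migrated object lives at a hypothetical path /my-bucket/videos/clip.mp4: create_gs_key wraps a GCS object in a BlobKey, so your existing Blobstore serving code barely changes.

```python
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        # create_gs_key turns a '/gs/<bucket>/<object>' path into a key
        # that send_blob can stream directly from GCS.
        gs_key = blobstore.create_gs_key('/gs/my-bucket/videos/clip.mp4')
        self.send_blob(gs_key)

app = webapp2.WSGIApplication([('/video', ServeHandler)])
```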