How should I move blobs from BlobStore over to Google Cloud Storage?

2024/9/28 11:20:00

Our application has been running on App Engine using the Blobstore for years. We would like to move our video files over to Google Cloud Storage. What is the best practice for migrating large blobs from Blobstore over to GCS?

Is it just a matter of using BlobReader and writing bytes to GCS one at a time? Or are there other shortcuts/tools available?

As for writing to GCS from App Engine, there is no shortage of libraries to choose from:

  • Blobstore API (can generate a BlobKey for GCS objects; the Blobstore Files API is deprecated, but would be valid for one-time use)
  • Google Cloud Storage API (again deprecated, but would be valid for one-time use)
  • Google Cloud Storage Client Library
  • Google Cloud Storage Python Library
  • Google Cloud Storage JSON API Library
  • boto

Any reason to use one over the other?

Answer

I haven't had to do this, but I'd say there are no automatic migration tools. So yes, you have to roll your own.

My approach would be to run the migration in batches from a cron job (cron.yaml), keeping track of which files have been migrated so you can serve them differently (as per this page). Cloud Storage provides an MD5 hash after an object is created, which you can use to validate that each file migrated without errors before deleting the Blobstore copy (you can compute the same hash yourself while the file is in transit).
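A rough sketch of that batch-and-verify loop, not tested on App Engine. The names `migrate_batch`, `pending`, `migrated` and `copy_blob` are hypothetical; in a real app the tracking state would live in the datastore and `copy_blob` would do the actual Blobstore-to-GCS copy. The only GCS-specific assumption is that the hash is compared in base64 form, which is how the JSON API reports `md5Hash`.

```python
import base64
import hashlib


def md5_b64(data):
    """Base64-encoded MD5 of data (the form the GCS JSON API uses for md5Hash)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def migrate_batch(pending, migrated, copy_blob, batch_size=10):
    """Copy up to batch_size blobs; record only the ones whose hashes match.

    pending   -- list of blob keys still to migrate (hypothetical tracking list)
    migrated  -- dict of blob key -> GCS object name, updated in place
    copy_blob -- hypothetical callable: key -> (object_name, local_md5, remote_md5)
    Returns the keys whose hashes did not match, so they can be retried.
    """
    failed = []
    for key in pending[:batch_size]:
        object_name, local_md5, remote_md5 = copy_blob(key)
        if local_md5 == remote_md5:
            # Verified: serve from GCS from now on, delete the blob later.
            migrated[key] = object_name
        else:
            # Leave the Blobstore copy alone and retry on the next cron run.
            failed.append(key)
    return failed
```

Each cron run would call `migrate_batch` once with a batch size small enough to finish inside the request deadline, and the serving code would check `migrated` to decide where a given file comes from.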

As for libraries:

  • the Google Cloud Storage Client Library uses the resumable upload functionality to "stream" the file which will make things smoother memory-wise. I have found it to be quite reliable (as opposed to the deprecated Google Cloud Storage API/Files API).
  • the JSON API Client is lower-level.
  • boto isn't optimized for use in GAE, but rather on the desktop, and you don't want to be leaving the Google Cloud to do the migration ($$$).
  • as far as I know the Blobstore lets you serve files from GCS and have users upload files to GCS but not write files from your application per se.
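To make the "streaming" point concrete, here is a minimal sketch of a chunked copy that keeps memory use flat and computes the MD5 while the bytes are in transit. It works on any pair of file-like objects; `stream_copy` is a made-up name. On App Engine the source would presumably be a `blobstore.BlobReader(blob_key)` and the destination the object returned by the client library's `cloudstorage.open(path, 'w')`, but that wiring is an assumption and is not shown or tested here.

```python
import base64
import hashlib
import io


def stream_copy(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks; return (bytes_copied, base64_md5).

    src and dst are any file-like objects with read(n) and write(data).
    Only one chunk is held in memory at a time.
    """
    md5 = hashlib.md5()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # empty read means end of file
            break
        md5.update(chunk)      # hash the same bytes that get written
        dst.write(chunk)
        total += len(chunk)
    return total, base64.b64encode(md5.digest()).decode("ascii")


if __name__ == "__main__":
    # Stand-ins for the Blobstore reader and the GCS file handle.
    source = io.BytesIO(b"fake video bytes" * 1000)
    target = io.BytesIO()
    print(stream_copy(source, target, chunk_size=4096))
```

The returned digest is the value you would compare against the hash Cloud Storage reports for the new object before deleting the Blobstore copy.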
