Quickly dumping a database in memory to file

2024/10/11 16:22:43

I want to take advantage of the speed benefit of holding an SQLite database (via SQLAlchemy) in memory while I go through a one-time process of inserting content, and then dump it to a file to be stored and used later.

Consider a bog-standard database created in the usual way:

    # in-memory database
    from sqlalchemy import create_engine
    e = create_engine('sqlite://')

Is there a quicker way of moving its contents to disk, other than just creating a brand new database and inserting each entry manually?

EDIT:

There is some doubt as to whether I'd even see any benefit from using an in-memory database. In practice, however, I see a huge time difference of about 120x.

This confusion is probably due to my leaving some important detail out of the question, and probably also to a lack of understanding on my part re: caches / page sizes / etc. Allow me to elaborate:

I am running simulations of a system I have set up, with each simulation going through the following stages (a code sketch follows the list):

  1. Make some queries to the database.
  2. Make calculations / run a simulation based on the results of those queries.
  3. Insert new entries into the database based on the most recent simulation.
  4. Make sure the database is up to date with the new entries by running commit().
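
A minimal sketch of that loop (SQLAlchemy 2.x style; the results table and simulate() are hypothetical stand-ins for my actual setup):

    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            Integer, Float, select, insert)

    engine = create_engine('sqlite://')          # in-memory database
    metadata = MetaData()
    results = Table('results', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('value', Float))
    metadata.create_all(engine)

    def simulate(rows):
        # placeholder for the real model: derive a dozen or so new values
        return [len(rows) + i * 0.1 for i in range(12)]

    with engine.connect() as conn:
        for step in range(1_000_000):
            # 1. query the results of earlier runs
            rows = conn.execute(select(results.c.value)).all()
            # 2. + 3. simulate, then insert the new entries in one batch
            conn.execute(insert(results),
                         [{'value': v} for v in simulate(rows)])
            # 4. commit so later iterations see the new rows
            conn.commit()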

I only ever make a dozen or so insertions on each simulation run, but I run millions of simulations, and the results of each simulation need to be available for future simulations. As I say, this read-and-write process takes considerably longer against a file-backed database; it's the difference between six hours and a month.

Hopefully this clarifies things. I can cobble together a simple Python script to outline my process a little further if necessary.

Answer

SQLAlchemy and SQLite know how to cache and do batch-inserts just fine.
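
For instance, batching a run of inserts inside a single transaction already gets you SQLite's fast path on a file-backed engine (a sketch in SQLAlchemy 2.x style; the items table is made up):

    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            Integer, String, insert)

    engine = create_engine('sqlite:///data.db')   # file-backed
    metadata = MetaData()
    items = Table('items', metadata,
                  Column('id', Integer, primary_key=True),
                  Column('value', String))
    metadata.create_all(engine)

    with engine.begin() as conn:   # one transaction, one executemany
        conn.execute(insert(items),
                     [{'value': 'a'}, {'value': 'b'}, {'value': 'c'}])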

There is no benefit in using an in-memory SQLite database here, because that database uses pages just like the on-disk version would; the only difference is that with a disk-based database those pages eventually get written to disk. The difference in performance is only about 1.5x; see SQLite Performance Benchmark -- why is :memory: so slow...only 1.5X as fast as disk?

There is also no way to move the in-memory database to a disk-based database at a later time, short of running queries on the in-memory database and executing batch inserts into the disk-based database on two separate connections.
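
If you do go that route, the copy itself is a few lines (a sketch, assuming the same kind of Table metadata as in the question; all names are hypothetical, and a very large table would want to be copied in chunks rather than all at once):

    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            Integer, Float, select, insert)

    mem_engine = create_engine('sqlite://')            # source: in memory
    disk_engine = create_engine('sqlite:///dump.db')   # target: on disk

    metadata = MetaData()
    results = Table('results', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('value', Float))
    metadata.create_all(mem_engine)
    metadata.create_all(disk_engine)                   # same schema on disk

    # read everything from the in-memory database, then batch-insert
    # it into the disk-based database on a second connection
    with mem_engine.connect() as src, disk_engine.begin() as dst:
        rows = src.execute(select(results)).mappings().all()
        if rows:
            dst.execute(insert(results), [dict(r) for r in rows])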

