Question 1

I have a web scraper in Scrapy that gets data items. I want to asynchronously insert them into a database as well.

For example, I have a transaction that inserts some items into my db using SQLAlchemy Core:

    def process_item(self, item, spider):
        with self.connection.begin() as conn:
            conn.execute(insert(table1).values(item['part1']))
            conn.execute(insert(table2).values(item['part2']))

I understand that SQLAlchemy Core can be used asynchronously with Twisted through alchimia. The code example from alchimia's documentation is included below.

What I don't understand is how I can use my code above with alchimia. How can I set up process_item to use a reactor?

Can I do something like this?

    @inlineCallbacks
    def process_item(self, item, spider):
        with self.connection.begin() as conn:
            yield conn.execute(insert(table1).values(item['part1']))
            yield conn.execute(insert(table2).values(item['part2']))

How do I write the reactor part?

Or is there an easier way to do nonblocking database insertions in a Scrapy pipeline?

For reference, here is the code example from alchimia's documentation:

    from alchimia import TWISTED_STRATEGY

    from sqlalchemy import (
        create_engine, MetaData, Table, Column, Integer, String
    )
    from sqlalchemy.schema import CreateTable

    from twisted.internet.defer import inlineCallbacks
    from twisted.internet.task import react


    @inlineCallbacks
    def main(reactor):
        engine = create_engine(
            "sqlite://", reactor=reactor, strategy=TWISTED_STRATEGY
        )

        metadata = MetaData()
        users = Table("users", metadata,
            Column("id", Integer(), primary_key=True),
            Column("name", String()),
        )

        # Create the table
        yield engine.execute(CreateTable(users))

        # Insert some users
        yield engine.execute(users.insert().values(name="Jeremy Goodwin"))
        yield engine.execute(users.insert().values(name="Natalie Hurley"))
        yield engine.execute(users.insert().values(name="Dan Rydell"))
        yield engine.execute(users.insert().values(name="Casey McCall"))
        yield engine.execute(users.insert().values(name="Dana Whitaker"))

        result = yield engine.execute(users.select(users.c.name.startswith("D")))
        d_users = yield result.fetchall()

        # Print out the users
        for user in d_users:
            print("Username: %s" % user[users.c.name])

    if __name__ == "__main__":
        react(main, [])

Answer

How can I set up process_item to use a reactor?

You don't need to manage another reactor in your pipeline. Scrapy already runs on Twisted's reactor, so you can do asynchronous database work in an item pipeline by returning a Deferred from process_item.

See also Scrapy's documentation and sample code for doing asynchronous operations within an item pipeline by returning a Deferred.

Nonblocking Scrapy pipeline to database
