I have a web scraper in Scrapy that gets data items. I want to asynchronously insert them into a database as well.
For example, I have a transaction that inserts some items into my db using SQLAlchemy Core:
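```python
from sqlalchemy import insert

def process_item(self, item, spider):
    # begin() used as a context manager commits on exit and rolls back on error
    # (this works if self.connection is an Engine, whose begin() yields a Connection)
    with self.connection.begin() as conn:
        conn.execute(insert(table1).values(item['part1']))
        conn.execute(insert(table2).values(item['part2']))
```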
I understand that it's possible to use SQLAlchemy Core asynchronously on Twisted via alchimia. The code example from the alchimia documentation is included below.
What I don't understand is how I can use the code above with alchimia. How can I set up process_item to use a reactor?
Can I do something like this?
```python
from twisted.internet.defer import inlineCallbacks

@inlineCallbacks
def process_item(self, item, spider):
    with self.connection.begin() as conn:
        yield conn.execute(insert(table1).values(item['part1']))
        yield conn.execute(insert(table2).values(item['part2']))
```
How do I write the reactor part?
Or is there an easier way to do nonblocking database insertions in a Scrapy pipeline?
For reference, here is the code example from alchimia's documentation:
```python
from alchimia import TWISTED_STRATEGY

from sqlalchemy import (
    create_engine, MetaData, Table, Column, Integer, String
)
from sqlalchemy.schema import CreateTable

from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react


@inlineCallbacks
def main(reactor):
    engine = create_engine(
        "sqlite://", reactor=reactor, strategy=TWISTED_STRATEGY
    )

    metadata = MetaData()
    users = Table("users", metadata,
        Column("id", Integer(), primary_key=True),
        Column("name", String()),
    )

    # Create the table
    yield engine.execute(CreateTable(users))

    # Insert some users
    yield engine.execute(users.insert().values(name="Jeremy Goodwin"))
    yield engine.execute(users.insert().values(name="Natalie Hurley"))
    yield engine.execute(users.insert().values(name="Dan Rydell"))
    yield engine.execute(users.insert().values(name="Casey McCall"))
    yield engine.execute(users.insert().values(name="Dana Whitaker"))

    result = yield engine.execute(users.select(users.c.name.startswith("D")))
    d_users = yield result.fetchall()

    # Print out the users
    for user in d_users:
        print("Username: %s" % user[users.c.name])


if __name__ == "__main__":
    react(main, [])
```