Apache Spark ALS - how to perform live recommendations / fold in an anonymous user

2024/7/4 8:45:16

I am using Apache Spark's MLlib ALS (through the PySpark API) to develop a service that performs live recommendations for anonymous users (users not in the training set) on my site. In my use case I train the model on the user ratings in this way:

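from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
# df is assumed to be an RDD of (user, item, rating) rows;
# on Spark 2.x+ a DataFrame has no .map, so use df.rdd.map(...) there.
ratings = df.map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
rank = 10            # number of latent factors
numIterations = 10   # number of ALS iterations
# Train on implicit feedback (selections rather than explicit ratings)
model = ALS.trainImplicit(ratings, rank, numIterations)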

Now, each time an anonymous user selects an item in the catalogue, I want to fold that user's vector into the ALS model and get recommendations (just like the recommendProducts() call), without retraining the whole model.

Is there an easy way to fold in a new anonymous user's vector after training the ALS model in Apache Spark?

Thanks in advance

Answer

I have seen a few open-source "model server" solutions advertised, but have no hands-on experience with any of them. I have also heard of a commercial offering, but can't remember its name right now.
So form your own opinion, and keep an eye out for possible alternatives.

PredictionIO (the start-up was acquired by Salesforce, but its solution is still available) uses a Spark+Hadoop+HBase stack, plus some kind of web server component.

MLeap is yet-another-ML-library-with-limited-feature-set, which can be plugged into Spark/scikit-learn/whatever, and can spawn a web service -- or export your model to a hosted solution named Combust.ml.

MLDB is yet-another-ML-library-with-limited-feature-set, completely outside of the Python/Spark ecosystem, but claims full integration with TensorFlow -- including the ability to import existing Deep Learning models and tweak them for different uses.
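
Independently of any model server, the fold-in itself can be approximated outside Spark. What follows is only a minimal sketch, not Spark's own implementation: it assumes the item factors have been collected once from the trained model (for example via model.productFeatures().collect()) into a NumPy array, keeps them fixed, and solves the standard implicit-feedback least-squares update for a single new user. The function names fold_in_user and recommend, the row-index convention, and the alpha and reg defaults are all assumptions made for illustration; alpha and reg should match whatever was used for training, and Spark may scale the regularisation differently internally, so the scores will approximate rather than reproduce what a full retrain would give.

```python
import numpy as np


def fold_in_user(item_factors, selected_idx, alpha=0.01, reg=0.01):
    """Estimate a latent vector for one unseen user (hypothetical helper).

    item_factors: (n_items, rank) array of item factors Y, held fixed.
    selected_idx: row indices in Y of the items the anonymous user selected.
    alpha, reg:   confidence weight and L2 regularisation (assumed values).
    """
    Y = np.asarray(item_factors, dtype=float)
    rank = Y.shape[1]
    Ys = Y[sorted(set(selected_idx))]  # factors of the selected items only
    # A = Y^T C Y + reg * I, with confidence c = 1 + alpha for selected items
    # and c = 1 for all others, so the selected rows get an extra alpha term.
    A = Y.T @ Y + alpha * (Ys.T @ Ys) + reg * np.eye(rank)
    # b = Y^T C p, where the preference p is 1 for selected items, 0 otherwise.
    b = (1.0 + alpha) * Ys.sum(axis=0)
    return np.linalg.solve(A, b)


def recommend(item_factors, user_vector, selected_idx, k=5):
    """Return row indices of the top-k items, skipping already selected ones."""
    scores = np.asarray(item_factors, dtype=float) @ user_vector
    scores[sorted(set(selected_idx))] = -np.inf  # never re-recommend
    return np.argsort(-scores)[:k]


# Toy item factors: rows 0-1 lean on factor 0, rows 2-3 on factor 1.
Y_demo = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
x_new = fold_in_user(Y_demo, [0], alpha=40.0, reg=0.01)
top = recommend(Y_demo, x_new, [0], k=2)
```

In a live service the item-factor array would be loaded once and kept in memory, so each request costs one small rank-by-rank solve plus one matrix-vector product; mapping the returned row indices back to catalogue item ids is left to the caller.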

https://en.xdnf.cn/q/73358.html
