How to update an SVM model with new data

2024/10/14 17:22:25

I have two data sets of different sizes.

1) Data set 1 is high-dimensional, with 4500 samples (sketches).

2) Data set 2 is low-dimensional, with 1000 samples (real data). I assume that both data sets have the same distribution.

I want to train a non-linear SVM model using sklearn on the first data set (as pre-training), and then update the model on part of the second data set (to fit the model). How can I implement this kind of update in sklearn? How can I update an SVM model?

Answer

In sklearn you can do this only for a linear kernel, using SGDClassifier (with an appropriate choice of loss and penalty terms: the loss should be hinge and the penalty L2). Incremental learning is supported through the partial_fit method, which is implemented for neither SVC nor LinearSVC.
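As a minimal sketch of the incremental route described above, the snippet below trains a linear SVM-style model with SGDClassifier and updates it with a second batch via partial_fit. The synthetic data, the `make_batch` helper, the feature count, and the variable names are all assumptions made for illustration; they also assume that both batches share the same feature space, which is required for partial_fit to work.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_batch(n_samples, n_features=20):
    """Hypothetical helper: synthetic, linearly separable toy data."""
    X = rng.normal(size=(n_samples, n_features))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

# Stand-ins for the two data sets in the question (sizes taken from the question).
X_first, y_first = make_batch(4500)
X_second, y_second = make_batch(1000)

# Hinge loss + L2 penalty gives a linear SVM objective optimized by SGD.
clf = SGDClassifier(loss="hinge", penalty="l2", random_state=0)

# The first call to partial_fit must be told every class label that can occur.
clf.partial_fit(X_first, y_first, classes=np.array([0, 1]))

# Later calls update the existing weights with the new batch.
clf.partial_fit(X_second, y_second)

X_test, y_test = make_batch(500)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Each partial_fit call makes a single pass over the batch it is given, so in practice features are usually standardized first and the calls repeated over several passes.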

Unfortunately, in practice, fitting an SVM incrementally on such small datasets is rather pointless. SVM training is a convex problem with an easily obtainable global solution, so you do not need pretraining of any form; in fact, it should not matter at all if you are thinking of pretraining in the neural-network sense. If correctly implemented, an SVM should completely forget the previous dataset. Why not learn on the whole data in one pass? This is what an SVM is supposed to do, unless you are working with some non-convex modification of SVM (in which case pretraining makes sense).

To sum up:

  • From both a theoretical and a practical point of view, there is no point in pretraining an SVM. You can either learn only on the second dataset, or on both at the same time. Pretraining is only reasonable for methods that suffer from local minima (or convergence difficulties of any kind) and therefore need to start near the actual solution to find a reasonable model (like neural networks). SVM is not one of them.
  • You can use incremental fitting (although in sklearn it is very limited) for efficiency reasons, but for such a small dataset you will be just fine fitting the whole dataset at once.
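The "fit everything at once" option can be sketched as follows: stack both data sets and train a single non-linear SVC on the result. The toy data, the `make_data` helper, and the hyperparameters are assumptions for illustration only, and the sketch presumes both sets have already been mapped to the same feature space.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_data(n_samples):
    """Hypothetical helper: 2-D points labelled by a circular (non-linear) boundary."""
    X = rng.normal(size=(n_samples, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)
    return X, y

# Stand-ins for the two data sets in the question (sizes taken from the question).
X_first, y_first = make_data(4500)
X_second, y_second = make_data(1000)

# Combine both sets and solve one convex problem instead of pretraining.
X_all = np.vstack([X_first, X_second])
y_all = np.concatenate([y_first, y_second])

model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_all, y_all)

X_test, y_test = make_data(500)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

With about 5500 samples in total, a kernel SVC of this kind typically trains in seconds, which is why incremental updates bring little benefit at this scale.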