fit method in python sklearn

2024/10/15 16:17:09

I am asking myself various questions about the fit method in sklearn.

Question 1: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)

Is the content of the variable model changing whatsoever during the process?

Question 2: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = svd_1.fit(X2)

What is happening to svd_1? In other words, svd_1 has already been fitted and I fit it again, so what is happenning to its component?

Answer

Question 1: Is the content of the variable model changing whatsoever during the process?

Yes. The fit method modifies the object. And it returns a reference to the object. Thus, take care! In the first example all three variables model, svd_1, and svd_2 actually refer to the same object.

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)
print(model is svd_1 is svd_2)  # prints True

Question 2: What is happening to svd_1?

model and svd_1 refer to the same object, so there is absolutely no difference between the first and the second example.

Final Remark: What happens in both examples is that the result of fit(X1) is overwritten by fit(X2), as pointed out in the answer by David Maust. If you want to have two different models fitted to two different sets of data you need to do something like this:

svd_1 = TruncatedSVD().fit(X1)
svd_2 = TruncatedSVD().fit(X2)
https://en.xdnf.cn/q/69264.html

Related Q&A

Django 1.9 JSONField update behavior

Ive recently updated to Django 1.9 and tried updating some of my model fields to use the built-in JSONField (Im using PostgreSQL 9.4.5). As I was trying to create and update my objects fields, I came a…

Using Tweepy to search for tweets with API 1.1

Ive been trying to get tweepy to search for a sring without success for the past 3 hours. I keep getting replied it should use api 1.1. I thought that was implemented... because I can post with tweepy.…

Retrieving my own data via FaceBook API

I am building a website for a comedy group which uses Facebook as one of their marketing platforms; one of the requirements for the new site is to display all of their Facebook events on a calendar.Cur…

Python -- Optimize system of inequalities

I am working on a program in Python in which a small part involves optimizing a system of equations / inequalities. Ideally, I would have wanted to do as can be done in Modelica, write out the equation…

Pandas side-by-side stacked bar plot

I want to create a stacked bar plot of the titanic dataset. The plot needs to group by "Pclass", "Sex" and "Survived". I have managed to do this with a lot of tedious nump…

Turn off list reflection in Numba

Im trying to accelerate my code using Numba. One of the arguments Im passing into the function is a mutable list of lists. When I try changing one of the sublists, I get this error: Failed in nopython …

Identifying large bodies of text via BeautifulSoup or other python based extractors

Given some random news article, I want to write a web crawler to find the largest body of text present, and extract it. The intention is to extract the physical news article on the page. The original p…

Pandas reindex and interpolate time series efficiently (reindex drops data)

Suppose I wish to re-index, with linear interpolation, a time series to a pre-defined index, where none of the index values are shared between old and new index. For example# index is all precise times…

How do you set the box width in a plotly box in python?

I currently have the following;y = time_h time_box = Box(y=y,name=Time (hours),boxmean=True,marker=Marker(color=green),boxpoints=all,jitter=0.5,pointpos=-2.0 ) layout = Layout(title=Time Box, ) fig = F…

how do you install django older version using easy_install?

I just broke my environment because of django 1.3. None of my sites are able to run. So, i decided to use virtualenv to set virtual environment with different python version as well as django.But, seem…