Question 1

I'm new to ML and would be grateful for any assistance provided. I've run a linear regression prediction using test set A and training set A. I saved the linear regression model and would now like to use the same model to predict a test set A target using features from test set B. Each time I run the model, it throws the error below.

How can I successfully predict a test data set from features and a target with different shapes?

Input
print(testB.shape)
print(testA.shape)

Output
(2480, 5)
(1315, 6)

Input
saved_model = joblib.load(filename)
testB_result = saved_model.score(testB_features, testA_target)
print(testB_result)

Output
ValueError: Found input variables with inconsistent numbers of samples: [1315, 2480]

Thanks again

Question 2

The shapes are inconsistent, which is why the error is being thrown. Have you tried reshaping the data so that they match? From a quick look, testB has more samples and one fewer column than testA.

Think about it: if you have trained your model with 5 features, you cannot then ask the same model to make a prediction given 6 features. You mention using a linear regressor; its equation is roughly:

y = b + w0*x0 + w1*x1 + w2*x2 + ... + w(N-1)*x(N-1)

where:
  y is your output/label
  N is the number of features
  b is the bias term
  w(i) is the ith weight
  x(i) is the ith feature value

You have trained a linear regressor with 5 features, effectively producing the following:

y (your output/label) = b + w0*x0 + w1*x1 + w2*x2 + w3*x3 + w4*x4

You then ask it to make a prediction given 6 features but it only knows how to deal with 5.
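
To make this concrete, here is a minimal sketch using scikit-learn's LinearRegression on synthetic data (the array names and random values are made up for illustration, not taken from the question). It shows that a model fitted on 5 features stores exactly 5 weights plus one bias term, and that asking it to predict from 6 columns raises a ValueError.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only: 100 samples, 5 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = X_train @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 4.0

model = LinearRegression().fit(X_train, y_train)

# One weight per feature (w0..w4) and a single bias term b.
n_weights = model.coef_.shape[0]
print(n_weights, model.intercept_)

# Predicting from 6 columns fails: the model has no weight for the extra column.
X_six = rng.normal(size=(10, 6))
try:
    model.predict(X_six)
    six_features_ok = True
except ValueError as err:
    six_features_ok = False
    print("ValueError:", err)
```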

Aside from that issue, the sample counts also differ: testB has 2480 and testA has 1315. These need to match, as the model wants to make 2480 predictions, but you only give it 1315 outputs to compare them to. How can you get a score for the 1165 missing samples? Do you now see why the data has to be reshaped?
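
The sample-count requirement can be checked the same way. The sketch below, again on synthetic arrays with hypothetical names, reproduces the mismatch from the question (2480 feature rows scored against 1315 targets) and then shows two calls that do work: score with matching rows, and predict, which needs no target at all.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
weights = np.array([1.0, -2.0, 0.5, 3.0, 0.0])  # made-up coefficients

# Hypothetical stand-ins for the question's data (same shapes, random values).
train_features = rng.normal(size=(500, 5))
train_target = train_features @ weights
testB_features = rng.normal(size=(2480, 5))
testA_features = rng.normal(size=(1315, 5))
testA_target = testA_features @ weights

model = LinearRegression().fit(train_features, train_target)

# 2480 predictions cannot be compared against 1315 targets.
try:
    model.score(testB_features, testA_target)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# score() works when features and target describe the same rows.
testA_score = model.score(testA_features, testA_target)

# predict() only needs features, so it works on testB without any target.
testB_predictions = model.predict(testB_features)
print(testA_score, testB_predictions.shape)
```

A score is only meaningful when row i of the features and row i of the target belong to the same sample, so if testB has no targets of its own, predict is probably the call you want.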

EDIT

Assuming you have datasets with an equal number of features as discussed above, you may now look at reshaping (removing data from) testB like so:

# The end index of a slice is exclusive, so 0:1315 keeps 1315 rows.
testB = testB[0:1315, :]
testB.shape
(1315, 5)

Or, if you would prefer a solution using the numpy API:

import numpy as np
# Delete the first len(testB) - len(testA) rows so that len(testA) rows remain.
testB = np.delete(testB, np.s_[0:(len(testB)-len(testA))], axis=0)
testB.shape
(1315, 5)

Keep in mind that doing this discards a number of samples. If those samples matter to you (and they can), it may be better to introduce a pre-processing step that deals with the missing values, namely imputation. It is also worth noting that the data should be shuffled before you truncate it (unless it already is); otherwise you may remove parts of the data the model should be learning about, which could result in a model that does not generalise as well as you hoped.
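
One way to shuffle before cutting testB down is to draw a random permutation of the row indices and keep the first len(testA) of them. This is only a sketch, assuming testB is a NumPy array; the arrays below are placeholders with the shapes from the question, and the seed is arbitrary.

```python
import numpy as np

# Placeholder arrays with the shapes from the question.
testA = np.zeros((1315, 6))
testB = np.arange(2480 * 5, dtype=float).reshape(2480, 5)

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility

# Shuffle the row indices, then keep as many rows as testA has.
keep = rng.permutation(len(testB))[: len(testA)]
testB_subset = testB[keep]

print(testB_subset.shape)
```

Because the rows are picked by index, the same keep array can be reused to select the matching rows from any other array that is aligned with testB.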

SKlearn prediction on test dataset with different shape from training dataset shape
