How to recover original values after a model predict in keras?

2024/11/15 0:42:14

This is a more conceptual question, but I have to confess I have been dealing with it for a while.

Suppose you want to train a neural network (NN), for instance with Keras. As is commonly recommended, you normalize or standardize the data before training; with standardization, for instance:

x_new = (x_old - mean)/standarddev
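
A minimal NumPy sketch of this formula, assuming a 1-D array of made-up training values in the 200-400 range. After the transformation the array has zero mean and unit standard deviation:

```python
import numpy as np

# Made-up training values in the 200-400 range
x_old = np.array([200.0, 250.0, 300.0, 350.0, 400.0])

mean = x_old.mean()
standarddev = x_old.std()  # population standard deviation (NumPy default, ddof=0)

# Standardization: subtract the mean, divide by the standard deviation
x_new = (x_old - mean) / standarddev
```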

Then you carry out the training (model.fit in Keras) and minimize the loss function; all very nice.

Edit: In my case, I have a set of values between 200 and 400. It's a NN with 1 input and 1 output. I standardize, as described above, both the input values AND the expected output values, so the NN learns its weights and biases on the standardized scale.

Now, imagine that I receive a completely new dataset of values between 200 and 400 and I want to predict outputs using the previously trained NN. In Keras you can call model.predict(x), where x is the new set of values, standardized (or normalized) because the NN was trained that way. But what I get from predict is an array of standardized values, and I want to map them back to the original range of 200 to 400. I don't know how to do this.

I know you can train without normalizing or standardizing, but I have read that training improves if you standardize (or normalize) the data into the output range of the units (neurons), for instance between 0 and 1 for a sigmoid.
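
For min-max normalization into [0, 1] (the output range of a sigmoid unit), recovering the original scale works the same way as for standardization, with the training minimum and maximum in place of the mean and standard deviation. A NumPy sketch with made-up target values; `normalize` and `denormalize` are hypothetical helper names, not Keras functions:

```python
import numpy as np

# Made-up target values in the 200-400 range
y_train = np.array([210.0, 260.0, 330.0, 395.0])

# Min-max statistics estimated on the training data only
y_min, y_max = y_train.min(), y_train.max()


def normalize(y):
    """Map values from [y_min, y_max] onto [0, 1]."""
    return (y - y_min) / (y_max - y_min)


def denormalize(y_scaled):
    """Inverse of normalize: map [0, 1] back onto [y_min, y_max]."""
    return y_scaled * (y_max - y_min) + y_min


y_scaled = normalize(y_train)
y_back = denormalize(y_scaled)  # recovers the original values
```

One consequence worth keeping in mind: a sigmoid output only produces values strictly between 0 and 1, so after denormalizing, predictions stay strictly inside the [y_min, y_max] range seen in the training targets.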

Thank you.

Answer

OK, I think I understand your problem correctly, so I will try to explain how to deal with data normalization:

1. Assumption about the distribution of inputs and outputs: in neural network training you usually assume that your data (both input and output) comes from some probability distributions; let's call them X for the input and Y for the output. There are good reasons (for example, gradient-based optimizers tend to converge more easily on well-scaled data) to make these distributions zero mean with unit standard deviation during the training phase.

2. Statistical part of data normalization and recovery: because of that, you have to solve an additional task while training your network: estimating the mean and standard deviation of both the input distribution X and the output distribution Y. You do this simply by computing the empirical mean and standard deviation of your training data.

3. Application phase, inputs: when you apply your model to new inputs, you also assume that they come from distribution X, so you need to standardize them to zero mean and unit standard deviation as well. Here is the subtle part: in principle you could use both the training set and the new data to get an even better estimate of the mean and standard deviation of X, but to avoid leaking information from validation data (and to apply exactly the transformation the model was trained on), you usually standardize new data with the mean and standard deviation obtained during the training phase.

4. Application phase, outputs: this part is trickier. When you apply your network to new standardized inputs, you get outputs distributed as Y* ~ (Y - mean'(Y)) / sd'(Y), where mean'(Y) and sd'(Y) are the estimates of the mean and standard deviation obtained empirically from your training set and Y is the original distribution of your output. This is because during training you fed your optimizer output data from this distribution. So to destandardize your outputs you need to apply the transformation Y* * sd'(Y) + mean'(Y), which is the inverse of the standardization transformation.
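
Points 2 to 4 can be collected in one small helper that estimates the statistics, reuses them for new data, and inverts the transformation. This is a sketch under stated assumptions: `Standardizer` is a hypothetical class (not part of Keras), the data is made up, and storing the statistics as a JSON string is just one way of keeping them next to the model:

```python
import json

import numpy as np


class Standardizer:
    """Hypothetical helper: stores mean'(.) and sd'(.) estimated on training data."""

    def __init__(self, mean=None, sd=None):
        self.mean = mean
        self.sd = sd

    def fit(self, values):
        # Point 2: empirical mean and (population) standard deviation
        values = np.asarray(values, dtype=float)
        self.mean = float(values.mean())
        self.sd = float(values.std())
        return self

    def transform(self, values):
        # Point 3: reuse the stored training statistics, never re-estimate them
        return (np.asarray(values, dtype=float) - self.mean) / self.sd

    def inverse_transform(self, values_std):
        # Point 4: Y* * sd'(Y) + mean'(Y)
        return np.asarray(values_std, dtype=float) * self.sd + self.mean

    def to_json(self):
        return json.dumps({"mean": self.mean, "sd": self.sd})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["mean"], data["sd"])


# Made-up training data: one scaler for X and a separate one for Y
x_train = np.array([200.0, 240.0, 290.0, 330.0, 380.0, 400.0])
y_train = np.array([215.0, 250.0, 300.0, 320.0, 370.0, 395.0])
x_scaler = Standardizer().fit(x_train)
y_scaler = Standardizer().fit(y_train)

# Store the statistics, then reload them as the application phase would
y_scaler_loaded = Standardizer.from_json(y_scaler.to_json())

# Hypothetical standardized network outputs mapped back to the original scale
y_star = np.array([-1.0, 0.0, 1.5])
y_pred = y_scaler_loaded.inverse_transform(y_star)
```

scikit-learn's `StandardScaler` provides `fit`, `transform` and `inverse_transform` with the same meaning (it expects 2-D arrays of shape (n_samples, n_features)) and can replace a hand-written helper like this one.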

SUMMARY:

Your training and application phases look as follows:

  1. You obtain the statistics needed for both the training and application phases by computing the empirical mean and standard deviation of your training inputs (mean'(X) and sd'(X)) and of your training outputs (mean'(Y) and sd'(Y)). It's important to store them because they will be needed in the application phase.
  2. You standardize both your input and output data to zero mean and unit standard deviation and train your model on them.
  3. During the application phase you standardize your inputs by subtracting the stored mean'(X) and dividing by the stored sd'(X), then feed them through the model to obtain the new output Y*.
  4. You destandardize your outputs using the stored mean'(Y) and sd'(Y), obtained during the training phase, via the transformation Y* * sd'(Y) + mean'(Y).
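
The four steps can be sketched end to end in NumPy. To keep the example runnable without TensorFlow, a degree-1 least-squares fit (`np.polyfit`) stands in for `model.fit` and `model.predict`, and the data follows a made-up linear relationship; with a real Keras model, the standardize and destandardize arithmetic around the model calls stays the same:

```python
import numpy as np

# Made-up training data: 1 input, 1 output, values in the 200-400 range
x_train = np.linspace(200.0, 400.0, 21)
y_train = 0.5 * x_train + 150.0  # hypothetical input-output relationship

# Step 1: estimate and store the statistics of X and Y (training data only)
x_mean, x_sd = x_train.mean(), x_train.std()
y_mean, y_sd = y_train.mean(), y_train.std()

# Step 2: standardize inputs and outputs, then "train".
# A degree-1 polyfit stands in for model.fit so the sketch runs without Keras.
coeffs = np.polyfit((x_train - x_mean) / x_sd, (y_train - y_mean) / y_sd, deg=1)


def predict_standardized(x_std):
    """Stand-in for model.predict: returns Y* on the standardized scale."""
    return np.polyval(coeffs, x_std)


# Step 3: standardize NEW inputs with the stored training statistics
x_new = np.array([220.0, 305.0, 390.0])
y_star = predict_standardized((x_new - x_mean) / x_sd)

# Step 4: destandardize with the stored output statistics: Y* * sd'(Y) + mean'(Y)
y_pred = y_star * y_sd + y_mean
```

For a Keras model with a single output unit, `model.predict` returns an array of shape (n, 1); since `y_mean` and `y_sd` are scalars, the step 4 transformation applies element-wise to that shape as well.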

I hope this answer solves your problem and leaves you with no doubts about the details of standardizing and destandardizing your data :)

https://en.xdnf.cn/q/72147.html
