This is a more conceptual question, but I have to confess I have been dealing with it for a while.
Suppose you want to train a neural network (NN), using Keras for instance. As is commonly recommended, before training you normalize or standardize the data; for instance, with standardization:
x_new = (x_old - mean)/standarddev
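To illustrate the formula above, here is a small NumPy sketch. The input values are made up (they are not the asker's data) and just sit in the 200 to 400 range described below:

```python
import numpy as np

# Hypothetical raw inputs in the 200-400 range described in the question
x_old = np.array([200.0, 250.0, 300.0, 350.0, 400.0])

mean = x_old.mean()          # empirical mean
standarddev = x_old.std()    # empirical standard deviation (NumPy default ddof=0)

# Standardization: the result has zero mean and unit standard deviation
x_new = (x_old - mean) / standarddev
print(x_new.mean(), x_new.std())
```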
Then you carry out the training (model.fit in Keras) and minimize the loss function. All very nice.
Edit: In my case, I have a set of values between 200 and 400. It's an NN with one input and one output. As described, I standardize both the input values AND the expected (target) values, so the NN learns its weights and biases on standardized data.
Now, imagine that I have a completely new dataset of values between 200 and 400 and I want to predict an output using the previously trained NN. In Keras you can use model.predict(x), with x being the new set of values, standardized (or normalized) because the NN was trained that way. But what I get from predict is an array of standardized values, and I want to map them back to the usual range of 200 to 400. I don't know how to do this.
I know you can train without normalizing or standardizing, but I have read that training improves if you standardize (or normalize) so that the values fall within the output range of the units (neurons), for instance between 0 and 1 for a sigmoid.
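For the sigmoid case mentioned above, one common choice is min-max normalization to [0, 1]. This is a minimal NumPy sketch, assuming the minimum and maximum are taken from the training data; the function names are hypothetical. The inverse mapping is what brings outputs back to the original range:

```python
import numpy as np

def minmax_fit(values):
    """Return the (min, max) statistics of the training data."""
    return float(np.min(values)), float(np.max(values))

def minmax_transform(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def minmax_inverse(scaled, lo, hi):
    """Map values from [0, 1] back to [lo, hi]."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Hypothetical training values in the 200-400 range
train = np.array([200.0, 260.0, 330.0, 400.0])
lo, hi = minmax_fit(train)

scaled = minmax_transform(train, lo, hi)     # values now in [0, 1]
restored = minmax_inverse(scaled, lo, hi)    # back to the original units
print(scaled)
print(restored)
```

New data outside the training [min, max] will map slightly outside [0, 1], which a sigmoid output cannot reproduce.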
Thank you.
OK, I think I understand your problem correctly, so I will try to explain how to deal with data normalization:
1. Assumption about the distribution of inputs and outputs: in neural network training you usually assume that your data (both input and output) come from some probability distributions; let's call them X for the input and Y for the output. There are several reasons to make these distributions zero-mean with unit standard deviation during the training phase.
2. Statistical part of data normalization and recovery: because of that, you have to solve an additional task when training your network, namely estimating the mean and standard deviation of both the input distribution X and the output distribution Y. You do that simply by computing the empirical mean and standard deviation of your training data.
3. Application phase, inputs: when you apply your model to new input, you are also assuming that this input comes from distribution X, so you need to standardize it to zero mean and unit standard deviation as well. Here is the interesting part: you could use both the training set and the new data to obtain an even better estimate of the mean and standard deviation of X, but to avoid overfitting in the validation case, you usually standardize new data with the mean and standard deviation obtained during the training phase.
4. Application phase, outputs: this part is trickier. When you apply your network to new standardized inputs, you get new outputs distributed as Y* ~ (Y - mean'(Y)) / sd'(Y), where mean'(Y) and sd'(Y) are the empirical estimates of the mean and standard deviation obtained from your training set, and Y is the original distribution of your output. This is because during training you fed your optimizer output data from this standardized distribution. So to map your outputs back to the original scale (destandardize them), you apply the transformation Y = Y* * sd'(Y) + mean'(Y), which is the inverse of the standardization transformation.
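Points 2 to 4 can be sketched with a small helper that stores the training statistics and applies both the standardization and its inverse. The class name and the data (targets following a made-up rule y = 2x + 10) are hypothetical, and the standardized output y_star below is a stand-in for what model.predict would return:

```python
import numpy as np

class Standardizer:
    """Hypothetical helper: stores the empirical mean and standard deviation
    of the training data and applies standardization and its inverse."""

    def fit(self, values):
        values = np.asarray(values, dtype=float)
        self.mean = values.mean()   # mean'
        self.sd = values.std()      # sd' (ddof=0)
        return self

    def transform(self, values):
        # (value - mean') / sd'
        return (np.asarray(values, dtype=float) - self.mean) / self.sd

    def inverse_transform(self, standardized):
        # Y* * sd'(Y) + mean'(Y): undoes transform()
        return np.asarray(standardized, dtype=float) * self.sd + self.mean

# Point 2: estimate mean'(X), sd'(X), mean'(Y), sd'(Y) from training data only
x_train = np.array([200.0, 250.0, 300.0, 350.0, 400.0])
y_train = np.array([410.0, 510.0, 610.0, 710.0, 810.0])
x_scaler = Standardizer().fit(x_train)
y_scaler = Standardizer().fit(y_train)

# Point 3: new inputs are standardized with the stored *training* statistics
x_new = np.array([220.0, 380.0])
x_new_std = x_scaler.transform(x_new)

# Point 4: standardized outputs (stand-in for model.predict) mapped back
y_star = np.array([-1.0, 0.0, 1.0])
y_restored = y_scaler.inverse_transform(y_star)
print(y_restored)
```

If you prefer a library class, scikit-learn's StandardScaler follows the same fit / transform / inverse_transform pattern.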
SUMMARY:
Your training and application phases look as follows:
- You obtain the statistics needed for both the training phase and the application phase by computing the empirical mean and standard deviation of your training inputs (mean'(X) and sd'(X)) and the empirical mean and standard deviation of your training outputs (mean'(Y) and sd'(Y)). It's important to store them because they will be needed in the application phase.
- You standardize both your input and output data to zero mean and unit standard deviation and train your model on them.
- During the application phase, you standardize your inputs by subtracting the stored mean'(X) and dividing by the stored sd'(X), then feed them to the network to obtain the new output Y*.
- You destandardize your outputs using the stored mean'(Y) and sd'(Y), obtained during the training phase, via the transformation Y* * sd'(Y) + mean'(Y).
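The summary steps above can be put together in one end-to-end sketch. To keep it runnable without Keras, the "network" is replaced by an ordinary least-squares line fitted with np.polyfit; this is purely an assumption for illustration, and with Keras you would call model.fit and model.predict at the marked places. The training data follow a made-up rule y = 2x + 10:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: inputs in [200, 400], targets from y = 2x + 10
x_train = rng.uniform(200.0, 400.0, size=100)
y_train = 2.0 * x_train + 10.0

# Compute and store the training statistics
mean_x, sd_x = x_train.mean(), x_train.std()
mean_y, sd_y = y_train.mean(), y_train.std()

# Standardize inputs and outputs, then train on them
x_std = (x_train - mean_x) / sd_x
y_std = (y_train - mean_y) / sd_y
# Stand-in for model.fit: least-squares slope and intercept in standardized space
slope, intercept = np.polyfit(x_std, y_std, deg=1)

# Standardize new inputs with the *stored* training statistics
x_new = np.array([210.0, 300.0, 390.0])
x_new_std = (x_new - mean_x) / sd_x
y_star = slope * x_new_std + intercept   # stand-in for model.predict

# Destandardize the outputs back to the original units
y_pred = y_star * sd_y + mean_y
print(y_pred)
```

Note that the new inputs are never used to recompute any statistics; only the stored training values are applied.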
I hope this answer solves your problem and leaves you with no doubts about the details of standardizing and destandardizing your data :)