Linear regression with TensorFlow


I'm trying to understand linear regression... here is a script that I tried to understand:

'''
A linear regression learning algorithm example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt

rng = numpy.random

# Parameters
learning_rate = 0.0001
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c),
                  "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

My question is: what does this part represent?

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

And why are there random float numbers?

Also, could you show me some math with formulas representing the cost, pred and optimizer variables?

Answer

Let's try to put together some intuition and sources, along with the TF approach.

General intuition:

Regression as presented here is a supervised learning problem. In it, as defined in Russell & Norvig's Artificial Intelligence, the task is:

given a training set (X, y) of m input-output pairs (x1, y1), (x2, y2), ... , (xm, ym), where each output was generated by an unknown function y = f(x), discover a function h that approximates the true function f

To that end, the hypothesis function h combines each x in some way with the to-be-learned parameters, so that its output is as close to the corresponding y as possible, and this for the whole dataset. The hope is that the resulting function will be close to f.

But how can these parameters be learned? In order to be able to learn, the model has to be able to evaluate itself. Here is where the cost (also called loss, energy, merit...) function comes into play: it is a metric function that compares the output of h with the corresponding y, and penalizes big differences.

Now it should be clear what the "learning" process here actually is: altering the parameters in order to achieve a lower value for the cost function.
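As a toy illustration of that evaluation step, here is a tiny snippet (made-up data and made-up parameter guesses, just for the example) showing how a mean-squared-error cost ranks two candidate parameter settings:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])            # generated by the unknown f(x) = 2x

mse = lambda W, b: ((W*x + b - y)**2).mean()
print(mse(0.5, 0.0))   # bad guess  -> high cost (10.5)
print(mse(1.9, 0.1))   # good guess -> low cost (~0.017)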

Linear Regression:

The example that you are posting performs a parametric linear regression, optimized with gradient descent, using the mean squared error as cost function. This means:

  • Parametric: The set of parameters is fixed. They are held in the exact same memory placeholders throughout the learning process.

  • Linear: The output of h is merely a linear (actually, affine) combination of the input x and your parameters. So if x and w are real-valued vectors of the same dimensionality, and b is a real number, it holds that h(x, w, b) = w.transposed()*x + b. Page 107 of the Deep Learning Book brings more quality insights and intuitions into that.

  • Cost function: Now this is the interesting part. The average squared error is a convex function. This means it has a single, global optimum, and furthermore, it can be directly found with the set of normal equations (also explained in the DLB). In the case of your example, the stochastic (and/or minibatch) gradient descent method is used: this is the preferred method when optimizing non-convex cost functions (which is the case in more advanced models like neural networks) or when your dataset has a huge dimensionality (also explained in the DLB).

  • Gradient descent: tf deals with this for you, so it is enough to say that GD minimizes the cost function by following its derivative "downwards", in small steps, until reaching a stationary point. If you really need to know, the exact technique applied by TF is called automatic differentiation, kind of a compromise between the numeric and symbolic approaches. For convex functions like yours this point will be the global optimum, and (if your learning rate is not too big) GD will always converge to it, so it doesn't matter which values you initialize your Variables with. Random initialization only becomes necessary in more complex architectures like neural networks; that answers your second question. There is some extra code regarding the management of the minibatches, but I won't get into that because it is not the main focus of your question. The formulas you asked about are spelled out right after this list.
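In the one-dimensional case of your script, the three nodes you asked about correspond to the following math (written in LaTeX, using m for n_samples and \alpha for learning_rate):

% pred: the hypothesis, a line parameterized by W and b
\mathrm{pred}(x) = W x + b

% cost: mean squared error over the m training points
% (the 1/2 factor matches the /(2*n_samples) in the script)
J(W, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( W x_i + b - y_i \right)^2

% optimizer: one gradient descent update with learning rate \alpha
\frac{\partial J}{\partial W} = \frac{1}{m} \sum_{i=1}^{m} \left( W x_i + b - y_i \right) x_i
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( W x_i + b - y_i \right)

W \leftarrow W - \alpha \frac{\partial J}{\partial W}
\qquad
b \leftarrow b - \alpha \frac{\partial J}{\partial b}

Note that because your script feeds one (x, y) pair at a time, each sess.run(optimizer, ...) call takes such a step using a single-sample gradient, which is what makes it stochastic gradient descent.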

The TensorFlow approach:

Deep Learning frameworks are nowadays about nesting lots of functions by building computational graphs (you may want to take a look at the presentation on DL frameworks that I did some weeks ago). For constructing and running the graph, TensorFlow follows a declarative style, which means that the graph has to be completely defined and compiled first, before it is deployed and executed. I highly recommend reading this short wiki article, if you haven't yet. In this context, the setup is split in two parts:

  1. Firstly, you define your computational Graph, where you put your dataset and parameters in memory placeholders, define the hypothesis and cost functions building on them, and tell tf which optimization technique to apply.

  2. Then you run the computation in a Session and the library will be able to (re)load the data placeholders and perform the optimization.
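To make this two-step flow concrete, here is a minimal sketch (assuming the same TF 1.x API as your script; the names a, b and out are toy names of my own choosing):

import tensorflow as tf

# 1) Declarative part: define the graph; nothing is computed yet
a = tf.placeholder(tf.float32)   # data will enter here at run time
b = tf.Variable(2.0)             # a parameter the Session can update
out = a * b                      # a symbolic node, not a number

# 2) Imperative part: deploy and execute the graph in a Session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out, feed_dict={a: 3.0}))  # prints 6.0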

The code:

The code of the example follows this approach closely:

  1. Define the training data train_X and labels train_Y, and prepare the X and Y placeholders in the Graph for them (which are fed in through the feed_dict argument).

  2. Define the W and b Variables for the parameters. They have to be Variables (and not placeholders) because they will be updated during the Session.

  3. Define pred (our hypothesis) and cost as explained before.


From this, the rest of the code should be clearer. Regarding the optimizer, as I said, tf already knows how to deal with this, but you may want to look into gradient descent for more details (again, the DLB is a pretty good reference for that).
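If it helps to demystify the optimizer, this is a hand-rolled sketch of what GradientDescentOptimizer effectively does for this particular model (an illustration only: TF derives the gradients via automatic differentiation, whereas here they are written out by hand from the formulas above; fit is a name I made up):

import numpy as np

def fit(train_X, train_Y, alpha=0.0001, epochs=1000):
    W, b = np.random.randn(), np.random.randn()  # random init, as in your script
    for _ in range(epochs):
        err = W * train_X + b - train_Y          # pred - Y, for every sample at once
        W -= alpha * (err * train_X).mean()      # W <- W - alpha * dJ/dW
        b -= alpha * err.mean()                  # b <- b - alpha * dJ/db
    return W, b

Running this on the train_X/train_Y arrays from your script should land close to the W and b that the TF version prints.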

Cheers! Andres


CODE EXAMPLES: GRADIENT DESCENT VS. NORMAL EQUATIONS

These small snippets generate simple multi-dimensional datasets and test both approaches. Notice that the normal equations approach doesn't require looping, and brings better results. For small dimensionality (DIMENSIONS < 30k) it is probably the preferred approach:

from __future__ import absolute_import, division, print_function
import numpy as np
import tensorflow as tf

####################################################################################################
### GLOBALS
####################################################################################################
DIMENSIONS = 5
f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0, 10) # some noise

####################################################################################################
### GRADIENT DESCENT APPROACH
####################################################################################################
# dataset globals
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
ALPHA = 1e-8 # learning rate
LAMBDA = 0.5 # L2 regularization factor
TRAINING_STEPS = 1000

# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)] # synthesize data
# ds = normalize_data(ds)
ds = [(x, [f(x)+noise()]) for x in ds] # add labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])

# define the computational graph
graph = tf.Graph()
with graph.as_default():
    # declare graph inputs
    x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS))
    y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
    x_test = tf.placeholder(tf.float32, shape=(_test_size, DIMENSIONS))
    y_test = tf.placeholder(tf.float32, shape=(_test_size, 1))
    theta = tf.Variable([[0.0] for _ in range(DIMENSIONS)])
    theta_0 = tf.Variable([[0.0]]) # don't forget the bias term!
    # forward propagation
    train_prediction = tf.matmul(x_train, theta) + theta_0
    test_prediction  = tf.matmul(x_test, theta) + theta_0
    # cost function and optimizer
    train_cost = (tf.nn.l2_loss(train_prediction - y_train) + LAMBDA*tf.nn.l2_loss(theta))/float(_train_size)
    optimizer = tf.train.GradientDescentOptimizer(ALPHA).minimize(train_cost)
    # test results
    test_cost = (tf.nn.l2_loss(test_prediction - y_test) + LAMBDA*tf.nn.l2_loss(theta))/float(_test_size)

# run the computation
with tf.Session(graph=graph) as s:
    tf.initialize_all_variables().run()
    print("initialized"); print(theta.eval())
    for step in range(TRAINING_STEPS):
        _, train_c, test_c = s.run([optimizer, train_cost, test_cost],
                                   feed_dict={x_train: train_data, y_train: train_labels,
                                              x_test: test_data, y_test: test_labels})
        if (step%100 == 0):
            # it should return bias close to zero and parameters all close to 1 (see definition of f)
            print("\nAfter", step, "iterations:")
            # print("   Bias =", theta_0.eval(), ", Weights = ", theta.eval())
            print("   train cost =", train_c); print("   test cost =", test_c)
    PARAMETERS_GRADDESC = tf.concat(0, [theta_0, theta]).eval()
    print("Solution for parameters:\n", PARAMETERS_GRADDESC)

####################################################################################################
### NORMAL EQUATIONS APPROACH
####################################################################################################
# dataset globals
DIMENSIONS = 5
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0, 10) # some noise
# training globals
LAMBDA = 1e6 # L2 regularization factor

# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])

# define the computational graph
graph = tf.Graph()
with graph.as_default():
    # declare graph inputs
    x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS+1))
    y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
    theta = tf.Variable([[0.0] for _ in range(DIMENSIONS+1)]) # implicit bias!
    # optimum
    optimum = tf.matrix_solve_ls(x_train, y_train, LAMBDA, fast=True)

# run the computation: no loop needed!
with tf.Session(graph=graph) as s:
    tf.initialize_all_variables().run()
    print("initialized")
    opt = s.run(optimum, feed_dict={x_train: train_data, y_train: train_labels})
    PARAMETERS_NORMEQ = opt
    print("Solution for parameters:\n", PARAMETERS_NORMEQ)

####################################################################################################
### PREDICTION AND ERROR RATE
####################################################################################################
# generate test dataset
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
test_data, test_labels = zip(*ds)
# define hypothesis
h_gd = lambda x: PARAMETERS_GRADDESC.T.dot(x)
h_ne = lambda x: PARAMETERS_NORMEQ.T.dot(x)
# define cost
mse = lambda pred, lab: ((pred-np.array(lab))**2).sum()/DS_SIZE
# make predictions!
predictions_gd = np.array([h_gd(x) for x in test_data])
predictions_ne = np.array([h_ne(x) for x in test_data])
# calculate and print total error
cost_gd = mse(predictions_gd, test_labels)
cost_ne = mse(predictions_ne, test_labels)
print("total cost with gradient descent:", cost_gd)
print("total cost with normal equations:", cost_ne)
