Plotting confidence and prediction intervals with repeated entries

2024/10/12 14:17:26

I have a correlation plot for two variables: the predictor variable (temperature) on the x-axis and the response variable (density) on the y-axis. My least-squares best-fit line is a second-order polynomial. I would also like to plot confidence and prediction intervals. The method described in this answer seems perfect. However, my dataset (n=2340) has repeated entries for many (x, y) pairs. My resulting plot looks like this: (image: plot showing multiple overlapping lines for the fit and the intervals)

Here is my relevant code (slightly modified from linked answer above):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import summary_table

# x and y hold the raw temperature and density values
d = {'temp': x, 'dens': y}
df = pd.DataFrame(data=d)

x = df.temp
y = df.dens

plt.figure(figsize=(6 * 1.618, 6))
plt.scatter(x, y, s=10, alpha=0.3)
plt.xlabel('temp')
plt.ylabel('density')

# points linearly spaced for predictor variable
x1 = pd.DataFrame({'temp': np.linspace(df.temp.min(), df.temp.max(), 100)})

# 2nd order polynomial
poly_2 = smf.ols(formula='dens ~ 1 + temp + I(temp ** 2.0)', data=df).fit()

# this correctly plots my single 2nd-order poly best-fit line:
plt.plot(x1.temp, poly_2.predict(x1), 'g-', label='Poly n=2  $R^2$=%.2f' % poly_2.rsquared, alpha=0.9)

prstd, iv_l, iv_u = wls_prediction_std(poly_2)

st, data, ss2 = summary_table(poly_2, alpha=0.05)

fittedvalues = data[:, 2]
predict_mean_se = data[:, 3]
predict_mean_ci_low, predict_mean_ci_upp = data[:, 4:6].T
predict_ci_low, predict_ci_upp = data[:, 6:8].T

# check we got the right things
print(np.max(np.abs(poly_2.fittedvalues - fittedvalues)))
print(np.max(np.abs(iv_l - predict_ci_low)))
print(np.max(np.abs(iv_u - predict_ci_upp)))

plt.plot(x, y, 'o')
plt.plot(x, fittedvalues, '-', lw=2)
plt.plot(x, predict_ci_low, 'r--', lw=2)
plt.plot(x, predict_ci_upp, 'r--', lw=2)
plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)

The print statements evaluate to 0.0, as expected. However, I need single lines for the polynomial best fit line, and the confidence and prediction intervals (rather than the multiple lines I currently have in my plot). Any ideas?

Update: Following the first answer from @kpie, I ordered my confidence and prediction interval arrays by temperature:

data_intervals = {'temp': x, 'predict_low': predict_ci_low, 'predict_upp': predict_ci_upp, 'conf_low': predict_mean_ci_low, 'conf_high': predict_mean_ci_upp}
df_intervals = pd.DataFrame(data=data_intervals)
# sort_values(by=...) replaces the older DataFrame.sort(columns=...), which current pandas no longer provides
df_intervals_sort = df_intervals.sort_values(by='temp')

This achieved the desired results: (image: updated plot)
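An alternative to sorting the per-observation arrays is to evaluate the intervals on the evenly spaced grid x1 that already gives a single best-fit line. This is a minimal sketch, assuming a statsmodels version whose fitted results provide get_prediction and summary_frame. The synthetic data is a hypothetical stand-in for the real temperature and density values:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in data: many repeated temperature values, as in the question
rng = np.random.default_rng(0)
temp = rng.choice(np.arange(-15.0, 46.0), size=500)
dens = 1000.0 + 0.05 * temp - 0.005 * temp**2 + rng.normal(0.0, 0.2, size=temp.size)
df = pd.DataFrame({"temp": temp, "dens": dens})

# Same model formula as in the question
poly_2 = smf.ols(formula="dens ~ 1 + temp + I(temp ** 2.0)", data=df).fit()

# Evaluate the fit and both intervals on a sorted grid with no repeats
x1 = pd.DataFrame({"temp": np.linspace(df.temp.min(), df.temp.max(), 100)})
bands = poly_2.get_prediction(x1).summary_frame(alpha=0.05)

# bands has one row per grid point, with the columns:
#   mean                          -> fitted curve
#   mean_ci_lower, mean_ci_upper  -> confidence interval of the mean
#   obs_ci_lower, obs_ci_upper    -> prediction interval for new observations
```

Each column can then be drawn against x1.temp with plt.plot. Because the grid is already ordered, every band comes out as a single line.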

Answer

I think you need to order your predicted values by temperature.
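A line plot joins points in the order they appear in the arrays, so unsorted x values with repeats make the line double back on itself. This is a small sketch of the reordering using NumPy only. The arrays are hypothetical stand-ins for the temperature values and any per-observation quantity, such as the fitted values or an interval bound:

```python
import numpy as np

# Hypothetical stand-in data: unsorted temperatures with many repeats
rng = np.random.default_rng(0)
temp = rng.choice(np.linspace(-15.0, 45.0, 25), size=200)
fitted = 1.0 + 0.5 * temp - 0.01 * temp**2  # stand-in for per-observation fitted values

# Sort every array with the same index order so the pairs stay aligned
order = np.argsort(temp, kind="stable")
temp_sorted = temp[order]
fitted_sorted = fitted[order]

# Optionally keep one row per distinct temperature (np.unique also sorts)
temp_unique, first_idx = np.unique(temp, return_index=True)
fitted_unique = fitted[first_idx]
```

Plotting fitted_sorted against temp_sorted, or the unique versions, then gives one line per quantity.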

So to get nice curvy lines you will have to use numpy.polynomial.polynomial.polyfit, which returns an array of coefficients. You will have to split the x and y data into two separate lists so they can be passed to the function.

You can then plot this function with:

import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial import polynomial as P

# xs, ys: your x and y data as separate lists; degree: polynomial order (2 here)
coeffs = P.polyfit(xs, ys, degree)  # coefficients, lowest order first
x = np.linspace(-15, 45, 300)  # your smooth line will be made of 300 short straight pieces
y = P.polyval(x, coeffs)  # evaluate the fitted polynomial on the grid
plt.plot(x, y)

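You can look more into plotting smooth lines here. Just remember that every plotted line is really a linear spline: a curve drawn from a finite set of points is a chain of straight segments, so it only looks smooth when the points are close together.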

I believe that the polynomial fitting is done by least squares (process described here).
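For reference, the same least-squares fit can also produce both interval bands on a smooth grid using the standard OLS formulas. This is a rough sketch rather than a replacement for a statistics library. The helper name poly_intervals and the synthetic data are hypothetical, and the multiplier z=1.96 is a normal approximation to the t quantile, which is only reasonable for a large sample such as n=2340:

```python
import numpy as np

def poly_intervals(x, y, x_grid, degree=2, z=1.96):
    """Least-squares polynomial fit with approximate 95% bands on x_grid."""
    X = np.vander(x, degree + 1, increasing=True)       # columns: 1, x, x**2, ...
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # least-squares coefficients
    resid = y - X @ beta
    dof = len(y) - (degree + 1)
    s2 = resid @ resid / dof                            # residual variance estimate
    XtX_inv = np.linalg.inv(X.T @ X)
    G = np.vander(x_grid, degree + 1, increasing=True)  # design matrix on the grid
    mean = G @ beta
    h = np.einsum("ij,jk,ik->i", G, XtX_inv, G)         # g' (X'X)^-1 g per grid point
    se_mean = np.sqrt(s2 * h)                           # std. error of the fitted mean
    se_pred = np.sqrt(s2 * (1.0 + h))                   # std. error of a new observation
    return (mean,
            mean - z * se_mean, mean + z * se_mean,
            mean - z * se_pred, mean + z * se_pred)

# Hypothetical stand-in data with repeated x values
rng = np.random.default_rng(1)
x = rng.choice(np.arange(-15.0, 46.0), size=400)
y = 2.0 + 0.3 * x - 0.01 * x**2 + rng.normal(0.0, 0.5, size=x.size)

x_grid = np.linspace(x.min(), x.max(), 100)
mean, conf_low, conf_upp, pred_low, pred_upp = poly_intervals(x, y, x_grid)
```

The prediction band is always wider than the confidence band because it adds the residual variance of a single new observation to the uncertainty of the fitted mean.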

Good Luck!

https://en.xdnf.cn/q/69645.html
