Label outliers in a boxplot - Python

2024/10/6 1:42:31

I am analysing extreme weather events. My DataFrame is called df and looks like this:

|    Date    |      Qm      |
|------------|--------------|
| 1993-01-01 |  4881.977061 |
| 1993-02-01 |  4024.396839 |
| 1993-03-01 |  3833.664650 |
| 1993-04-01 |  4981.192526 |
| 1993-05-01 |  6286.879798 |
| 1993-06-01 |  6939.726070 |
| 1993-07-01 |  6492.936065 |
|    ...     |      ...     |

I want to know whether the extreme events happened in the same years as the measured outliers, so I made a boxplot of Qm per month using seaborn:

# Qm boxplot analysis
boxplot = sns.boxplot(x=df.index.month, y=df['Qm'])
plt.show()

[Image: boxplot obtained]

Now I would like to show, within the same figure, the years corresponding to the outliers, i.e. label each outlier with its date.

I have checked several libraries that provide boxplots, but found no hint on how to label the outliers.

P.S.: I used seaborn in this example, but a solution with any library would be highly appreciated.

Thanks!

Answer

You could iterate through the DataFrame and compare each value against the outlier limits. By default, these limits lie 1.5 times the interquartile range (IQR) beyond the lower and upper quartiles (the default of the `whis` parameter in both seaborn and matplotlib boxplots). For each value outside that range, you can plot the year next to it. Feel free to adapt this definition if you would like to display more or fewer years.
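As a vectorized alternative to the row-by-row comparison, the per-month limits can be broadcast back onto every row with `groupby(...).transform`. This is a minimal sketch, assuming the values are a pandas Series with a DatetimeIndex; `flag_outliers` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def flag_outliers(s: pd.Series, keys, whis: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - whis*IQR, Q3 + whis*IQR] of its group."""
    grouped = s.groupby(keys)
    # Broadcast each group's quartiles back onto the rows of that group
    q1 = grouped.transform(lambda x: x.quantile(0.25))
    q3 = grouped.transform(lambda x: x.quantile(0.75))
    iqr = q3 - q1
    return (s < q1 - whis * iqr) | (s > q3 + whis * iqr)

# Tiny example: five January values, the last one far above the others
idx = pd.to_datetime(["1993-01-01", "1994-01-01", "1995-01-01", "1996-01-01", "1997-01-01"])
qm = pd.Series([10.0, 20.0, 30.0, 40.0, 500.0], index=idx)
mask = flag_outliers(qm, qm.index.month)
print(qm[mask].index.year.tolist())  # → [1997]
```

With the question's `df`, `df['Qm'][flag_outliers(df['Qm'], df.index.month)]` would list the outlying rows, whose index years can then be compared with the years of the extreme events.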

Here is some code to illustrate the idea. It shows the last two digits of the year next to each outlier.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Synthetic data: Y years of monthly values with a seasonal cycle plus noise
# ('MS' = month start, matching the question's dates)
Y = 26
df = pd.DataFrame({'Date': pd.date_range('1993-01-01', periods=12 * Y, freq='MS'),
                   'Qm': np.random.normal(np.tile(5000 + 1000 * np.sin(np.linspace(0, 2 * np.pi, 12)), Y), 1000)})
df.set_index('Date', inplace=True)
boxplot = sns.boxplot(x=df.index.month, y=df['Qm'])
# Per-month quartiles as arrays (position 0 = January, ..., 11 = December)
month_q1 = df.groupby(df.index.month).quantile(0.25)['Qm'].to_numpy()
month_q3 = df.groupby(df.index.month).quantile(0.75)['Qm'].to_numpy()
# Outlier limits: 1.5 * IQR beyond the quartiles
outlier_top_lim = month_q3 + 1.5 * (month_q3 - month_q1)
outlier_bottom_lim = month_q1 - 1.5 * (month_q3 - month_q1)
# Label every value outside the limits with the last two digits of its year
for row in df.itertuples():
    month = row.Index.month - 1  # seaborn draws the month boxes at x = 0..11 (all 12 months present)
    val = row.Qm
    if val > outlier_top_lim[month] or val < outlier_bottom_lim[month]:
        plt.text(month, val, f' {row.Index.year % 100:02d}', ha='left', va='center')
plt.xlabel('Month')
plt.tight_layout()
plt.show()

[Image: sample plot]
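Since any library is welcome, here is a matplotlib-only sketch that labels exactly the points matplotlib draws as outliers, by reading them back from the `'fliers'` entry of the dictionary returned by `Axes.boxplot`, so the labels cannot drift from the plotted points. It assumes synthetic data generated the same way as in the code above (regenerated here with a fixed seed so the block runs on its own) and uses the non-interactive Agg backend; drop the `matplotlib.use("Agg")` line and add `plt.show()` to display it interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly 'Qm' data with a DatetimeIndex, shaped like the question's df (assumption)
rng = np.random.default_rng(0)
years = 26
index = pd.date_range("1993-01-01", periods=12 * years, freq="MS")
seasonal = np.tile(5000 + 1000 * np.sin(np.linspace(0, 2 * np.pi, 12)), years)
df = pd.DataFrame({"Qm": rng.normal(seasonal, 1000)}, index=index)

# One Series per month, in calendar order
months = sorted(df.index.month.unique())
groups = [df.loc[df.index.month == m, "Qm"] for m in months]

fig, ax = plt.subplots()
bp = ax.boxplot([g.to_numpy() for g in groups], whis=1.5)  # boxes at x = 1..12

# bp["fliers"] holds one Line2D per box; its y-data are the outlier values matplotlib drew
for pos, (g, flier_line) in enumerate(zip(groups, bp["fliers"]), start=1):
    flier_values = flier_line.get_ydata()
    # Rows of this month whose value was drawn as a flier: label with the two-digit year
    for date, val in g[g.isin(flier_values)].items():
        ax.text(pos, val, f" {date.year % 100:02d}", ha="left", va="center")

ax.set_xticks(range(1, len(months) + 1))
ax.set_xticklabels(months)
ax.set_xlabel("Month")
ax.set_ylabel("Qm")
fig.tight_layout()
```

Matching on the drawn flier values (rather than recomputing the limits) keeps the labels consistent even if you change `whis`.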

https://en.xdnf.cn/q/70422.html
