How Normalize Data Mining Min Max from Mysql in Python

2024/9/21 8:04:10

This is example of my data in mysql, I use lib flashext.mysql and python 3

RT      NK    NB    SU    SK    P    TNI IK   IB     TARGET
84876   902  1192  2098  3623  169   39  133  1063   94095
79194   902  1050  2109  3606  153   39  133   806   87992
75836   902  1060  1905  3166  161   39  133   785   83987
75571   902   112  1878  3190  158   39  133   635   82618
83797  1156   134  1900  3518  218   39  133   709   91604
91648  1291   127  2225  3596  249   39  133   659   99967

The formula MinMax is

(data-min)/(max-min)*0.8+0.1

I got the code normalize data from csv

import pandas as pd
df = pd.read_csv("dataset.csv")
norm = (df - df.min()) / (df.max() - df.min() )*0.8 + 0.1

I know how to count it like this

(first data of RT - min column RT data) / (max column RT- min column RT) * 0.8 + 0.1

So does the next column

(first data of NK - min column NK data) / (max column NK- min column NK) * 0.8 + 0.1

Please help me, How to normalize data from database, it call "dataset" and normalize it and input in another table call "normalize"

Answer

Here is a SQL query that should get you started (assuming you want to calculate it per column):

   create table normalize asselect(RT - min(RT)over()) / (max(RT)over() - min(RT)over()) * 0.8 + 0.1 as RT_normfrom test;

I tested this query in sqlite3, not MySQL. It isn't necessarily optimal, but intuitively follows the formula. Notice, the over turns the min / max aggregate functions into window functions, which means they look at whole column, but the result is repeated on each row.

Todo

You would still need to:

  • send the MySQl query via Python
  • repeat the same code for each column
  • give each column a name
  • assign the resulting table to a schema (most likely)
  • handle divide by 0 in case a column max and min are equal
https://en.xdnf.cn/q/119232.html

Related Q&A

complex json file to csv in python

I need to convert a complex json file to csv using python, I tried a lot of codes without success, I came here for help,I updated the question, the JSON file is about a million,I need to convert them t…

python pygame - how to create a drag and drop with multiple images?

So Ive been trying to create a jigsaw puzzle using pygame in python.The only problem is that Im having trouble creating the board with multiple images that i can drag along the screen (no need to conne…

Efficiently append an element to each of the lists in a large numpy array

I have a really large numpy of array of lists, and I want to append an element to each of the arrays. I want to avoid using a loop for the sake of performance. The following syntax is not working. a=np…

How to traverse a high-order range in Python? [duplicate]

This question already has answers here:Equivalent Nested Loop Structure with Itertools(2 answers)Closed 4 years ago.In python, we can use range(x) to traverse from 0 to x-1. But what if I want to trave…

How to send eth_requestAccounts to Metamask in PyScript?

I am trying to get address from installed MetaMask on the browser. We used to do this in JS as follow:const T1 = async () => {let Address = await window.ethereum.request({method: "eth_requestAc…

Extract strings that start with ${ and end with }

Im trying to extract the strings from a file that start with ${ and ends with } using Python. I am using the code below to do so, but I dont get the expected result.My input file looks like this:Click …

Weibull distribution and the data in the same figure (with numpy and scipy) [closed]

Closed. This question needs debugging details. It is not currently accepting answers.Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to repro…

python: use agg with more than one customized function

I have a data frame like this.mydf = pd.DataFrame({a:[1,1,3,3],b:[np.nan,2,3,6],c:[1,3,3,9]})a b c 0 1 NaN 1 1 1 2.0 3 2 3 3.0 3 3 3 6.0 9I would like to have a resulting dataframe like…

sending multiple images using socket python get sent as one to client

I am capturing screenshots from the server, then sending it to the client, but the images get all sent as one big file to the client that keeps expanding in size. This only happens when i send from one…

What are the different methods to retrieve elements in a pandas Series?

There are at least 4 ways to retrieve elements in a pandas Series: .iloc, .loc .ix and using directly the [] operator.Whats the difference between them ? How do they handle missing labels/out of range…