generate_level_1_report()

kero.DataHandler.DataVisual.py.

class visual_lab():
  def generate_level_1_report(self, label_name=""):
    return

This method writes text reports that list the properties of the original data frame, of its clean version, and of its defective part.

Example Usage 1

First, we create a random data frame.

import kero.DataHandler.Debuggers as dhdeb
import pandas as pd
import kero.DataHandler.DataVisual as dv

rdf = dhdeb.check_initiate_random_table(csv_name="check_generate_level_1_report.csv",rate=0.09)
df = pd.read_csv(r"check_generate_level_1_report.csv")

Now we call the methods and print the intermediate data frames as a check.

vlab = dv.visual_lab()
vlab.prepare_for_visual(df)
#### for checking ####
print(vlab.df)
print("\nclean df:\n")
print(vlab.cleanD.clean_df)
print("\ncrippled df:\n")
print(vlab.crippledD.crippled_df)
print("\n\n")
#######################
vlab.generate_level_1_report(label_name="mydata")

Notice that vlab stores the original, clean, and defective data as attributes, which can be accessed as shown above. For example, vlab.df is the original data.
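The sample reports further down this page show the original data with 20 rows and the clean data with 11, which is consistent with dropping every row that contains at least one missing value. The following is a minimal sketch of such a split in plain pandas. It is not kero's implementation: the toy frame and the names has_missing, clean_part and defective_part are made up for illustration, and treating the defective part as "the remaining rows" is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the data used in the example above.
toy = pd.DataFrame({
    "first": [1.0, np.nan, 3.0, 2.0],
    "third": ["not", "gg", np.nan, "not"],
})

# True for every row that has at least one missing value.
has_missing = toy.isna().any(axis=1)

clean_part = toy[~has_missing]      # rows without any missing value
defective_part = toy[has_missing]   # assumed: rows with one or more missing values

print(len(toy), len(clean_part), len(defective_part))  # prints: 4 2 2
```

Here clean_part is the same as toy.dropna(); the boolean mask is kept explicit so that the complementary rows can be selected as well.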

The output looks, for example, like the following.

mydatalevel1_clean.txt

From this information we can glean some interesting features. Note that in column 2, the unique list contains only “gg” and “not”. Now compare column 2 in the next part, mydatalevel1_original.txt: before cleaning, its unique list also contains nan.

inspect_original_data...

--> column 0 (first)  : [1.0, 3.0, 1.0, 3.0, 3.0, 2.0, 1.0, 2.0, 2.0, 2.0, 3.0]
    length = 11
    unique list = [1.0, 2.0, 3.0]
    size of unique list (include various Nan) = 3
    type list = [<class 'numpy.float64'>]


--> column 1 (second)  : [16.170212765957444, 17.65957446808511, 19.78723404255319, 14.680851063829786, 18.08510638297872, 12.978723404255321, 10.212765957446807, 10.425531914893618, 11.48936170212766, 11.48936170212766, 17.23404255319149]
    length = 11
    unique list = [10.212765957446807, 10.425531914893618, 12.978723404255321, 11.48936170212766, 14.680851063829786, 16.170212765957444, 17.65957446808511, 18.08510638297872, 19.78723404255319, 17.23404255319149]
    size of unique list (include various Nan) = 10
    type list = [<class 'numpy.float64'>]


--> column 2 (third)  : ['not', 'not', 'not', 'gg', 'not', 'not', 'not', 'not', 'gg', 'not', 'not']
    length = 11
    unique list = ['not', 'gg']
    size of unique list (include various Nan) = 2
    type list = [<class 'str'>]


--> column 3 (fourth)  : ['us', 'bf', 'sg', 'sg', 'bf', 'sg', 'bf', 'bf', 'my', 'id', 'id']
    length = 11
    unique list = ['us', 'id', 'sg', 'bf', 'my']
    size of unique list (include various Nan) = 5
    type list = [<class 'str'>]

 

mydatalevel1_original.txt

inspect_original_data...

--> column 0 (first)  : [1.0, nan, 3.0, 1.0, 3.0, 3.0, 1.0, 2.0, 3.0, 1.0, 3.0, 2.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 3.0]
    length = 20
    unique list = [nan, 1.0, 2.0, 3.0]
    size of unique list (include various Nan) = 4
    type list = [<class 'numpy.float64'>]

--> column 1 (second)  : [16.170212765957444, 12.553191489361698, 17.65957446808511, 19.78723404255319, 14.680851063829786, 18.08510638297872, 12.127659574468085, 16.80851063829787, 14.468085106382981, nan, 10.425531914893618, 12.978723404255321, 10.212765957446807, 10.425531914893618, 11.48936170212766, 17.02127659574468, 11.48936170212766, nan, nan, 17.23404255319149]
    length = 20
    unique list = [nan, nan, nan, 10.425531914893618, 10.212765957446807, 12.553191489361698, 12.127659574468085, 14.680851063829786, 14.468085106382981, 16.170212765957444, 17.65957446808511, 18.08510638297872, 19.78723404255319, 16.80851063829787, 12.978723404255321, 17.02127659574468, 17.23404255319149, 11.48936170212766]
    size of unique list (include various Nan) = 18
    type list = [<class 'numpy.float64'>]


--> column 2 (third)  : ['not', 'not', 'not', 'not', 'gg', 'not', nan, 'not', nan, 'gg', 'not', 'not', 'not', 'not', 'gg', 'gg', 'not', 'not', 'not', 'not']
    length = 20
    unique list = [nan, 'not', 'gg']
    size of unique list (include various Nan) = 3
    type list = [<class 'str'>, <class 'float'>]

--> column 3 (fourth)  : ['us', 'bf', 'bf', 'sg', 'sg', 'bf', 'bf', nan, 'us', 'jp', nan, 'sg', 'bf', 'bf', 'my', nan, 'id', 'sg', 'id', 'id']
    length = 20
    unique list = [nan, 'sg', 'my', 'us', 'id', 'jp', 'bf']
    size of unique list (include various Nan) = 7
    type list = [<class 'str'>, <class 'float'>]
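In the original report, the unique list of column 1 contains nan three times, while the string columns list nan only once. One possible explanation, assuming the unique list is built with a Python set or something equivalent (kero's source is not shown here, so this is a guess), is that nan never compares equal to itself. Separate nan objects therefore survive as separate set members, whereas a single reused nan object collapses to one. The sketch below shows this behaviour in plain Python; count_unique_nan_aware is a hypothetical helper, not part of kero.

```python
import math

# Three separate nan objects: nan != nan, so the set keeps all of them.
separate = [float("nan"), float("nan"), float("nan"), 1.0]
print(len(set(separate)))  # prints: 4

# One nan object reused three times: set membership checks identity
# before equality, so the repeats collapse into a single member.
shared_nan = float("nan")
shared = [shared_nan, shared_nan, shared_nan, 1.0]
print(len(set(shared)))  # prints: 2


def is_nan(value):
    """Return True only for float NaN; strings and other types return False."""
    return isinstance(value, float) and math.isnan(value)


def count_unique_nan_aware(values):
    """Count distinct values, treating all NaNs as one value (hypothetical helper)."""
    non_nan = {v for v in values if not is_nan(v)}
    has_nan = any(is_nan(v) for v in values)
    return len(non_nan) + (1 if has_nan else 0)


print(count_unique_nan_aware(separate))  # prints: 2
```

This is why the report labels the count "include various Nan": the size of the unique list can exceed the number of genuinely distinct values when a column holds several missing entries.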

kero version: 0.1 and above