generate_level_2_report()


kero.DataHandler.DataVisual.py.

class visual_lab():
  def generate_level_2_report(self, label_name="", settings=None):
    return
label_name: String. Name used for the report files and the plotted images.
settings: Dictionary, {key_dep_1: dict1, …}. Each dict1 is a dictionary {key_indep_1: mode1, …}, where mode1 is the mode of analysis (None or a string).

Each key_dep_1 is a dependent variable to be analyzed against all of the independent variables key_indep_1 listed in its dict1.

mode1:

(1) None: a table is created. The rows [r1,r2,…] correspond to the unique elements in key_indep_1 and the columns [c1,c2,…] to the unique elements in key_dep_1. Each cell (r1,c1) is the number of data points with that pair of values.

(2) ‘plot’: a scatter plot of key_dep_1 vs key_indep_1 is saved as an image.

If settings=None, all columns are analyzed against the last column with mode=None.
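As a rough illustration of the settings=None case, the sketch below builds the settings dictionary that this default appears to correspond to, based only on the description above. The helper default_settings is hypothetical and is not part of kero; the column names are taken from Example Usage 1.

```python
def default_settings(columns):
    """Hypothetical helper (not part of kero): treat the last column as the
    dependent variable and every other column as an independent variable
    analyzed with mode=None."""
    *independent, dependent = columns
    return {dependent: {name: None for name in independent}}


columns = ["first", "second", "third", "fourth", "result"]
settings = default_settings(columns)
print(settings)
# {'result': {'first': None, 'second': None, 'third': None, 'fourth': None}}
```

Under this reading, passing the dictionary explicitly should request the same analysis as leaving settings=None, although that equivalence is an assumption rather than something the kero source confirms here.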

The function generates text documents and plots to observe the trend between variables or features.

Example Usage 1

import pandas as pd
import kero.DataHandler.DataVisual as dv
import kero.DataHandler.Debuggers as dhdeb

rdf = dhdeb.check_initiate_random_table(csv_name="check_generate_level_2_report.csv", rate=0.09,
                                  output_label="classification", with_unique_ID=True)

The code above creates a random data table and saves it to a csv file. A preview of the table is shown below.

    first     second third fourth  result
0     3.0  12.978723    gg     us  classA
1     3.0  10.212766    gg     id  classC
2     1.0  13.617021    gg     jp  classC
3     1.0  10.000000   not     us  classA
4     3.0  11.276596    gg     jp  classC
5     2.0  20.000000    gg     my  classB

Using this function, the report rLVL2_.txt and the plot report_rLVL2_report_level2_$first$second.jpg are created.

df = pd.read_csv(r"check_generate_level_2_report.csv")
vlab = dv.visual_lab()
vlab.prepare_for_visual(df)
#### for checking ####
# print(vlab.df)
print("\nclean df:\n")
print(vlab.cleanD.clean_df)
print("\n\n")
#######################
# In this example, analyze against two possible dependent variables 'result' and 'first'
# against respective independent variables.
settings = {
    'result': {'first': None, 'second': None, 'third': None, 'fourth': None},
    'first': {'second': 'plot'},
}
vlab.generate_level_2_report(label_name="rLVL2_", settings=settings)

First, load the data frame as df, then pass it to a visual_lab() object with prepare_for_visual(), and finally call this function to generate the reports. In the above example, the dependent variable ‘result’ is analyzed against each of the columns ‘first’, ‘second’, ‘third’ and ‘fourth’ separately, each producing a table. The tables comparing ‘result’ and ‘third’, as printed in the txt file, are shown below. The first table shows the number of data points with the respective values. For example, there are 5 data points whose ‘third’ property is ‘gg’ and whose result is ‘classC’. The second table is the same table with each count written as a fraction of its column total. For example, 3 of the 4 classA data points have ‘gg’, giving 0.75.

     classA  classB  classC
gg        3       3       5
not       1       2       2
     classA  classB    classC
gg     0.75     0.6  0.714286
not    0.25     0.4  0.285714
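The two tables above can be reproduced in spirit with a small counting sketch. This is not kero's implementation; it only illustrates the idea of a count table and its column-wise fractions, using the six preview rows shown earlier (so the numbers differ from the report, which covers the full table). All variable names are illustrative.

```python
from collections import Counter

# (third, result) pairs from the six preview rows of the table.
pairs = [
    ("gg", "classA"), ("gg", "classC"), ("gg", "classC"),
    ("not", "classA"), ("gg", "classC"), ("gg", "classB"),
]

counts = Counter(pairs)                         # (row, column) -> number of data points
col_totals = Counter(col for _, col in pairs)   # column -> total number of data points

rows = sorted({r for r, _ in pairs})   # unique values of 'third'
cols = sorted({c for _, c in pairs})   # unique values of 'result'

# Count table, then the same table as a fraction of each column total.
count_table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
fraction_table = {r: {c: counts[(r, c)] / col_totals[c] for c in cols} for r in rows}

print(count_table)
print(fraction_table)
```

With a pandas data frame, pd.crosstab(df["third"], df["result"]) and the same call with normalize="columns" produce tables of this shape.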

In the settings above, the column ‘first’ is also analyzed against the column ‘second’ using mode=‘plot’. The resulting image is the following.

rLVL2_report_level2_$first$second.jpg

kero version: 0.1 and above