repair_table

kero.DataHandler.DataTransform.py

class original_data:
  def repair_table(self, repair_command_settings=None): 
    return
repair_command_settings Dictionary. Each key is the name of a column in the data frame to repair, and each value is the mode used to repair that column. See the usage example below; the modes are described in more detail under repair_single_column.

An original_data object stores the original data frame, which may contain defects such as missing entries. Calling this function creates the repaired version of the data frame and stores it as the repaired_df property of the original_data object.
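The idea of a settings dictionary that maps each column to a repair mode can be sketched in plain Python. This is only an illustration under stated assumptions, not kero's implementation: the function name repair_table_sketch is hypothetical, the table is a dict of lists with None standing in for a missing entry, the toy data is made up, only two of the modes are covered, and treating a None mode as "leave the column alone" is an assumption.

```python
from collections import Counter
from statistics import mean


def repair_table_sketch(table, settings):
    """Return a repaired copy of `table` (dict: column name -> list).

    Hypothetical helper, not part of kero. None marks a missing entry.
    """
    repaired = {}
    for name, values in table.items():
        mode = settings.get(name)  # assumption: None or absent means "do not repair"
        present = [v for v in values if v is not None]
        if mode is None or not present:
            repaired[name] = list(values)
            continue
        if mode == "mean":
            fill = mean(present)
        elif mode == "max_occuring":
            fill = Counter(present).most_common(1)[0][0]
        else:
            raise ValueError(f"mode not covered by this sketch: {mode}")
        # Replace only the missing entries; keep everything else as is.
        repaired[name] = [fill if v is None else v for v in values]
    return repaired


# Made-up toy table, for illustration only.
table = {
    "ID": ["person0", "person1", "person2", "person3"],
    "second": [10.0, None, 14.0, 12.0],
    "third": ["gg", "not", None, "not"],
}
settings = {"ID": None, "second": "mean", "third": "max_occuring"}

repaired = repair_table_sketch(table, settings)
print(repaired["second"])  # the missing entry becomes the mean of 10, 14 and 12
print(repaired["third"])   # the missing entry becomes the most frequent value
```

The sketch returns a new table and leaves the input untouched, mirroring how the original data frame and repaired_df are kept side by side.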

Example Usage 1

First, we create a randomly generated table with a unique ID column.

import numpy as np
import pandas as pd
import kero.DataHandler.RandomDataFrame as RDF
import kero.DataHandler.DataTransform as dt

rdf = RDF.RandomDataFrame()
# col0: in this example, column 0 holds the unique ID
col1 = {"column_name": "first", "items": [1, 2, 3]}
itemlist = list(np.linspace(10, 20, 48))
col2 = {"column_name": "second", "items": itemlist}
col3 = {"column_name": "third", "items": ["gg", "not"]}
col4 = {"column_name": "fourth", "items": ["my", "sg", "id", "jp", "us", "bf"]}
col_out = {"column_name": "result", "items": ["classA", "classB", "classC"]}
rdf.initiate_random_table(20, col1, col2, col3, col4, col_out, panda=True, with_unique_ID="person")
rdf.crepify_table(rdf.clean_df, rate=0.1, column_index_exception=[0])  # do not crepify column 0
rdf.crepified_df.to_csv("check_repair_single_column.csv", index=False)

df = pd.read_csv(r"check_repair_single_column.csv")
cleanD, _, origD = dt.data_sieve(df)  # cleanD, crippD, origD

The randomly generated example table is shown below.

           ID  first     second third fourth  result
0    person0    2.0        NaN   not     us  classB
1    person1    2.0        NaN   not     sg  classC
2    person2    1.0  18.936170    gg     my  classC
3    person3    1.0        NaN   not     id  classC
4    person4    1.0  13.617021    gg     sg  classC
5    person5    2.0  13.617021   not     bf  classB
6    person6    1.0  13.404255    gg     my  classA
7    person7    1.0  19.148936   not     bf  classB
8    person8    2.0        NaN    gg     id  classB
9    person9    3.0  13.829787    gg    NaN  classB
10  person10    1.0  12.127660   NaN     id  classC
11  person11    3.0  18.510638   not     sg  classA
12  person12    1.0  19.148936   not     us     NaN
13  person13    1.0  16.595745    gg     sg  classC
14  person14    2.0  15.957447   not     sg  classC
15  person15    3.0  12.978723   not     bf  classB
16  person16    NaN  16.170213   not     my  classA
17  person17    1.0  18.510638    gg     bf  classB
18  person18    1.0        NaN   not     jp  classC
19  person19    1.0  18.723404    gg     my  classB

Next, we repair the table: every defective entry is replaced according to the mode specified for its column in the settings. For more details, see repair_single_column, which repair_table calls for each column.

# Repair Choices :
# 1. mean
# 2. mean_floor
# 3. mean_ceil
# 4. max_occuring
# 5. min_occuring
# 6. mid_occuring
repair_setting = {
"ID": None,
"first": "mean_floor",
"second": "mean",
"third": "max_occuring",
"fourth": "max_occuring",
"result": "mid_occuring"
}
origD.initialize_dataframe_repair()
print(origD.original_df, "\n\nCOMPARE: repaired\n")
origD.repair_table(repair_command_settings=repair_setting)
print(origD.repaired_df)

The repaired table is shown below.

COMPARE: repaired

          ID  first     second third fourth  result
0    person0    2.0  16.085106   not     us  classB
1    person1    2.0  16.085106   not     sg  classC
2    person2    1.0  18.936170    gg     my  classC
3    person3    1.0  16.085106   not     id  classC
4    person4    1.0  13.617021    gg     sg  classC
5    person5    2.0  13.617021   not     bf  classB
6    person6    1.0  13.404255    gg     my  classA
7    person7    1.0  19.148936   not     bf  classB
8    person8    2.0  16.085106    gg     id  classB
9    person9    3.0  13.829787    gg     sg  classB
10  person10    1.0  12.127660   not     id  classC
11  person11    3.0  18.510638   not     sg  classA
12  person12    1.0  19.148936   not     us  classA
13  person13    1.0  16.595745    gg     sg  classC
14  person14    2.0  15.957447   not     sg  classC
15  person15    3.0  12.978723   not     bf  classB
16  person16    1.0  16.170213   not     my  classA
17  person17    1.0  18.510638    gg     bf  classB
18  person18    1.0  16.085106   not     jp  classC
19  person19    1.0  18.723404    gg     my  classB
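The fill values in the repaired table can be checked by hand. The sketch below recomputes them in plain Python from the columns of the example table above, with None in place of NaN. The helper fill_value is hypothetical and not part of kero, and the reading of each mode (mean, floor of the mean, ceiling of the mean, most frequent value) is an assumption based on the mode names and on the output shown. The modes min_occuring and mid_occuring are left out because their exact behaviour is not described here.

```python
import math
from collections import Counter
from statistics import mean


def fill_value(values, mode):
    """Value used to replace missing entries (None) under a given mode.

    Hypothetical helper; the mode semantics are assumed from their names.
    """
    present = [v for v in values if v is not None]
    if mode == "mean":
        return mean(present)
    if mode == "mean_floor":
        return math.floor(mean(present))
    if mode == "mean_ceil":
        return math.ceil(mean(present))
    if mode == "max_occuring":
        return Counter(present).most_common(1)[0][0]
    raise ValueError(f"mode not covered by this sketch: {mode}")


# Columns copied from the example table, None where the table shows NaN.
first = [2, 2, 1, 1, 1, 2, 1, 1, 2, 3, 1, 3, 1, 1, 2, 3, None, 1, 1, 1]
second = [None, None, 18.936170, None, 13.617021, 13.617021, 13.404255,
          19.148936, None, 13.829787, 12.127660, 18.510638, 19.148936,
          16.595745, 15.957447, 12.978723, 16.170213, 18.510638, None,
          18.723404]
third = ["not", "not", "gg", "not", "gg", "not", "gg", "not", "gg", "gg",
         None, "not", "not", "gg", "not", "not", "not", "gg", "not", "gg"]

print(fill_value(first, "mean_floor"))       # 1, as for person16
print(round(fill_value(second, "mean"), 6))  # 16.085106, as for person0
print(fill_value(third, "max_occuring"))     # not, as for person10
```

The three printed values agree with the entries that replaced NaN in the first, second and third columns of the repaired table.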

kero version: 0.1 and above