crepify_table()


This function takes a pandas DataFrame and makes some data points invalid by blanking out (removing) some of its entries at random. The resulting table can then be saved to a CSV file, as in the example below.

kero.DataHandler.RandomDataFrame.py

class RandomDataFrame:
  def crepify_table(self, dataframe, rate=0.01, column_index_exception=None):
    return df
dataframe: a pandas DataFrame
rate: a fraction between 0 and 1, or None
– default value = 0.01
– if set to None, nothing happens.
– otherwise, every column whose index is not listed in column_index_exception is punctured with blanks at random, at the given rate.
column_index_exception: a list of indices of the columns not to be punctured, e.g. [0, 1]
return df: df is a pandas DataFrame
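A minimal sketch of the puncturing rule, written in plain pandas and NumPy, may help make the parameters concrete. It is not kero's implementation: the name puncture_columns, the seed argument, the use of NaN as the blank, and the assumption that each cell is blanked independently with probability rate are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd


def puncture_columns(df, rate=0.01, column_index_exception=None, seed=None):
    """Illustrative sketch only, not kero's crepify_table.

    Each cell outside the protected columns is replaced by NaN (a blank)
    independently with probability `rate` (an assumption of this sketch).
    """
    if rate is None:
        # Documented behaviour: rate=None leaves the table unchanged.
        return df.copy()
    if not 0 <= rate <= 1:
        raise ValueError("rate must be a fraction between 0 and 1")
    protected = set(column_index_exception or [])
    rng = np.random.default_rng(seed)
    out = df.copy()  # never modify the caller's DataFrame
    for position, name in enumerate(out.columns):
        if position in protected:
            continue  # column listed in column_index_exception: never punctured
        # True where this column's cell should become blank.
        blank = rng.random(len(out)) < rate
        out[name] = out[name].mask(blank)
    return out
```

For example, puncture_columns(df, rate=0.3, column_index_exception=[0], seed=0) leaves the first column intact and blanks roughly 30% of the cells in every other column. Integer columns may be upcast to float, because NaN cannot be stored in a NumPy integer column.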

Example usage 1

import numpy as np
import kero.DataHandler.RandomDataFrame as RDF

rdf = RDF.RandomDataFrame()
output_label = "classification"
csv_name = "check_table_defect_index.csv"
rate = 0.01
with_unique_ID = True

# Each column spec is a dictionary: the column name and the values it may take.
col1 = {"column_name": "first", "items": [1, 2, 3]}
itemlist = list(np.linspace(10, 20, 48))
col2 = {"column_name": "second", "items": itemlist}
col3 = {"column_name": "third", "items": ["gg", "not"]}
col4 = {"column_name": "fourth", "items": ["my", "sg", "id", "jp", "us", "bf"]}

if output_label is not None:
    if output_label == "classification":
        # Classification problem: add an output column "result" holding class labels.
        col_out = {"column_name": "result", "items": ["classA", "classB", "classC"]}
        rdf.initiate_random_table(20, col1, col2, col3, col4, col_out, panda=True)
else:
    # No output label: the table holds only the input columns.
    rdf.initiate_random_table(20, col1, col2, col3, col4, panda=True, with_unique_ID=with_unique_ID)

Up to here we have only created a data frame, which the next step reads from rdf.clean_df. Do not worry about output_label: in this example, it tells the code that this is a classification rather than a regression problem. Notice that col1 to col4 are dictionaries, each specifying a column name and the possible values that the column can take. Now we puncture all columns of the data frame; since no column_index_exception is given, no column is exempt.
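Since each column spec is only a dictionary with a "column_name" and a list of "items", a small stand-in for the table generator can illustrate what the specs mean. This is a hypothetical sketch, not the code of initiate_random_table: the name random_table, the seed argument, and the assumption that each cell is drawn uniformly at random from the column's "items" are illustrative choices.

```python
import numpy as np
import pandas as pd


def random_table(n_rows, *col_specs, seed=None):
    """Hypothetical stand-in for illustration; not kero's initiate_random_table.

    Each spec is a dict {"column_name": str, "items": list}; every cell of
    that column is drawn uniformly at random from its "items" (an assumption).
    """
    rng = np.random.default_rng(seed)
    data = {}
    for spec in col_specs:
        items = spec["items"]
        # Draw positions rather than values so mixed-type item lists are preserved.
        picks = rng.integers(0, len(items), size=n_rows)
        data[spec["column_name"]] = [items[k] for k in picks]
    return pd.DataFrame(data)
```

For instance, random_table(20, col1, col3) returns 20 rows whose "first" values come from [1, 2, 3] and whose "third" values come from ["gg", "not"]. The sketch does not add a unique ID column.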

# Puncture every column (no column_index_exception given) at the chosen rate.
rdf.crepify_table(rdf.clean_df, rate=rate)
try:
    rdf.crepified_df.to_csv(csv_name, index=False)
except Exception as e:
    print("check_initiate_random_table. Error:", e)
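To inspect the result, the saved file can be read back and the missing values counted per column. The sketch below is self-contained and rests on assumptions: a small hand-made DataFrame named crepified stands in for rdf.crepified_df, an in-memory buffer replaces check_table_defect_index.csv, and the blanks are assumed to be NaN, which to_csv writes as empty fields by default.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for rdf.crepified_df: two cells already blanked.
crepified = pd.DataFrame({
    "first": [1, np.nan, 3],
    "third": ["gg", "not", np.nan],
})

buffer = io.StringIO()
crepified.to_csv(buffer, index=False)  # NaN is written as an empty field
csv_text = buffer.getvalue()

# Reading back turns the empty fields into NaN again.
restored = pd.read_csv(io.StringIO(csv_text))
missing_per_column = restored.isna().sum()
print(missing_per_column)
```

With the real example, one would call pd.read_csv(csv_name) instead of reading from the buffer.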

Here is a snapshot of the CSV file created by this process.

[Figure: snapshot of check_table_defect_index.csv]

kero version: 0.1 and above