
This function splits a data frame into a clean part and a defective part, so that the clean part can be used for processing or analysis.

def data_sieve(dataframe):
  return cleanD, crippD, origD


dataframe (pandas data frame): the data frame to split.
return cleanD


clean_data object. This object has the property "clean_df".

cleanD.clean_df is the original data frame with all defective rows removed.

return crippD


crippled_data object. This object has the property "crippled_df".

crippD.crippled_df is a pandas data frame made of only the defective rows of the original data frame.

return origD


original_data object. The input data frame passed to data_sieve() is stored as the property "df" of this origD object.
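The clean/defective split described above can be illustrated with plain pandas. This is only a minimal sketch, not kero's implementation: the helper name sieve_sketch is hypothetical, and it assumes that a "defective" row is one containing at least one missing value (NaN or None), which this page does not state explicitly.

```python
import numpy as np
import pandas as pd


def sieve_sketch(dataframe):
    """Hypothetical helper: split a data frame into clean and defective rows.

    Assumption: a row is "defective" if any of its cells is missing.
    """
    defective = dataframe.isna().any(axis=1)  # boolean mask, one entry per row
    clean_df = dataframe[~defective]          # rows with no missing cells
    crippled_df = dataframe[defective]        # rows with at least one missing cell
    return clean_df, crippled_df


# Small table with two damaged rows (index 1 and 2).
df = pd.DataFrame({
    "first": [1, 2, np.nan, 3],
    "third": ["gg", None, "not", "gg"],
})

clean_df, crippled_df = sieve_sketch(df)
print(clean_df)
print(crippled_df)
```

Every row of the input ends up in exactly one of the two parts, so the row counts of the clean and defective frames add up to the row count of the original.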



Example usage 1

import kero.DataHandler.DataTransform as dt
import kero.DataHandler.RandomDataFrame as RDF
import numpy as np

rdf = RDF.RandomDataFrame()
col1 = {"column_name": "first", "items": [1, 2, 3]}
itemlist = list(np.linspace(10, 20, 48))
col2 = {"column_name": "second", "items": itemlist}
col3 = {"column_name": "third", "items": ["gg", "not"]}
col4 = {"column_name": "fourth", "items": ["my", "sg", "id", "jp", "us", "bf"]}

df, _ = rdf.initiate_random_table(20, col1, col2, col3, col4, panda=True, with_unique_ID=None)
df = rdf.crepify_table(df, 0.05)

cleanD, crippD, origD = dt.data_sieve(df)
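print("\nclean part:")
print(cleanD.clean_df)  # original rows with all defective rows removed
print("\ncrippled part:")
print(crippD.crippled_df)  # only the defective rows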

kero version: 0.1 and above