drop_data_points_below()


For a table whose entries are either zero or positive integers, we might want to convert the entries into fractions and then drop the rows whose entries are all below a certain value. See multi_cause_rank() for an example.

kero.DataHandler.DataVisual.py.

class visual_lab():
  def drop_data_points_below(self, frac=0.6):
    return
frac (double, default 0.6): if every entry in a row of the converted (fraction) table is below this value, the row is dropped.
return: None. The rows that survive are stored in the attribute df_frac_drop.
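The dropping rule can be written in a few lines of plain pandas. The sketch below is not kero's implementation: it only assumes, from the parameter description above, that a row is kept when at least one entry is greater than or equal to frac and dropped when every entry is below it. The helper name drop_rows_below is hypothetical, and the three sample rows are copied from the fraction table in Example Usage 1.

```python
import pandas as pd


def drop_rows_below(df_frac: pd.DataFrame, frac: float = 0.6) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose entries are all below `frac`.

    A row survives if at least one of its entries is >= frac
    (assumed boundary behaviour; kero's own comparison may differ).
    """
    keep = (df_frac >= frac).any(axis=1)  # True for rows with any entry >= frac
    return df_frac[keep]


# Three rows taken from the fraction table in Example Usage 1.
df = pd.DataFrame(
    {
        "first": [0.767677, 0.171717, 0.444444],
        "second": [0.404040, 0.050505, 0.242424],
        "third": [0.666667, 0.131313, 0.434343],
        "fourth": [0.191919, 0.919192, 0.212121],
    },
    index=["aa14", "aa15", "aa16"],
)
print(drop_rows_below(df, frac=0.9))  # only aa15 remains
print(drop_rows_below(df, frac=0.6))  # aa14 and aa15 remain
```

Raising frac makes the filter stricter, which is why the example below, with frac=0.9, keeps far fewer rows than the default of 0.6 would.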

Example Usage 1

The following code creates a table as described above.

import kero.DataHandler.RandomDataFrame as RDF
import kero.DataHandler.DataVisual as dv

rdf = RDF.RandomDataFrame()
itemlist = range(100)
col1 = {"column_name": "first", "items": itemlist}
col2 = {"column_name": "second", "items": itemlist}
col3 = {"column_name": "third", "items": itemlist}
col4 = {"column_name": "fourth", "items": itemlist}
N_row = 20
rdf.initiate_random_table(N_row, col1, col2, col3, col4, panda=True, row_name_list='aa')
print(rdf.clean_df)
      first  second  third  fourth
aa0      73      69     16      99
aa1      99       6     28      77
aa2       8      47      7      95
aa3      88      43     84      34
aa4      26       8     98      56
...
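Without kero installed, a comparable table can be built with NumPy and pandas. This is only an approximation of initiate_random_table: it assumes each entry is drawn uniformly, with replacement, from itemlist, and that row_name_list='aa' yields the labels aa0, aa1, and so on. The helper make_random_table and its seed are hypothetical, so the values will not match the output shown here.

```python
import numpy as np
import pandas as pd


def make_random_table(n_row, columns, items, prefix="aa", seed=0):
    """Hypothetical stand-in for RandomDataFrame.initiate_random_table.

    Draws each entry uniformly (with replacement) from `items` and
    labels the rows prefix0, prefix1, ...
    """
    rng = np.random.default_rng(seed)
    values = rng.choice(list(items), size=(n_row, len(columns)))
    index = [f"{prefix}{i}" for i in range(n_row)]
    return pd.DataFrame(values, index=index, columns=columns)


table = make_random_table(20, ["first", "second", "third", "fourth"], range(100))
print(table.head())
```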

The following code converts the table into fractions.

numD = dv.number_data()
numD.df_number = rdf.clean_df
numD.get_frac_from_number()
print(numD.df_frac)
         first    second     third    fourth
aa0   0.737374  0.696970  0.161616  1.000000
aa1   1.000000  0.060606  0.282828  0.777778
aa2   0.080808  0.474747  0.070707  0.959596
aa3   0.888889  0.434343  0.848485  0.343434
aa4   0.262626  0.080808  0.989899  0.565657
aa5   0.181818  0.636364  0.171717  0.292929
aa6   0.202020  0.888889  0.373737  0.090909
aa7   0.848485  0.202020  0.525253  0.686869
aa8   0.434343  0.515152  0.171717  0.505051
aa9   0.505051  0.010101  0.212121  0.606061
aa10  0.101010  0.272727  0.373737  0.959596
aa11  0.404040  0.909091  0.878788  0.313131
aa12  0.737374  0.222222  0.040404  0.868687
aa13  0.555556  0.191919  0.202020  0.676768
aa14  0.767677  0.404040  0.666667  0.191919
aa15  0.171717  0.050505  0.131313  0.919192
aa16  0.444444  0.242424  0.434343  0.212121
aa17  0.252525  0.777778  0.151515  0.282828
aa18  0.444444  0.181818  0.262626  0.222222
aa19  0.565657  0.393939  0.373737  0.969697
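The printed fractions are consistent with dividing every entry by the largest value in the whole table, which is 99 here (for example, 73/99 ≈ 0.737374 for aa0, first). A per-column maximum would not fit: the largest entry in second (aa11) shows 0.909091, not 1. The sketch below reproduces the first five rows under that inferred table-wide normalization; it is a reading of the output above, not kero's source code.

```python
import pandas as pd

# First five rows of rdf.clean_df from Example Usage 1.
df_number = pd.DataFrame(
    {
        "first": [73, 99, 8, 88, 26],
        "second": [69, 6, 47, 43, 8],
        "third": [16, 28, 7, 84, 98],
        "fourth": [99, 77, 95, 34, 56],
    },
    index=["aa0", "aa1", "aa2", "aa3", "aa4"],
)

# Assumed normalization: divide by the table-wide maximum (99 here).
df_frac = df_number / df_number.to_numpy().max()
print(df_frac.round(6))
```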

Finally, we drop the rows whose fractions are all below 0.9 with the following code.

numD.drop_data_points_below(frac=0.9)
print("dropping...\n", numD.df_frac_drop)

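We are left with the following table, in which every row has at least one entry above 0.9. In row aa15, for instance, only fourth exceeds 0.9 (0.919192) while the other entries are at most 0.171717, suggesting that aa15 has a relatively strong relationship with fourth compared with the other columns.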

          first    second     third    fourth
aa0   0.737374  0.696970  0.161616  1.000000
aa1   1.000000  0.060606  0.282828  0.777778
aa2   0.080808  0.474747  0.070707  0.959596
aa4   0.262626  0.080808  0.989899  0.565657
aa10  0.101010  0.272727  0.373737  0.959596
aa11  0.404040  0.909091  0.878788  0.313131
aa15  0.171717  0.050505  0.131313  0.919192
aa19  0.565657  0.393939  0.373737  0.969697
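The aa15 observation can be made systematic by asking, for each surviving row, which column holds its largest fraction and how far that value stands above the runner-up. The snippet applies pandas' DataFrame.idxmax to four rows copied from the filtered table; reading a large gap as a "strong relationship" is an interpretation, not something drop_data_points_below computes.

```python
import numpy as np
import pandas as pd

# Rows copied from the filtered table (numD.df_frac_drop) above.
df_frac_drop = pd.DataFrame(
    {
        "first": [0.737374, 1.000000, 0.404040, 0.171717],
        "second": [0.696970, 0.060606, 0.909091, 0.050505],
        "third": [0.161616, 0.282828, 0.878788, 0.131313],
        "fourth": [1.000000, 0.777778, 0.313131, 0.919192],
    },
    index=["aa0", "aa1", "aa11", "aa15"],
)

# Column holding each row's largest fraction.
strongest = df_frac_drop.idxmax(axis=1)

# Gap between the largest and second-largest fraction in each row;
# a large gap means one column clearly dominates that row.
sorted_vals = np.sort(df_frac_drop.to_numpy(), axis=1)
gap = pd.Series(sorted_vals[:, -1] - sorted_vals[:, -2], index=df_frac_drop.index)

print(pd.DataFrame({"strongest": strongest, "gap": gap.round(6)}))
```

Among these rows, aa15 has by far the largest gap (about 0.747475, with fourth on top), whereas aa11 passes the 0.9 threshold with second only slightly ahead of third.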

kero version: 0.1 and above