choose_action_continuous_probability()

Given the set of actions that a member of the network can take and the probability of each action at the given time step, the function selects one or more actions at random according to those probabilities.

kero.multib.nDnet.py

def choose_action_continuous_probability(action_set, probability_set, N=1):
  return action_out, action_index

Arguments/Return

action_set List of strings, [a1, a2, …, aN], where each ak is the identifier of an action.
probability_set List of float, [p1, p2, …, pN] where each pk is a (possibly unnormalized) probability that action ak will occur. Normalization is done simply by pk/(p1+p2+…+pN).
N Integer, the number of actions to be chosen. This is independent of the number of entries in action_set.

Default = 1

return action_out If N=1, then this is a string, one chosen out of a1, a2, …, aN.

If N>1, then this is a list of strings [act1, …, actN] each chosen out of a1, a2, … aN.

return action_index If N=1, an integer; if N>1, a list of integers.

Same as action_out, except that the integer index of each chosen action in action_set is returned rather than its string identifier.

In example 2 below, the actions and corresponding indices are “x”:0, “y”:1, “z”:2, “GG”:3.
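The normalization and selection described above can be illustrated with a minimal sketch that uses only Python's standard library. This is not kero's implementation. The names normalize and choose_action_sketch and the seed parameter are hypothetical and only mirror the documented arguments and return shapes.

```python
import random

def normalize(probability_set):
    """Return pk / (p1 + p2 + ... + pN) for every pk."""
    total = sum(probability_set)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    return [p / total for p in probability_set]

def choose_action_sketch(action_set, probability_set, N=1, seed=None):
    """Hypothetical stand-in for the documented behavior, not kero's code."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    weights = normalize(probability_set)
    # Draw N indices with replacement, each index k with probability weights[k].
    indices = rng.choices(range(len(action_set)), weights=weights, k=N)
    actions = [action_set[i] for i in indices]
    if N == 1:
        return actions[0], indices[0]  # a single string and a single integer
    return actions, indices            # lists of length N

# Unnormalized weights summing to 2 give [0.125, 0.125, 0.25, 0.5].
print(normalize([0.25, 0.25, 0.5, 1]))
print(choose_action_sketch(["x", "y", "z", "GG"], [0.25, 0.25, 0.5, 1]))
print(choose_action_sketch(["x", "y", "z", "GG"], [0.25, 0.25, 0.5, 1], N=3))
```

With N=1 the sketch returns a bare string and integer; with N>1 it returns two lists of equal length, matching the return description above.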

Example Usage 1

See here.

Example Usage 2

In this example, the function is called once with N=100000, so 100000 actions are drawn. For each action, the fraction of draws in which it appears approaches its normalized probability.

import numpy as np
import kero.multib.nDnet as nd

# Four actions with unnormalized weights; the weights sum to 2, so the
# normalized probabilities are [0.125, 0.125, 0.25, 0.5].
action_set = ["x", "y", "z", "GG"]
probability_set = [0.25, 0.25, 0.5, 1]
number_of_appearances = np.zeros(len(action_set))

# Draw 100000 actions in a single call.
ac_set, ac_ind = nd.choose_action_continuous_probability(action_set, probability_set, N=100000)

# Print the first 10 draws and count how often each action index appears.
for count, (ac, i) in enumerate(zip(ac_set, ac_ind)):
    if count < 10:
        print(ac, ":", i)
    number_of_appearances[i] += 1

# Observed fraction of draws per action.
s = np.sum(number_of_appearances)
fraction_of_appearances = [x/s for x in number_of_appearances]
print("...")
print("fraction list: ", fraction_of_appearances)

# Normalized probabilities, for comparison with the observed fractions.
p_norm = np.sum(probability_set)
probability_norm = [x/p_norm for x in probability_set]
print("probability norm list: ", probability_norm)

An example output is shown below. The first 10 chosen actions are printed, followed by the observed fraction of draws for each action and the normalized probabilities. As the sample size grows, each observed fraction approaches the normalized probability of its action.

x : 0
GG : 3
GG : 3
y : 1
z : 2
z : 2
GG : 3
GG : 3
GG : 3
z : 2
...
fraction list:  [0.1272, 0.12569, 0.25048, 0.49663]
probability norm list:  [0.125, 0.125, 0.25, 0.5]
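How close the fractions should be to the normalized probabilities can be estimated with the binomial standard error, sqrt(p(1-p)/n). This assumes the draws are independent, which the documentation does not state explicitly. The sketch below applies the formula to the numbers printed above; standard_error and deviations_in_se are hypothetical helper names, not part of kero.

```python
import math

def standard_error(p, n):
    """Binomial standard error of an observed fraction over n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)

def deviations_in_se(expected, observed, n):
    """Deviation of each observed fraction from its expected value, in standard errors."""
    return [(f - p) / standard_error(p, n) for p, f in zip(expected, observed)]

n = 100000
expected = [0.125, 0.125, 0.25, 0.5]            # "probability norm list" from the output
observed = [0.1272, 0.12569, 0.25048, 0.49663]  # "fraction list" from the output

for p, f, z in zip(expected, observed, deviations_in_se(expected, observed, n)):
    print(f"p={p:.3f}  observed={f:.5f}  SE={standard_error(p, n):.5f}  deviation={z:+.2f} SE")
```

For this run, every observed fraction lies within three standard errors of its normalized probability, which is consistent with independent draws.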

kero version: 0.5.1 and above