stormvogel.extensions.gym_sampling

Functions

sample_gym(→ Tuple)

Sample the gym environment. In reality, gym environments are POMDPs, and gymnasium only allows us to access the observation.

sample_to_stormvogel(→ stormvogel.model.Model)

Create a Stormvogel MDP from a sampling (see sample_gym to obtain a sample from gym).

sample_gym_to_stormvogel(env[, no_samples, ...])

Sample the gym environment and convert it to a Stormvogel MDP.

Module Contents

stormvogel.extensions.gym_sampling.sample_gym(env: gymnasium.Env, no_samples: int = 10, sample_length: int = 20, gymnasium_scheduler: Callable[[Any], int] | None = None, convert_obs: Callable[[Any], Any] = lambda x: ...) → Tuple

Sample the gym environment. In reality, gym environments are POMDPs, and gymnasium only allows us to access the observation. States that are different in gym, but have the same observation and termination will be considered the same state in the result.

Args:

env (gym.Env): Gymnasium env.

no_samples (int): Total number of samples (starting at an initial state). To resolve multiple initial states, a new, single initial state is added if necessary.

sample_length (int): The maximum length of a single sample.

gymnasium_scheduler (Callable[[Any], int] | None): A function from states to action numbers.

convert_obs (Callable[[Any], Any]): Converts the observations to a hashable type. You can also apply rounding here.

Returns:

A tuple consisting of defaultdicts and one integer:

* initial_states (defaultdict[state, int]): The initial state in gym may be non-deterministic. This maps the initial states to the number of times they were observed as the initial state.
* states (defaultdict[state, int]): Maps states to the number of times they were observed.
* transition_counts (defaultdict[(state,action), defaultdict[state, int]]): Counts how many times the transition from this state-action pair to this state was observed.
* transition_samples (defaultdict[(state,action), int]): Counts how many times this state-action pair was observed.
* reward_sums (defaultdict[(state,action), int]): The sum of the rewards for this state-action pair.
* no_actions (int): The number of different actions observed.
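The bookkeeping described above can be sketched in plain Python. This is only an illustration of how such counters might be filled, not the library's implementation: `record_sample`, the hand-written trajectories, the float-valued `reward_sums`, and the way `states` and `no_actions` are counted are all assumptions made for this example.

```python
from collections import defaultdict

# Counters mirroring the documented return values (names taken from the docs).
initial_states = defaultdict(int)
states = defaultdict(int)
transition_counts = defaultdict(lambda: defaultdict(int))
transition_samples = defaultdict(int)
reward_sums = defaultdict(float)  # assumption: floats, so fractional rewards work


def record_sample(initial_state, steps):
    """Hypothetical helper: update the counters with one sample.

    `steps` is a list of (state, action, reward, next_state) tuples.
    """
    initial_states[initial_state] += 1
    states[initial_state] += 1
    for state, action, reward, next_state in steps:
        transition_counts[(state, action)][next_state] += 1
        transition_samples[(state, action)] += 1
        reward_sums[(state, action)] += reward
        states[next_state] += 1


# Two made-up samples over states "a", "b", "c" and actions 0 and 1.
record_sample("a", [("a", 0, 1.0, "b"), ("b", 1, 0.0, "c")])
record_sample("a", [("a", 0, 2.0, "c")])

# Assumption: no_actions is the number of distinct actions seen.
no_actions = len({action for (_, action) in transition_samples})
```

After these two samples, `transition_samples[("a", 0)]` is 2 and `transition_counts[("a", 0)]` records one observed transition to `"b"` and one to `"c"`.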

stormvogel.extensions.gym_sampling.sample_to_stormvogel(initial_states: collections.defaultdict[Any, int], transition_counts: collections.defaultdict[Tuple[Any, Any], collections.defaultdict[Any, int]], transition_samples: collections.defaultdict[Tuple[Any, Any], int], reward_sums: collections.defaultdict[Tuple[Any, Any], int], no_actions: int, no_samples: int, max_size: int = 10000) → stormvogel.model.Model

Create a Stormvogel MDP from a sampling (see sample_gym to obtain a sample from gym). Probabilities are frequentist estimates. Their accuracy depends on how often each “state” is visited.

Args:
initial_states (defaultdict[state, int]): The initial state in gym may be non-deterministic. This maps the initial states to the number of times they were observed as the initial state.

transition_counts (defaultdict[(state,action), defaultdict[state, int]]): Counts how many times the transition from this state-action pair to this state was observed.

transition_samples (defaultdict[(state,action), int]): Counts how many times this state-action pair was observed.

reward_sums (defaultdict[(state,action), int]): The sum of the rewards for this state-action pair.

no_actions (int): The number of different actions observed.

no_samples (int): The number of samples that were used to obtain this sampling.

max_size (int): The maximum number of states in the resulting model. Defaults to 10000.
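To make "frequentist estimates" concrete, the sketch below divides each observed transition count by the number of times its state-action pair was sampled. The helper `estimate` and the example counts are hypothetical, and averaging the rewards in the same way is an assumption rather than something these docs state.

```python
from collections import defaultdict


def estimate(transition_counts, transition_samples, reward_sums):
    """Hypothetical helper: turn raw counts into relative frequencies."""
    probabilities = {}
    mean_rewards = {}
    for state_action, successors in transition_counts.items():
        total = transition_samples[state_action]
        # Relative frequency of each successor state.
        probabilities[state_action] = {
            successor: count / total for successor, count in successors.items()
        }
        # Assumption: the reward is averaged over the same samples.
        mean_rewards[state_action] = reward_sums[state_action] / total
    return probabilities, mean_rewards


# Made-up counts: action 0 was taken in state "s0" four times.
transition_counts = defaultdict(lambda: defaultdict(int))
transition_counts[("s0", 0)]["s1"] = 3
transition_counts[("s0", 0)]["s2"] = 1
transition_samples = defaultdict(int, {("s0", 0): 4})
reward_sums = defaultdict(float, {("s0", 0): 2.0})

probabilities, mean_rewards = estimate(
    transition_counts, transition_samples, reward_sums
)
```

With only four samples for `("s0", 0)`, the estimated probabilities 0.75 and 0.25 are coarse, which is why accuracy depends on how often each state is visited.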

stormvogel.extensions.gym_sampling.sample_gym_to_stormvogel(env: gymnasium.Env, no_samples: int = 10, sample_length: int = 20, gymnasium_scheduler: Callable[[Any], int] | None = None, convert_obs: Callable[[Any], Any] = lambda x: ..., max_size: int = 10000)

Sample the gym environment and convert it to a Stormvogel MDP. In reality, gym environments are POMDPs, and gymnasium only allows us to access the observation. The result is an MDP where states with the same observations (and termination) are lumped together. Probabilities are frequentist estimates. Their accuracy depends on how often each “state” is visited.

Args:

env (gym.Env): Gymnasium env.

no_samples (int): Total number of samples (starting at an initial state). To resolve multiple initial states, a new, single initial state is added if necessary.

sample_length (int): The maximum length of a single sample.

gymnasium_scheduler (Callable[[Any], int] | None): A function from states to action numbers.

convert_obs (Callable[[Any], Any]): Converts the observations to a hashable type. You can also apply rounding here.

max_size (int): The maximum number of states in the resulting model. Defaults to 10000.
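As an illustration of the two callable arguments, here is a minimal sketch of a `convert_obs` function that rounds a vector-valued observation into a hashable tuple, and a trivial scheduler. The names `round_obs` and `always_first_action` are hypothetical, and it is an assumption that the scheduler receives the converted observation as its state.

```python
def round_obs(obs, digits=1):
    """Hypothetical convert_obs: make a sequence of floats hashable.

    Rounding merges nearby observations into one state, which keeps
    the sampled model small for continuous observation spaces.
    """
    return tuple(round(float(x), digits) for x in obs)


def always_first_action(state):
    """Hypothetical scheduler: pick action number 0 in every state."""
    return 0


# A made-up two-dimensional observation.
key = round_obs([0.123, -1.987])
```

Functions like these could be passed as `convert_obs=round_obs` and `gymnasium_scheduler=always_first_action`. A coarser `digits` value yields fewer states, each visited more often, at the cost of lumping together observations that may behave differently.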