Gymnasium sampling

For the FrozenLake, CliffWalking, and Taxi models, we were able to access the internal state of the gym environments to convert them to accurate stormvogel models. This is not possible for arbitrary gym environments, so instead we can use sampling to obtain an approximation of the gym environment in stormvogel. In this notebook we give some example usages of sample_gym from stormvogel.extensions. Note that sampling itself is quite fast, but the visualization quickly becomes slow as the number of states increases.
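The general idea behind building a model from samples can be illustrated with a minimal sketch. It does not reproduce stormvogel's sample_gym (whose internals are not shown here); it only assumes the approach of running random rollouts, counting how often each (observation, action) pair leads to each next observation, and normalizing the counts into probabilities. ToyChainEnv and estimate_mdp are hypothetical names, and the toy environment only mimics the reset/step return shapes of the gymnasium API so the example runs without gymnasium installed.

```python
import random
from collections import defaultdict


class ToyChainEnv:
    """Hypothetical 4-cell corridor; mimics gymnasium's reset/step return shapes."""

    n_actions = 2  # 0 = left, 1 = right

    def __init__(self, slip=0.2, seed=0):
        self.slip = slip
        self.rng = random.Random(seed)
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos, {}  # (observation, info)

    def step(self, action):
        move = 1 if action == 1 else -1
        if self.rng.random() < self.slip:
            move = -move  # slip: move the opposite way
        self.pos = min(max(self.pos + move, 0), 3)
        terminated = self.pos == 3  # rightmost cell is the goal
        # (observation, reward, terminated, truncated, info)
        return self.pos, 0.0, terminated, False, {}


def estimate_mdp(env, no_samples, sample_length, seed=0):
    """Estimate P(next_obs | obs, action) by counting transitions in random rollouts."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(no_samples):
        obs, _ = env.reset()
        for _ in range(sample_length):
            action = rng.randrange(env.n_actions)  # uniformly random policy
            next_obs, _, terminated, truncated, _ = env.step(action)
            counts[(obs, action)][next_obs] += 1
            obs = next_obs
            if terminated or truncated:
                break
    # Turn the raw counts into empirical transition probabilities.
    mdp = {}
    for state_action, successors in counts.items():
        total = sum(successors.values())
        mdp[state_action] = {s: n / total for s, n in successors.items()}
    return mdp


mdp = estimate_mdp(ToyChainEnv(), no_samples=200, sample_length=20)
for (obs, action), dist in sorted(mdp.items()):
    print(f"state {obs}, action {action}: {dist}")
```

The terminal cell never appears as a source state because a rollout stops as soon as it is reached; any (state, action) pair that no rollout happens to visit is simply absent from the estimated model.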

FrozenLake

Since FrozenLake does not have many states and is fully observable, we are very likely to get the correct model if we use enough samples. If you lower the number of samples (no_samples), you will observe that at some point transitions and states disappear. You can also enable is_slippery, in which case you get an approximation of FrozenLake with slippery ice.
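Why transitions disappear can be estimated with a standard back-of-the-envelope calculation that is independent of stormvogel: if a particular (state, action) pair is visited k times during sampling and one of its outcomes has probability p, that outcome is never observed with probability (1 - p)^k. Note that k is the number of visits to that pair, not no_samples; states that are hard to reach are typically visited less often, so their transitions tend to vanish first. With is_slippery=True, gymnasium's FrozenLake moves in the intended direction or in either perpendicular direction with probability 1/3 each, so p = 1/3 is used below as an example. The function names are hypothetical, and the Monte Carlo check only serves to confirm the formula.

```python
import random


def miss_probability(p, k):
    """Probability that an outcome of probability p never occurs in k independent tries."""
    return (1 - p) ** k


def empirical_miss_rate(p, k, trials=100_000, seed=1):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    misses = sum(
        all(rng.random() >= p for _ in range(k))  # True if the outcome never occurred
        for _ in range(trials)
    )
    return misses / trials


p = 1 / 3  # e.g. one of the three possible directions on slippery ice
for k in (1, 3, 10, 30):
    print(k, round(miss_probability(p, k), 4), empirical_miss_rate(p, k))
```

A few visits are enough to make a missing 1/3-probability transition unlikely, but a pair visited only once or twice will often lack some of its outcomes.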

[1]:
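from stormvogel import *
import gymnasium as gym
# Deterministic FrozenLake: without slipping, every action has a single outcome.
env = gym.make("FrozenLake-v1", render_mode="rgb_array", is_slippery=False)
# Build an approximate MDP from 200 sampled runs of the environment.
model = extensions.sample_gym(env, no_samples=200)
print(model.summary())
# Position the states using a predefined layout file for FrozenLake.
show(model, layout=Layout("layouts/frozenlake.json"))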
ModelType.MDP model with name `Gymnasium sample from FrozenLake-v1 with 200 samples of max length 20`, 16 states, 5 actions, and 18 distinct labels.

Blackjack

Blackjack is not fully observable: different internal states of the gymnasium environment can produce the same observation, and such states are merged into a single state in the sampled model.
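The effect of this merging can be sketched with a handful of hand-written samples. The data below are hypothetical and simplified: the observation is reduced to (player_sum, dealer_showing) (gymnasium's Blackjack observation also includes a usable-ace flag), the hidden part is the dealer's face-down card, and the "win"/"lose"/"draw" outcomes stand in for next states. Each hidden state has a deterministic outcome, but once states are keyed by their observation only, the outcomes are pooled into one probability distribution. The estimate helper is a hypothetical name, not part of stormvogel.

```python
from collections import Counter, defaultdict

# Hypothetical sampled steps: ((observation, hidden_dealer_card), action, outcome).
# The observation (player_sum, dealer_showing) hides the dealer's face-down card.
samples = [
    (((18, 10), 10), "stick", "lose"),  # dealer holds 20
    (((18, 10), 7), "stick", "win"),    # dealer holds 17 and stands
    (((18, 10), 10), "stick", "lose"),
    (((18, 10), 8), "stick", "draw"),   # dealer holds 18
]


def estimate(samples, key):
    """Empirical outcome distribution per (key(state), action)."""
    counts = defaultdict(Counter)
    for state, action, outcome in samples:
        counts[(key(state), action)][outcome] += 1
    return {
        sa: {o: n / sum(c.values()) for o, n in c.items()}
        for sa, c in counts.items()
    }


full = estimate(samples, key=lambda s: s)       # one state per internal state
merged = estimate(samples, key=lambda s: s[0])  # one state per observation
print(full)
print(merged)
```

In the merged model, sticking on 18 against a visible 10 becomes a probabilistic choice, even though each underlying internal state had only one possible outcome.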

[2]:
import gymnasium as gym
from stormvogel import *
env = gym.make("Blackjack-v1", render_mode="rgb_array")
# Sample 50 runs; internal states with identical observations end up as one state.
model = extensions.sample_gym(env, no_samples=50)
print(model.summary())
show(model)
ModelType.MDP model with name `Gymnasium sample from Blackjack-v1 with 50 samples of max length 20`, 88 states, 3 actions, and 63 distinct labels.

Acrobot

We can even sample continuous environments and treat them as MDPs. Here, convert_obs rounds every observation component to one decimal place, so nearby continuous observations are mapped to the same state. The more precision you want, the more states are required!
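How strongly the rounding precision drives the state count can be checked without gymnasium. The sketch below generalizes convert_obs with a hypothetical decimals parameter and feeds it random 6-component vectors in [-1, 1] as stand-ins for continuous observations; it illustrates the counting argument only and does not model Acrobot's actual dynamics.

```python
import random


def convert_obs(xs, decimals=1):
    """Round every component so nearby continuous observations share one state."""
    return tuple(round(float(x), decimals) for x in xs)


rng = random.Random(0)
# Hypothetical stand-ins for continuous observations (6 components in [-1, 1]).
observations = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(1000)]

counts = {}
for decimals in (0, 1, 2):
    distinct = {convert_obs(obs, decimals) for obs in observations}
    counts[decimals] = len(distinct)
    print(decimals, counts[decimals])
```

With zero decimals each component can only become -1, 0 or 1, so at most 3^6 = 729 states exist and many observations collide; with one decimal there are 21 values per component and almost every sampled observation already becomes its own state.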

[3]:
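import gymnasium as gym
from stormvogel import *
env = gym.make("Acrobot-v1", render_mode="rgb_array")

def convert_obs(xs):
    # Discretize the continuous observation by rounding each component to one decimal.
    return tuple(round(float(x), 1) for x in xs)

# 10 runs of at most 5 steps each; convert_obs maps every observation to a discrete state.
model = extensions.sample_gym(env, no_samples=10, sample_length=5, convert_obs=convert_obs, max_size=10000)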
print(model.summary())
show(model)
ModelType.MDP model with name `Gymnasium sample from Acrobot-v1 with 10 samples of max length 5`, 57 states, 4 actions, and 57 distinct labels.