Interactive version: Binder badge.

Working with Simulators

The simulators in stormpy are meant to mimic access to unknown models, but they can also be used to explore the model.

All simulators implement the abstract class stormpy.simulator.Simulator.

The simulators differ in the model representation they use in the background and in the representation of the states and actions exposed to the user. We will go through some options by example!

[1]:
import stormpy
import stormpy.examples
import stormpy.examples.files
import stormpy.simulator

Model-based simulation

We first start with an explicit model-based simulation. This means that we have a model of the DTMC in memory. This is fast and convenient if the model is available, but limits the size of models that we support.

DTMCs

We first discuss the interface for DTMCs, without any nondeterminism.

Explicit state-representations

After importing some parts of stormpy as above, we start with creating a model, in this case a DTMC:

[2]:
path = stormpy.examples.files.prism_dtmc_die
prism_program = stormpy.parse_prism_program(path)
model = stormpy.build_model(prism_program)

simulator = stormpy.simulator.create_simulator(model, seed=42)

Let us simulate a path.

[3]:
simulator.restart()
[3]:
(0, [0.0], {'init'})
[4]:
simulator.step()
[4]:
(2, [1.0], set())
[5]:
simulator.step()
[5]:
(5, [1.0], set())
[6]:
simulator.step()
[6]:
(11, [1.0], {'done', 'five'})

We start the simulator by restarting. We then do 3 steps. Every step returns a triple (state, reward, labels). In particular, the simulation above reflects a path s0, s2, s5, s11. Taking the transitions inbetween yields the reward as shown above. While states s2 and s5 are not labelled, state s0 is labelled with init and state s11 is labelled with done and five. Indeed we can check this information on the model that we used for the simulator:

[7]:
model.labeling.get_labels_of_state(11)
[7]:
{'done', 'five'}

We can continue sampling.

[8]:
simulator.step()
[8]:
(11, [0.0], {'done', 'five'})
[9]:
simulator.step()
[9]:
(11, [0.0], {'done', 'five'})

Indeed, we are not leaving the state. In this case, we can never leave the state as state s11 is absorbing. The simulator detects and exposes this information via simulator.is_done()

[10]:
simulator.is_done()
[10]:
True

We can sample more paths, yielding (potentially) different final states:

[11]:
simulator.restart()
final_outcomes = dict()
for n in range(100):
    while not simulator.is_done():
        observation, reward, labels = simulator.step()
    if observation not in final_outcomes:
        final_outcomes[observation] = 1
    else:
        final_outcomes[observation] += 1
    simulator.restart()
final_outcomes
[11]:
{7: 21, 9: 16, 10: 18, 11: 17, 12: 16, 8: 12}

Program-level representations

We can run the same simulator but represent states symbolically, referring to the high-level description of the state rather than on its internal index. The advantage of this is that the process becomes independent of the underlying representation of the model. We first need to build the model with the required annotations.

[12]:
options = stormpy.BuilderOptions()
options.set_build_state_valuations()
model = stormpy.build_sparse_model_with_options(prism_program, options)

Then, we create simulator that uses program-level state observations.

[13]:
simulator = stormpy.simulator.create_simulator(model, seed=42)
simulator.set_observation_mode(stormpy.simulator.SimulatorObservationMode.PROGRAM_LEVEL)
[14]:
state, reward, label = simulator.restart()
str(state)
[14]:
'{\n    "d": 0,\n    "s": 0\n}'

Indeed, the state is now an object that describes the state in terms of the variables of prism program, in this case variables “s” and “d”.

We can use the simulator as before, e.g.,

[15]:
simulator.restart()
final_outcomes = dict()
print(simulator.get_reward_names())
for n in range(100):
    while not simulator.is_done():
        observation, reward, labels = simulator.step()
    if observation not in final_outcomes:
        final_outcomes[observation] = 1
    else:
        final_outcomes[observation] += 1
    simulator.restart()
print(", ".join([f"{str(k)}: {v}" for k,v in final_outcomes.items()]))
['coin_flips']
{
    "d": 5,
    "s": 7
}: 18, {
    "d": 4,
    "s": 7
}: 19, {
    "d": 2,
    "s": 7
}: 13, {
    "d": 6,
    "s": 7
}: 16, {
    "d": 3,
    "s": 7
}: 14, {
    "d": 1,
    "s": 7
}: 20

MDPs

Explicit representations

As above, we can represent states both explicitly or symbolically. We only discuss the explicit representation here. With nondeterminism, we now must resolve this nondeterminism externally. That is, the step argument now takes an argument, which we may pick randomly or in a more intelligent manner.

[16]:
import random
random.seed(23)
path = stormpy.examples.files.prism_mdp_slipgrid
prism_program = stormpy.parse_prism_program(path)

model = stormpy.build_model(prism_program)
simulator = stormpy.simulator.create_simulator(model, seed=42)
# 3 paths of at most 20 steps.
paths = []
for m in range(3):
    path = []
    state, reward, labels = simulator.restart()
    path = [f"{state}"]
    for n in range(20):
        actions = simulator.available_actions()
        select_action = random.randint(0,len(actions)-1)
        path.append(f"--act={actions[select_action]}-->")
        state, reward, labels = simulator.step(actions[select_action])
        path.append(f"{state}")
        if simulator.is_done():
            break
    paths.append(path)
for path in paths:
    print(" ".join(path))
    print("------")
0 --act=1--> 2 --act=0--> 2 --act=0--> 4 --act=2--> 1 --act=1--> 3 --act=1--> 3 --act=2--> 7 --act=2--> 3 --act=0--> 1 --act=2--> 4 --act=1--> 4 --act=2--> 4 --act=3--> 8 --act=0--> 5 --act=0--> 8 --act=3--> 12 --act=0--> 12 --act=0--> 9 --act=0--> 9 --act=1--> 5
------
0 --act=1--> 0 --act=0--> 0 --act=1--> 2 --act=1--> 0 --act=0--> 0 --act=0--> 1 --act=0--> 0 --act=1--> 2 --act=2--> 5 --act=0--> 8 --act=1--> 11 --act=2--> 7 --act=2--> 3 --act=2--> 7 --act=2--> 3 --act=2--> 3 --act=1--> 3 --act=2--> 3 --act=0--> 1 --act=1--> 3
------
0 --act=0--> 1 --act=2--> 4 --act=3--> 4 --act=3--> 4 --act=1--> 4 --act=1--> 7 --act=1--> 10 --act=2--> 10 --act=1--> 6 --act=0--> 3 --act=2--> 7 --act=0--> 4 --act=0--> 2 --act=0--> 4 --act=3--> 8 --act=0--> 5 --act=2--> 9 --act=1--> 5 --act=2--> 9 --act=1--> 9
------

In the example above, the actions are internal numbers. Often, a program gives semantically meaningful names, such as moving north, east, west and south in a grid with program variables reflecting the x and y location.

[17]:
options = stormpy.BuilderOptions()
options.set_build_choice_labels()
options.set_build_state_valuations()
model = stormpy.build_sparse_model_with_options(prism_program, options)
simulator = stormpy.simulator.create_simulator(model, seed=42)
simulator.set_action_mode(stormpy.simulator.SimulatorActionMode.GLOBAL_NAMES)
simulator.set_observation_mode(stormpy.simulator.SimulatorObservationMode.PROGRAM_LEVEL)
# 3 paths of at most 20 steps.
paths = []
for m in range(3):
    path = []
    state, reward, labels = simulator.restart()
    path = [f"({state['x']},{state['y']})"]
    for n in range(20):
        actions = simulator.available_actions()
        select_action = random.randint(0,len(actions)-1)
        path.append(f"--{actions[select_action]}-->")
        state, reward, labels = simulator.step(actions[select_action])
        path.append(f"({state['x']},{state['y']})")
        if simulator.is_done():
            break
    paths.append(path)
for path in paths:
    print(" ".join(path))
(1,1) --west--> (1,2) --south--> (1,2) --east--> (1,2) --east--> (1,1) --west--> (1,2) --east--> (1,1) --south--> (2,1) --south--> (2,1) --north--> (1,1) --south--> (2,1) --west--> (2,1) --north--> (2,1) --west--> (2,2) --west--> (2,2) --south--> (3,2) --east--> (3,2) --south--> (4,2) --west--> (4,2) --west--> (4,2) --north--> (3,2)
(1,1) --west--> (1,1) --south--> (1,1) --west--> (1,2) --south--> (1,2) --south--> (1,2) --west--> (1,3) --east--> (1,2) --east--> (1,2) --south--> (2,2) --south--> (3,2) --south--> (4,2) --west--> (4,3) --north--> (3,3) --south--> (4,3) --west--> (4,4) --north--> (3,4) --east--> (3,3) --west--> (3,3) --north--> (2,3) --east--> (2,3)
(1,1) --south--> (2,1) --south--> (3,1) --south--> (3,1) --north--> (2,1) --north--> (1,1) --south--> (2,1) --south--> (3,1) --south--> (3,1) --west--> (3,2) --south--> (3,2) --west--> (3,3) --east--> (3,2) --south--> (3,2) --north--> (3,2) --north--> (3,2) --south--> (3,2) --south--> (4,2) --east--> (4,1) --north--> (4,1) --west--> (4,2)

Program-level simulator

We can also use a program-level simulator, which does not require putting the model into memory.

[18]:
simulator = stormpy.simulator.create_simulator(prism_program, seed=42)
[19]:
# 3 paths of at most 20 steps.
paths = []
for m in range(3):
    path = []
    state, reward, labels = simulator.restart()
    path = [f"({state['x']},{state['y']})"]
    for n in range(20):
        actions = simulator.available_actions()
        select_action = random.randint(0,len(actions)-1)
        path.append(f"--{actions[select_action]}-->")
        state, reward, labels = simulator.step(actions[select_action])
        path.append(f"({state['x']},{state['y']})")
        if simulator.is_done():
            break
    paths.append(path)
for path in paths:
    print(" ".join(path))
(1,1) --0--> (2,1) --0--> (1,1) --1--> (1,2) --1--> (1,1) --1--> (1,2) --0--> (2,2) --3--> (2,3) --0--> (1,3) --2--> (1,3) --2--> (1,4) --0--> (2,4) --2--> (2,3) --2--> (2,2) --3--> (2,2) --3--> (2,3) --2--> (2,2) --0--> (2,2) --2--> (2,2) --0--> (1,2) --0--> (2,2)
(1,1) --0--> (2,1) --1--> (2,1) --2--> (2,2) --3--> (2,2) --1--> (2,2) --1--> (3,2) --0--> (2,2) --2--> (2,1) --2--> (2,2) --0--> (2,2) --1--> (3,2) --3--> (3,3) --3--> (3,3) --2--> (3,2) --0--> (2,2) --2--> (2,2) --3--> (2,2) --0--> (1,2) --0--> (2,2) --3--> (2,3)
(1,1) --0--> (2,1) --2--> (2,2) --0--> (1,2) --0--> (2,2) --3--> (2,2) --3--> (2,3) --0--> (1,3) --1--> (1,3) --2--> (1,4) --1--> (1,4) --1--> (1,3) --2--> (1,4) --0--> (2,4) --2--> (2,3) --2--> (2,2) --2--> (2,2) --0--> (1,2) --0--> (2,2) --0--> (2,2) --3--> (2,3)