Functionality through Wrappers

To provide additional functionality to environments without changing their interface, we can use so-called wrappers. A wrapper executes environment resets and steps internally, but can also alter the environment's behavior (e.g. by adding noise) or record information about it. Wrapping an existing environment is simple:

from dacbench.wrappers import PerformanceTrackingWrapper

wrapped_env = PerformanceTrackingWrapper(env)

The provided wrappers for tracking performance, state and action information are designed to be used with DACBench’s logging functionality.
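The mechanism behind all of these wrappers is plain delegation: the wrapper holds the inner environment, forwards reset and step calls, and records or modifies information on the way. A minimal sketch of the pattern (illustrative plain Python with a made-up demo environment, not the actual DACBench implementation):

```python
class CountingWrapper:
    """Minimal wrapper sketch: forwards reset/step and counts steps taken."""

    def __init__(self, env):
        self.env = env
        self.total_steps = 0

    def reset(self):
        return self.env.reset()

    def step(self, action):
        self.total_steps += 1
        return self.env.step(action)

    def __getattr__(self, name):
        # Anything the wrapper does not define falls through to the
        # wrapped environment, preserving the original interface.
        return getattr(self.env, name)


class _DemoEnv:
    """Tiny stand-in environment used only for this demonstration."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2, {}


env = CountingWrapper(_DemoEnv())
env.reset()
state, reward, done, info = env.step(0)
```

Because the wrapper exposes the same reset/step interface as the environment it holds, wrappers can also be stacked, e.g. noise on top of tracking.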

class dacbench.wrappers.ActionFrequencyWrapper(env, action_interval=None, logger=None)

Wrapper to track action frequency. Includes an interval mode that returns frequencies in lists of length interval instead of one long list.

get_actions()

Get action progression

Returns

all actions or all actions and interval-sorted actions

Return type

np.array or np.array, np.array

render_action_tracking()

Render action progression

Returns

RGB data of action tracking

Return type

np.array

step(action)

Execute environment step and record action

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
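The interval mode mentioned above groups the flat record into consecutive chunks of a fixed length. How such grouping might look (an assumption about the behavior for illustration, not the exact DACBench code):

```python
def to_intervals(values, interval):
    """Group a flat record into consecutive chunks of length `interval`.

    The last chunk may be shorter if the record length is not a
    multiple of the interval.
    """
    return [values[i:i + interval] for i in range(0, len(values), interval)]


# Actions recorded over 5 steps, grouped with an interval of 2:
chunks = to_intervals([0, 1, 1, 0, 1], 2)  # -> [[0, 1], [1, 0], [1]]
```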

class dacbench.wrappers.EpisodeTimeWrapper(env, time_interval=None, logger=None)

Wrapper to track time spent per episode. Includes an interval mode that returns times in lists of length interval instead of one long list.

get_times()

Get times

Returns

all times or all times and interval sorted times

Return type

np.array or np.array, np.array

render_episode_time()

Render episode times

render_step_time()

Render step times

step(action)

Execute environment step and record time

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
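Conceptually, the time tracking wraps each inner step call with a timer and sums the per-step times when an episode ends, so both step and episode times are available for rendering. A self-contained sketch (hypothetical class names and a made-up demo environment, not the DACBench implementation):

```python
import time


class EpisodeTimeSketch:
    """Sketch: record wall-clock time per step and sum it per episode."""

    def __init__(self, env):
        self.env = env
        self.step_times = []
        self.episode_times = []
        self._current_episode = []

    def step(self, action):
        start = time.perf_counter()
        state, reward, done, info = self.env.step(action)
        elapsed = time.perf_counter() - start
        self.step_times.append(elapsed)
        self._current_episode.append(elapsed)
        if done:
            self.episode_times.append(sum(self._current_episode))
            self._current_episode = []
        return state, reward, done, info


class _DemoEnv:
    """Tiny stand-in environment; one episode lasts three steps."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 3, {}


env = EpisodeTimeSketch(_DemoEnv())
done = False
while not done:
    _, _, done, _ = env.step(0)
```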

class dacbench.wrappers.InstanceSamplingWrapper(env, sampling_function=None, instances=None, reset_interval=0)

Wrapper to sample a new instance at a given time point. Instances can either be sampled using a given method or a distribution inferred from a given list of instances.

fit_dist(instances)

Approximate instance distribution in given instance set

Parameters

instances (List) – instance set

Returns

sampling method for new instances

Return type

method

reset()

Reset environment and use sampled instance for training

Returns

state

Return type

np.array
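One plausible way fit_dist could approximate the instance distribution is to fit an independent normal distribution per instance feature and return a function that samples from it. This is an assumption for illustration, not the verified implementation:

```python
import numpy as np


def fit_dist(instances):
    """Fit a per-feature normal distribution to the instance set and
    return a function that samples new instances from it."""
    data = np.asarray(instances, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)

    def sample():
        return np.random.normal(mean, std)

    return sample


# Three instances with two features each:
sampler = fit_dist([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
new_instance = sampler()  # a freshly sampled instance with two features
```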

class dacbench.wrappers.ObservationWrapper(env)

Wrapper to convert observation spaces to spaces.Box for convenience. Currently only supports Dict -> Box.

reset()

Reset environment and convert the returned observation

Returns

state

Return type

np.array

step(action)

Execute environment step and convert the returned observation

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
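The Dict -> Box conversion amounts to flattening the dictionary's values into a single array with a deterministic key order, so the Box layout stays stable across steps. A sketch of that idea (assumed mechanics, not the actual conversion code):

```python
import numpy as np


def flatten_dict_observation(obs):
    """Concatenate the values of a dict observation into one flat array,
    iterating keys in sorted order so the layout is deterministic."""
    parts = [np.ravel(np.asarray(obs[key], dtype=float)) for key in sorted(obs)]
    return np.concatenate(parts)


flat = flatten_dict_observation({"b": [3.0, 4.0], "a": 1.0})
# "a" comes first after sorting, so flat is [1.0, 3.0, 4.0]
```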

class dacbench.wrappers.PerformanceTrackingWrapper(env, performance_interval=None, track_instance_performance=True, logger=None)

Wrapper to track episode performance. Includes an interval mode that returns performance in lists of length interval instead of one long list.

get_performance()

Get performance progression

Returns

all performances, optionally together with interval-sorted and/or per-instance performances

Return type

np.array or (np.array, np.array) or (np.array, dict) or (np.array, np.array, dict)

render_instance_performance()

Plot mean performance for each instance

render_performance()

Plot performance

step(action)

Execute environment step and record performance

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
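With track_instance_performance enabled, episode performance is additionally grouped per instance. A simplified sketch of that bookkeeping (hypothetical class, not the DACBench one): episode rewards are summed and, on episode end, appended both to the overall record and to a per-instance dictionary.

```python
from collections import defaultdict


class InstancePerformanceSketch:
    """Sketch: accumulate episode reward, grouped by the current instance."""

    def __init__(self):
        self.overall = []
        self.per_instance = defaultdict(list)
        self._episode_reward = 0.0

    def record(self, reward, done, instance_id):
        self._episode_reward += reward
        if done:
            self.overall.append(self._episode_reward)
            self.per_instance[instance_id].append(self._episode_reward)
            self._episode_reward = 0.0


tracker = InstancePerformanceSketch()
# Two episodes on instance 0: rewards (1.0 + 2.0) and (0.5)
for reward, done in [(1.0, False), (2.0, True), (0.5, True)]:
    tracker.record(reward, done, instance_id=0)
```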

class dacbench.wrappers.PolicyProgressWrapper(env, compute_optimal)

Wrapper to track progress towards the optimal policy. Can only be used if the optimal policy for a given instance can be computed.

render_policy_progress()

Plot progress

step(action)

Execute environment step and record distance

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
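A natural way to quantify progress towards the optimal policy is the distance between the actions taken in an episode and those the optimal policy (as returned by compute_optimal) would have taken. A sketch of such a distance, assuming numeric action sequences of equal length; the actual metric used by the wrapper may differ:

```python
import numpy as np


def policy_distance(taken_actions, optimal_actions):
    """Euclidean distance between the taken and the optimal action sequence.
    Smaller values mean the policy is closer to optimal."""
    taken = np.asarray(taken_actions, dtype=float)
    optimal = np.asarray(optimal_actions, dtype=float)
    return float(np.linalg.norm(taken - optimal))


# Differs from the optimal sequence in one position, by 1:
distance = policy_distance([1, 0, 2], [1, 1, 2])
```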

class dacbench.wrappers.RewardNoiseWrapper(env, noise_function=None, noise_dist='standard_normal', dist_args=None)

Wrapper to add noise to the reward signal. Noise can be sampled from a custom distribution or any distribution in numpy’s random module

add_noise(dist, args)

Make noise function from distribution name and arguments

Parameters
  • dist (str) – Name of distribution

  • args (list) – List of distribution arguments

Returns

Noise sampling function

Return type

function

step(action)

Execute environment step and add noise

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
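add_noise turns a distribution name and argument list into a sampling function. Since any distribution in numpy's random module is supported, a lookup by name is a plausible sketch of the mechanism (assumed, not the verified implementation; the seed is fixed here only to make the example reproducible):

```python
import numpy as np


def make_noise_function(dist="standard_normal", args=None):
    """Look up `dist` on a numpy random generator and bind its arguments."""
    rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility only
    sampler = getattr(rng, dist)

    def noise():
        return sampler(*args) if args else sampler()

    return noise


noise = make_noise_function("normal", args=[0.0, 0.1])  # mean 0, std 0.1
noisy_reward = 1.0 + noise()
```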

class dacbench.wrappers.StateTrackingWrapper(env, state_interval=None, logger=None)

Wrapper to track state changes over time. Includes an interval mode that returns states in lists of length interval instead of one long list.

get_states()

Get state progression

Returns

all states or all states and interval sorted states

Return type

np.array or np.array, np.array

render_state_tracking()

Render state progression

Returns

RGB data of state tracking

Return type

np.array

reset()

Reset environment and record starting state

Returns

state

Return type

np.array

step(action)

Execute environment step and record state

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
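Rendering tracked states as RGB data can be as simple as normalizing the state matrix and replicating it across three color channels, with time steps as rows and state dimensions as columns. A sketch of that idea (illustrative only, not the actual render code, which plots via a figure):

```python
import numpy as np


def states_to_rgb(states):
    """Render a state progression as a grayscale image in RGB layout:
    rows are time steps, columns are state dimensions."""
    data = np.asarray(states, dtype=float)
    lo, hi = data.min(), data.max()
    # Normalize to [0, 1]; guard against a constant state record.
    norm = (data - lo) / (hi - lo) if hi > lo else np.zeros_like(data)
    gray = (norm * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)  # shape (steps, dims, 3)


# Two time steps of a two-dimensional state:
img = states_to_rgb([[0.0, 1.0], [2.0, 3.0]])
```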