Functionality through Wrappers

To provide additional functionality to environments without changing their interface, we can use so-called wrappers. A wrapper executes environment resets and steps internally, but can also alter the environment's behavior (e.g. by adding noise) or record information about it. Wrapping an existing environment is simple:

from dacbench.wrappers import PerformanceTrackingWrapper

wrapped_env = PerformanceTrackingWrapper(env)

The provided wrappers for tracking performance, state and action information are designed to be used with DACBench’s logging functionality.
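The mechanism behind all of these wrappers is plain delegation: the wrapper holds the inner environment, forwards reset and step calls, and records or modifies information on the way. A minimal sketch of the pattern (illustrative plain Python with a made-up demo environment, not the actual DACBench implementation):

```python
class CountingWrapper:
    """Minimal wrapper sketch: forwards reset/step and counts steps taken."""

    def __init__(self, env):
        self.env = env
        self.total_steps = 0

    def reset(self):
        return self.env.reset()

    def step(self, action):
        self.total_steps += 1
        return self.env.step(action)

    def __getattr__(self, name):
        # Anything the wrapper does not define falls through to the
        # wrapped environment, preserving the original interface.
        return getattr(self.env, name)


class _DemoEnv:
    """Tiny stand-in environment used only for this demonstration."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2, {}


env = CountingWrapper(_DemoEnv())
env.reset()
state, reward, done, info = env.step(0)
```

Because the wrapper exposes the same reset/step interface as the environment it holds, wrappers can also be stacked, e.g. noise on top of tracking.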

class dacbench.wrappers.ActionFrequencyWrapper(env, action_interval=None, logger=None)

Wrapper to track action frequency. Includes an interval mode that returns frequencies in lists of length interval instead of one long list.

get_actions()

Get action progression

Returns

all actions or all actions and interval-sorted actions

Return type

np.array or np.array, np.array

render_action_tracking()

Render action progression

Returns

RGB data of action tracking

Return type

np.array

step(action)

Execute environment step and record action

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
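The interval mode mentioned above groups the flat record into consecutive chunks of a fixed length. How such grouping might look (an assumption about the behavior for illustration, not the exact DACBench code):

```python
def to_intervals(values, interval):
    """Group a flat record into consecutive chunks of length `interval`.

    The last chunk may be shorter if the record length is not a
    multiple of the interval.
    """
    return [values[i:i + interval] for i in range(0, len(values), interval)]


# Actions recorded over 5 steps, grouped with an interval of 2:
chunks = to_intervals([0, 1, 1, 0, 1], 2)  # -> [[0, 1], [1, 0], [1]]
```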

class dacbench.wrappers.EpisodeTimeWrapper(env, time_interval=None, logger=None)

Wrapper to track time spent per episode. Includes an interval mode that returns times in lists of length interval instead of one long list.

get_times()

Get times

Returns

all times or all times and interval sorted times

Return type

np.array or np.array, np.array

render_episode_time()

Render episode times

render_step_time()

Render step times

step(action)

Execute environment step and record time

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
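Conceptually, the time tracking wraps each inner step call with a timer and sums the per-step times when an episode ends, so both step and episode times are available for rendering. A self-contained sketch (hypothetical class names and a made-up demo environment, not the DACBench implementation):

```python
import time


class EpisodeTimeSketch:
    """Sketch: record wall-clock time per step and sum it per episode."""

    def __init__(self, env):
        self.env = env
        self.step_times = []
        self.episode_times = []
        self._current_episode = []

    def step(self, action):
        start = time.perf_counter()
        state, reward, done, info = self.env.step(action)
        elapsed = time.perf_counter() - start
        self.step_times.append(elapsed)
        self._current_episode.append(elapsed)
        if done:
            self.episode_times.append(sum(self._current_episode))
            self._current_episode = []
        return state, reward, done, info


class _DemoEnv:
    """Tiny stand-in environment; one episode lasts three steps."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 3, {}


env = EpisodeTimeSketch(_DemoEnv())
done = False
while not done:
    _, _, done, _ = env.step(0)
```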

class dacbench.wrappers.InstanceSamplingWrapper(env, sampling_function=None, instances=None, reset_interval=0)

Wrapper to sample a new instance at a given time point. Instances can either be sampled using a given method or a distribution inferred from a given list of instances.

fit_dist(instances)

Approximate instance distribution in given instance set

Parameters

instances (List) – instance set

Returns

sampling method for new instances

Return type

method

reset()

Reset environment and use sampled instance for training

Returns

state

Return type

np.array
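One plausible way fit_dist could approximate the instance distribution is to fit an independent normal distribution per instance feature and return a function that samples from it. This is an assumption for illustration, not the verified implementation:

```python
import numpy as np


def fit_dist(instances):
    """Fit a per-feature normal distribution to the instance set and
    return a function that samples new instances from it."""
    data = np.asarray(instances, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)

    def sample():
        return np.random.normal(mean, std)

    return sample


# Three instances with two features each:
sampler = fit_dist([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
new_instance = sampler()  # a freshly sampled instance with two features
```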

class dacbench.wrappers.ObservationWrapper(env)

Wrapper to convert observation spaces to spaces.Box for convenience. Currently only supports Dict -> Box.

reset()

Reset environment and convert the returned observation

Returns

state

Return type

np.array

step(action)

Execute environment step and convert the returned observation

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
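The Dict -> Box conversion amounts to flattening the dictionary's values into a single array with a deterministic key order, so the Box layout stays stable across steps. A sketch of that idea (assumed mechanics, not the actual conversion code):

```python
import numpy as np


def flatten_dict_observation(obs):
    """Concatenate the values of a dict observation into one flat array,
    iterating keys in sorted order so the layout is deterministic."""
    parts = [np.ravel(np.asarray(obs[key], dtype=float)) for key in sorted(obs)]
    return np.concatenate(parts)


flat = flatten_dict_observation({"b": [3.0, 4.0], "a": 1.0})
# "a" comes first after sorting, so flat is [1.0, 3.0, 4.0]
```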

class dacbench.wrappers.PerformanceTrackingWrapper(env, performance_interval=None, track_instance_performance=True, logger=None)

Wrapper to track episode performance. Includes an interval mode that returns performance in lists of length interval instead of one long list.

get_performance()

Get performance progression

Returns

all performances, optionally together with interval-sorted and/or per-instance performances

Return type

np.array or (np.array, np.array) or (np.array, dict) or (np.array, np.array, dict)

render_instance_performance()

Plot mean performance for each instance

render_performance()

Plot performance

step(action)

Execute environment step and record performance

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
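With track_instance_performance enabled, episode performance is additionally grouped per instance. A simplified sketch of that bookkeeping (hypothetical class, not the DACBench one): episode rewards are summed and, on episode end, appended both to the overall record and to a per-instance dictionary.

```python
from collections import defaultdict


class InstancePerformanceSketch:
    """Sketch: accumulate episode reward, grouped by the current instance."""

    def __init__(self):
        self.overall = []
        self.per_instance = defaultdict(list)
        self._episode_reward = 0.0

    def record(self, reward, done, instance_id):
        self._episode_reward += reward
        if done:
            self.overall.append(self._episode_reward)
            self.per_instance[instance_id].append(self._episode_reward)
            self._episode_reward = 0.0


tracker = InstancePerformanceSketch()
# Two episodes on instance 0: rewards (1.0 + 2.0) and (0.5)
for reward, done in [(1.0, False), (2.0, True), (0.5, True)]:
    tracker.record(reward, done, instance_id=0)
```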

class dacbench.wrappers.PolicyProgressWrapper(env, compute_optimal)

Wrapper to track progress towards the optimal policy. Can only be used if the optimal policy for a given instance can be computed.

render_policy_progress()

Plot progress

step(action)

Execute environment step and record distance

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
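A natural way to quantify progress towards the optimal policy is the distance between the actions taken in an episode and those the optimal policy (as returned by compute_optimal) would have taken. A sketch of such a distance, assuming numeric action sequences of equal length; the actual metric used by the wrapper may differ:

```python
import numpy as np


def policy_distance(taken_actions, optimal_actions):
    """Euclidean distance between the taken and the optimal action sequence.
    Smaller values mean the policy is closer to optimal."""
    taken = np.asarray(taken_actions, dtype=float)
    optimal = np.asarray(optimal_actions, dtype=float)
    return float(np.linalg.norm(taken - optimal))


# Differs from the optimal sequence in one position, by 1:
distance = policy_distance([1, 0, 2], [1, 1, 2])
```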

class dacbench.wrappers.RewardNoiseWrapper(env, noise_function=None, noise_dist='standard_normal', dist_args=None)

Wrapper to add noise to the reward signal. Noise can be sampled from a custom distribution or any distribution in numpy’s random module

add_noise(dist, args)

Make noise function from distribution name and arguments

Parameters
  • dist (str) – Name of distribution

  • args (list) – List of distribution arguments

Returns

Noise sampling function

Return type

function

step(action)

Execute environment step and add noise

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
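add_noise turns a distribution name and argument list into a sampling function. Since any distribution in numpy's random module is supported, a lookup by name is a plausible sketch of the mechanism (assumed, not the verified implementation; the seed is fixed here only to make the example reproducible):

```python
import numpy as np


def make_noise_function(dist="standard_normal", args=None):
    """Look up `dist` on a numpy random generator and bind its arguments."""
    rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility only
    sampler = getattr(rng, dist)

    def noise():
        return sampler(*args) if args else sampler()

    return noise


noise = make_noise_function("normal", args=[0.0, 0.1])  # mean 0, std 0.1
noisy_reward = 1.0 + noise()
```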

class dacbench.wrappers.StateTrackingWrapper(env, state_interval=None, logger=None)

Wrapper to track state changes over time. Includes an interval mode that returns states in lists of length interval instead of one long list.

get_states()

Get state progression

Returns

all states or all states and interval sorted states

Return type

np.array or np.array, np.array

render_state_tracking()

Render state progression

Returns

RGB data of state tracking

Return type

np.array

reset()

Reset environment and record starting state

Returns

state

Return type

np.array

step(action)

Execute environment step and record state

Parameters

action (int) – action to execute

Returns

state, reward, done, metainfo

Return type

np.array, float, bool, dict
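Rendering tracked states as RGB data can be as simple as normalizing the state matrix and replicating it across three color channels, with time steps as rows and state dimensions as columns. A sketch of that idea (illustrative only, not the actual render code, which plots via a figure):

```python
import numpy as np


def states_to_rgb(states):
    """Render a state progression as a grayscale image in RGB layout:
    rows are time steps, columns are state dimensions."""
    data = np.asarray(states, dtype=float)
    lo, hi = data.min(), data.max()
    # Normalize to [0, 1]; guard against a constant state record.
    norm = (data - lo) / (hi - lo) if hi > lo else np.zeros_like(data)
    gray = (norm * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)  # shape (steps, dims, 3)


# Two time steps of a two-dimensional state:
img = states_to_rgb([[0.0, 1.0], [2.0, 3.0]])
```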