Functionality through Wrappers
To add functionality to environments without changing their interface, we can use so-called wrappers. A wrapper executes environment resets and steps internally, but can either alter the environment's behavior (e.g. by adding noise) or record information about it. Wrapping an existing environment is simple:
from dacbench.wrappers import PerformanceTrackingWrapper
wrapped_env = PerformanceTrackingWrapper(env)
The provided wrappers for tracking performance, state and action information are designed to be used with DACBench’s logging functionality.
- class dacbench.wrappers.ActionFrequencyWrapper(env, action_interval=None, logger=None)
Wrapper to track action frequency. Includes interval mode that returns actions in lists of len(interval) instead of one long list.
- get_actions()
Get action progression
- Returns
all actions or all actions and interval sorted actions
- Return type
np.array or np.array, np.array
- render_action_tracking()
Render action progression
- Returns
RGB data of action tracking
- Return type
np.array
- step(action)
Execute environment step and record state
- Parameters
action (int) – action to execute
- Returns
state, reward, done, metainfo
- Return type
np.array, float, bool, dict
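The interval mode shared by the tracking wrappers can be illustrated with a short sketch. This is not the DACBench implementation; class and attribute names here are invented for illustration, but the bookkeeping idea is the same: besides one long list of all recorded actions, completed chunks of len(interval) entries are kept separately.

```python
# Illustrative sketch (not the real ActionFrequencyWrapper) of interval mode.
class ActionTrackerSketch:
    def __init__(self, action_interval=None):
        self.action_interval = action_interval
        self.overall = []        # one long list of all actions
        self.interval_list = []  # completed chunks of len(action_interval)
        self.current = []        # chunk currently being filled

    def record(self, action):
        self.overall.append(action)
        if self.action_interval is not None:
            self.current.append(action)
            if len(self.current) == self.action_interval:
                self.interval_list.append(self.current)
                self.current = []

    def get_actions(self):
        if self.action_interval is None:
            return self.overall
        return self.overall, self.interval_list + [self.current]

tracker = ActionTrackerSketch(action_interval=2)
for a in [0, 1, 1, 0, 1]:
    tracker.record(a)
overall, intervals = tracker.get_actions()
print(overall)    # [0, 1, 1, 0, 1]
print(intervals)  # [[0, 1], [1, 0], [1]]
```

With an interval set, get_actions() returns both views, which matches the documented "all actions or all actions and interval sorted actions" return.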
- class dacbench.wrappers.EpisodeTimeWrapper(env, time_interval=None, logger=None)
Wrapper to track time spent per episode. Includes interval mode that returns times in lists of len(interval) instead of one long list.
- get_times()
Get times
- Returns
all times or all times and interval sorted times
- Return type
np.array or np.array, np.array
- render_episode_time()
Render episode times
- render_step_time()
Render step times
- step(action)
Execute environment step and record time
- Parameters
action (int) – action to execute
- Returns
state, reward, done, metainfo
- Return type
np.array, float, bool, dict
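How step and episode times can be recorded is sketched below with a toy stand-in environment. The wrapper class, the toy environment, and the use of time.perf_counter are assumptions for illustration, not the actual EpisodeTimeWrapper code.

```python
import time

# Minimal sketch of per-step / per-episode timing, in the spirit of
# EpisodeTimeWrapper (names and structure are assumptions).
class TimingWrapperSketch:
    def __init__(self, env):
        self.env = env
        self.step_times = []      # wall-clock duration of each step
        self.episode_times = []   # summed step durations per episode
        self.current_episode = []

    def step(self, action):
        start = time.perf_counter()
        state, reward, done, info = self.env.step(action)
        elapsed = time.perf_counter() - start
        self.step_times.append(elapsed)
        self.current_episode.append(elapsed)
        if done:
            self.episode_times.append(sum(self.current_episode))
            self.current_episode = []
        return state, reward, done, info

class ToyEnv:
    def step(self, action):
        return [0.0], 1.0, action == 1, {}  # episode ends when action == 1

env = TimingWrapperSketch(ToyEnv())
env.step(0)
env.step(1)  # done: episode time is recorded
print(len(env.step_times), len(env.episode_times))  # 2 1
```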
- class dacbench.wrappers.InstanceSamplingWrapper(env, sampling_function=None, instances=None, reset_interval=0)
Wrapper to sample a new instance at a given time point. Instances can either be sampled using a given method or a distribution inferred from a given list of instances.
- fit_dist(instances)
Approximate instance distribution in given instance set
- Parameters
instances (List) – instance set
- Returns
sampling method for new instances
- Return type
method
- reset()
Reset environment and use sampled instance for training
- Returns
state
- Return type
np.array
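The idea behind fit_dist can be sketched as follows: approximate each instance feature with a distribution fitted to the instance set and return a sampler for new instances. The per-feature normal fit below is an assumption for illustration; the wrapper's actual fitting procedure may differ.

```python
import random
import statistics

# Hedged sketch of fit_dist: fit a normal distribution (mean, stdev)
# per feature of the instance set and return a sampling function.
def fit_dist_sketch(instances):
    features = list(zip(*instances))  # transpose: one tuple per feature
    params = [(statistics.mean(f), statistics.stdev(f)) for f in features]

    def sample():
        return [random.gauss(mu, sigma) for mu, sigma in params]

    return sample

instances = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
sampler = fit_dist_sketch(instances)
new_instance = sampler()
print(len(new_instance))  # 2
```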
- class dacbench.wrappers.ObservationWrapper(env)
Wrapper to convert observation spaces to spaces.Box for convenience. Currently only supports Dict -> Box.
- reset()
Reset environment and convert observation
- Returns
state
- Return type
np.array
- step(action)
Execute environment step and convert observation
- Parameters
action (int) – action to execute
- Returns
state, reward, done, metainfo
- Return type
np.array, float, bool, dict
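The Dict -> Box conversion essentially flattens a dict observation into one vector. The sketch below shows that idea without real gym/DACBench space objects; the fixed key ordering by sorting and the example keys are assumptions for illustration.

```python
# Sketch of Dict -> Box conversion: concatenate the values of a dict
# observation into one flat vector, with a deterministic key order.
def flatten_observation(obs_dict):
    flat = []
    for key in sorted(obs_dict):  # fixed ordering so indices stay stable
        value = obs_dict[key]
        flat.extend(value if isinstance(value, list) else [value])
    return flat

obs = {"remaining_budget": [5.0], "loss": [0.3, 0.1]}  # hypothetical keys
print(flatten_observation(obs))  # [0.3, 0.1, 5.0]
```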
- class dacbench.wrappers.PerformanceTrackingWrapper(env, performance_interval=None, track_instance_performance=True, logger=None)
Wrapper to track episode performance. Includes interval mode that returns performance in lists of len(interval) instead of one long list.
- get_performance()
Get performance
- Returns
all performances, optionally also interval sorted and/or per-instance performances
- Return type
np.array or np.array, np.array or np.array, dict or np.array, np.array, dict
- render_instance_performance()
Plot mean performance for each instance
- render_performance()
Plot performance
- step(action)
Execute environment step and record performance
- Parameters
action (int) – action to execute
- Returns
state, reward, done, metainfo
- Return type
np.array, float, bool, dict
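Performance here means episode return, optionally also grouped per instance (as hinted by track_instance_performance). The sketch below shows that bookkeeping; the class and the inst_id argument are assumptions for illustration, not the wrapper's real interface.

```python
# Sketch of episode-return tracking with per-instance grouping.
class PerformanceTrackerSketch:
    def __init__(self):
        self.overall = []        # return of every finished episode
        self.per_instance = {}   # instance id -> list of episode returns
        self.episode_reward = 0.0

    def record(self, reward, done, inst_id):
        self.episode_reward += reward
        if done:
            self.overall.append(self.episode_reward)
            self.per_instance.setdefault(inst_id, []).append(self.episode_reward)
            self.episode_reward = 0.0

tracker = PerformanceTrackerSketch()
for reward, done in [(1.0, False), (2.0, True), (5.0, True)]:
    tracker.record(reward, done, inst_id=0)
print(tracker.overall)          # [3.0, 5.0]
print(tracker.per_instance[0])  # [3.0, 5.0]
```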
- class dacbench.wrappers.PolicyProgressWrapper(env, compute_optimal)
Wrapper to track progress towards the optimal policy. Can only be used if the optimal policy for a given instance can be computed.
- render_policy_progress()
Plot progress
- step(action)
Execute environment step and record distance
- Parameters
action (int) – action to execute
- Returns
state, reward, done, metainfo
- Return type
np.array, float, bool, dict
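The role of compute_optimal can be sketched as follows: it maps an instance to its optimal action sequence, and progress is measured as the distance between the actions actually taken and that sequence. The Euclidean distance and the example policies below are assumptions for illustration.

```python
import math

# Hypothetical distance between the taken actions and the optimal policy.
def policy_distance(taken, optimal):
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(taken, optimal)))

def compute_optimal(instance):
    # hypothetical: the optimal action sequence for this instance
    return [1, 1, 0]

taken_actions = [1, 0, 0]
print(policy_distance(taken_actions, compute_optimal(None)))  # 1.0
```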
- class dacbench.wrappers.RewardNoiseWrapper(env, noise_function=None, noise_dist='standard_normal', dist_args=None)
Wrapper to add noise to the reward signal. Noise can be sampled from a custom distribution or any distribution in numpy’s random module
- add_noise(dist, args)
Make noise function from distribution name and arguments
- Parameters
dist (str) – Name of distribution
args (list) – List of distribution arguments
- Returns
Noise sampling function
- Return type
function
- step(action)
Execute environment step and add noise
- Parameters
action (int) – action to execute
- Returns
state, reward, done, metainfo
- Return type
np.array, float, bool, dict
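The add_noise pattern of building a sampler from a distribution name and arguments can be sketched with the standard library. DACBench looks the name up in numpy's random module; the stdlib random module is used here only so the example is dependency-free.

```python
import random

# Sketch of add_noise: resolve a distribution by name and bind its
# arguments into a zero-argument noise function.
def add_noise_sketch(dist, args):
    sampler = getattr(random, dist)  # e.g. random.gauss
    return lambda: sampler(*args)

noise = add_noise_sketch("gauss", [0.0, 0.1])  # small Gaussian noise
reward = 1.0
noisy_reward = reward + noise()
print(noisy_reward)  # reward plus a small random perturbation
```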
- class dacbench.wrappers.StateTrackingWrapper(env, state_interval=None, logger=None)
Wrapper to track state changes over time. Includes interval mode that returns states in lists of len(interval) instead of one long list.
- get_states()
Get state progression
- Returns
all states or all states and interval sorted states
- Return type
np.array or np.array, np.array
- render_state_tracking()
Render state progression
- Returns
RGB data of state tracking
- Return type
np.array
- reset()
Reset environment and record starting state
- Returns
state
- Return type
np.array
- step(action)
Execute environment step and record state
- Parameters
action (int) – action to execute
- Returns
state, reward, done, metainfo
- Return type
np.array, float, bool, dict
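Note that reset() records the starting state as well, so the tracked progression covers the whole run. A minimal sketch of that behavior, with a toy stand-in environment instead of a DACBench benchmark:

```python
# Sketch of state tracking: reset() records the starting state and
# step() appends every subsequent state.
class ToyEnv:
    def reset(self):
        return [0.0]
    def step(self, action):
        return [float(action)], 0.0, False, {}

class StateTrackerSketch:
    def __init__(self, env):
        self.env = env
        self.states = []

    def reset(self):
        state = self.env.reset()
        self.states.append(state)  # starting state is tracked too
        return state

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        self.states.append(state)
        return state, reward, done, info

env = StateTrackerSketch(ToyEnv())
env.reset()
env.step(1)
print(env.states)  # [[0.0], [1.0]]
```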