dacbench.envs

Package Contents

Classes

LubyEnv

Environment to learn Luby Sequence

SigmoidEnv

Environment for tracing sigmoid curves

FastDownwardEnv

Environment to control Solver Heuristics of FastDownward

ToySGDEnv

Optimize toy functions with SGD + Momentum.

GeometricEnv

Environment for tracing different curves that are orthogonal to each other

Functions

luby_gen(i)

Generator for the Luby Sequence

class dacbench.envs.LubyEnv(config)

Bases: dacbench.AbstractEnv

Environment to learn Luby Sequence

step(self, action: int)

Execute environment step

Parameters

action (int) – action to execute

Returns

state, reward, done, info

Return type

np.array, float, bool, dict

reset(self) → List[int]

Resets env

Returns

Environment state

Return type

numpy.array
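The step and reset signatures above follow the usual gym-style interaction loop. The following is a minimal sketch of that loop; StubEnv and run_episode are hypothetical stand-ins (not part of dacbench) so the snippet runs without the library installed.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the reset/step contract described above."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial state.
        self.t = 0
        return [self.t]

    def step(self, action):
        # Return the 4-tuple (state, reward, done, info).
        self.t += 1
        reward = 0.0 if action == 0 else -1.0
        done = self.t >= self.horizon
        return [self.t], reward, done, {}


def run_episode(env, policy):
    """Run one episode and return (total reward, number of steps)."""
    state = env.reset()
    done = False
    total, steps = 0.0, 0
    while not done:
        state, reward, done, info = env.step(policy(state))
        total += reward
        steps += 1
    return total, steps
```

With a real environment, the same loop would apply after constructing the environment from its config.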

get_default_reward(self, _)
get_default_state(self, _)
close(self) → bool

Close Env

Returns

Closing confirmation

Return type

bool

render(self, mode: str = 'human') → None

Render env in human mode

Parameters

mode (str) – Execution mode

dacbench.envs.luby_gen(i)

Generator for the Luby Sequence
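For reference, the Luby sequence begins 1, 1, 2, 1, 1, 2, 4, ... The sketch below computes it from the standard recursive definition. It is an illustration only: the names luby and luby_sequence are hypothetical and this is not necessarily how luby_gen is implemented.

```python
from itertools import count, islice


def luby(i: int) -> int:
    """Return the i-th element (1-indexed) of the Luby sequence."""
    # Find the smallest k such that i <= 2**k - 1.
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        # Last position of a block of length 2**k - 1: value is 2**(k-1).
        return 1 << (k - 1)
    # Otherwise the sequence repeats an earlier prefix.
    return luby(i - (1 << (k - 1)) + 1)


def luby_sequence():
    """Yield the Luby sequence indefinitely."""
    for i in count(1):
        yield luby(i)


first_15 = list(islice(luby_sequence(), 15))
```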

class dacbench.envs.SigmoidEnv(config)

Bases: dacbench.AbstractEnv

Environment for tracing sigmoid curves

_sig(self, x, scaling, inflection)

Simple sigmoid function
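A minimal sketch of a sigmoid parameterized by scaling and inflection, assuming the common logistic form 1 / (1 + exp(-scaling * (x - inflection))). The standalone function sig is hypothetical, and the exact formula used by _sig is an assumption here.

```python
import math


def sig(x: float, scaling: float, inflection: float) -> float:
    """Logistic curve with slope `scaling`, centered at `inflection` (assumed form)."""
    return 1.0 / (1.0 + math.exp(-scaling * (x - inflection)))


# At the inflection point the curve passes through 0.5.
midpoint = sig(3.0, 2.0, 3.0)
```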

step(self, action: int)

Execute environment step

Parameters

action (int) – action to execute

Returns

state, reward, done, info

Return type

np.array, float, bool, dict

reset(self) → List[int]

Resets env

Returns

Environment state

Return type

numpy.array

get_default_reward(self, _)
get_default_state(self, _)
close(self) → bool

Close Env

Returns

Closing confirmation

Return type

bool

render(self, mode: str) → None

Render env in human mode

Parameters

mode (str) – Execution mode

class dacbench.envs.FastDownwardEnv(config)

Bases: dacbench.AbstractEnv

Environment to control Solver Heuristics of FastDownward

property port(self)
property argstring(self)
static _save_div(a, b)

Helper method for safe division

Parameters
  • a (list or np.array) – values to be divided

  • b (list or np.array) – values to divide by

Returns

Division result

Return type

np.array
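A minimal sketch of element-wise safe division. It assumes that positions with a zero divisor yield 0 (an assumption, since the behavior is not stated above) and uses plain lists rather than NumPy arrays to stay dependency-free. The function name save_div is hypothetical.

```python
def save_div(a, b):
    """Divide a by b element-wise; return 0.0 wherever the divisor is zero (assumed)."""
    return [x / y if y != 0 else 0.0 for x, y in zip(a, b)]


result = save_div([1.0, 2.0, 3.0], [2.0, 0.0, 4.0])
```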

send_msg(self, msg: bytes)

Send message and prepend the message size

Based on a Stack Overflow answer, see [1].

[1] https://stackoverflow.com/a/17668009

Parameters

msg (bytes) – The message as bytes

recv_msg(self)

Receive a whole message. The message has to be prepended with its total size. Based on a Stack Overflow answer, see [1].

Returns

The message as bytes

Return type

bytes

recvall(self, n: int)

Given that we know the size we want to receive, we can receive exactly that number of bytes. Based on a Stack Overflow answer, see [1].

Parameters

n (int) – Number of bytes to expect in the data

Returns

The message as bytes

Return type

bytes
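The three methods above describe length-prefixed message framing. The sketch below shows the general technique with standalone functions that take the socket as an argument. The 4-byte big-endian length prefix is an assumption taken from the common form of this pattern, not a confirmed detail of FastDownwardEnv.

```python
import socket
import struct


def send_msg(sock: socket.socket, msg: bytes) -> None:
    """Send msg prefixed with its length as a 4-byte big-endian integer (assumed format)."""
    sock.sendall(struct.pack(">I", len(msg)) + msg)


def recvall(sock: socket.socket, n: int):
    """Read exactly n bytes, or return None if the peer closes early."""
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return bytes(data)


def recv_msg(sock: socket.socket):
    """Read the length prefix, then read the message body of that length."""
    raw_len = recvall(sock, 4)
    if raw_len is None:
        return None
    (msg_len,) = struct.unpack(">I", raw_len)
    return recvall(sock, msg_len)


# Round trip over a connected local socket pair.
left, right = socket.socketpair()
send_msg(left, b"hello")
received = recv_msg(right)
left.close()
right.close()
```

The loop in recvall matters because a single recv call may return fewer bytes than requested.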

_process_data(self)

Split received JSON into state, reward, and done

Returns

state, reward, done

Return type

np.array, float, bool

step(self, action: Union[int, List[int]])

Environment step

Parameters

action (Union[int, List[int]]) – Parameter(s) to apply

Returns

state, reward, done, info

Return type

np.array, float, bool, dict

reset(self)

Reset environment

Returns

State after reset

Return type

np.array

kill_connection(self)

Kill the connection

close(self)

Close Env

Returns

Closing confirmation

Return type

bool

render(self, mode: str = 'human') → None

Required by gym.Env but not implemented

Parameters

mode (str) – Rendering mode

class dacbench.envs.ToySGDEnv(config)

Bases: dacbench.AbstractEnv

Optimize toy functions with SGD + Momentum.

Action: [log_learning_rate, log_momentum] (log base 10)

State: Dict with entries remaining_budget, gradient, learning_rate, momentum

Reward: negative log regret of current and true function value

An instance can look as follows:

ID            0
family        polynomial
order         2
low           -2
high          2
coefficients  [ 1.40501053 -0.59899755 1.43337392]
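To illustrate how a log-scaled action maps to one SGD-with-momentum update on a polynomial objective, here is a minimal one-dimensional sketch. The helper names, the ascending coefficient order, and the classical momentum update rule are assumptions made for illustration; the environment's own implementation may differ.

```python
def poly_grad(coeffs, x):
    """Derivative of sum(c_k * x**k); coefficients assumed in ascending order."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k > 0)


def sgd_momentum_step(x, velocity, coeffs, action):
    """Apply one update given action = (log10 learning rate, log10 momentum)."""
    log_learning_rate, log_momentum = action
    learning_rate = 10.0 ** log_learning_rate
    momentum = 10.0 ** log_momentum
    # Classical momentum: decay the old velocity, then step against the gradient.
    velocity = momentum * velocity - learning_rate * poly_grad(coeffs, x)
    return x + velocity, velocity


# f(x) = x**2, start at x = 1 with zero velocity, lr = 0.1, momentum = 0.1.
x_new, v_new = sgd_momentum_step(1.0, 0.0, [0.0, 0.0, 1.0], (-1.0, -1.0))
```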

build_objective_function(self)
get_initial_position(self)
step(self, action: Union[float, Tuple[float, float]]) → Tuple[Dict[str, float], float, bool, Dict]

Take one step with SGD

Parameters

action (Union[float, Tuple[float, float]]) – If scalar, action = (log_learning_rate). If tuple, action = (log_learning_rate, log_momentum).

Returns

  • state : Dict[str, float]

    State with entries “remaining_budget”, “gradient”, “learning_rate”, “momentum”

  • reward : float

  • done : bool

  • info : Dict

Return type

Tuple[Dict[str, float], float, bool, Dict]

reset(self)

Reset environment

Returns

Environment state

Return type

np.array

render(self, **kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:

Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

close(self)

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

class dacbench.envs.GeometricEnv(config)

Bases: dacbench.AbstractEnv

Environment for tracing different curves that are orthogonal to each other.

Use product approach: f(t,x,y,z) = X(t,x) * Y(t,y) * Z(t,z)

Normalize function value on a scale between 0 and 1

  • min and max value for normalization over all timesteps
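The product approach and the min-max normalization described above can be sketched as follows. The helper names and the example curve functions are hypothetical, and mapping a constant series to zeros is an assumption made only to avoid division by zero.

```python
def product_value(t, coords, curves):
    """Multiply per-dimension curve values, e.g. X(t, x) * Y(t, y) * Z(t, z)."""
    value = 1.0
    for curve, coordinate in zip(curves, coords):
        value *= curve(t, coordinate)
    return value


def normalize(values):
    """Rescale values to [0, 1] using the min and max over all timesteps."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # assumed handling of a constant series
    return [(v - low) / (high - low) for v in values]


# Two hypothetical curves: one ignores time, one scales with it.
curves = [lambda t, c: c, lambda t, c: t * c]
raw = [product_value(t, (2.0, 3.0), curves) for t in (1, 2, 3)]
scaled = normalize(raw)
```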

get_optimal_policy(self, instance: List = None, vector_action: bool = True) → List[numpy.array]

Calculates the optimal policy for an instance

Parameters
  • instance (List, optional) – instance with information about function config.

  • vector_action (bool, optional) – if True, return multi-dimensional actions, else return a one-dimensional action; by default True

Returns

List with entry for each timestep that holds all optimal values in an array or as int

Return type

List[np.array]

step(self, action: int)

Execute environment step

Parameters

action (int) – action to execute

Returns

state, reward, done, info

Return type

np.array, float, bool, dict

reset(self) → List[int]

Resets env

Returns

Environment state

Return type

numpy.array

get_default_reward(self, _) → float

Calculate the Euclidean distance between the action vector and the real position of the curve.

Parameters

_ (self) – ignore

Returns

Euclidean distance

Return type

float
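For reference, the Euclidean distance between an action vector and a target position can be computed as in the sketch below. The function name is hypothetical, and how the environment turns this distance into the final reward is not shown here.

```python
import math


def euclidean_distance(action, target):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(action, target)))


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```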

get_default_state(self, _) → numpy.array

Gather state information.

Parameters

_ – ignore param

Returns

numpy array with state information

Return type

np.array

close(self) → bool

Close Env

Returns

Closing confirmation

Return type

bool

render(self, dimensions: List, absolute_path: str)

Multiplot for specific dimensions of benchmark with policy actions.

Parameters

dimensions (List) – List of dimensions that get plotted

render_3d_dimensions(self, dimensions: List, absolute_path: str)

Plot 2 dimensions in 3D space

Parameters

dimensions (List) – List of dimensions that get plotted. Max 2

_pre_reward(self) → Tuple[numpy.ndarray, List]

Prepare actions and coordinates for reward calculation.

Returns

Prepared actions and coordinates

Return type

Tuple[np.ndarray, List]