`dacbench.envs`

Subpackages

dacbench.envs.policies

Submodules

Package Contents

Classes

`LubyEnv`	Environment to learn Luby Sequence
`SigmoidEnv`	Environment for tracing sigmoid curves
`FastDownwardEnv`	Environment to control Solver Heuristics of FastDownward
`ToySGDEnv`	Optimize toy functions with SGD + Momentum.
`GeometricEnv`	Environment for tracing different curves that are orthogonal to each other

Functions

luby_gen(i)

Generator for the Luby Sequence

class dacbench.envs.LubyEnv(config)

Bases: dacbench.AbstractEnv

Environment to learn Luby Sequence

step(self, action: int)

Execute environment step

Parameters: action (int) – action to execute
Returns: state, reward, done, info
Return type: np.array, float, bool, dict
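
The four-tuple returned by step follows the classic gym convention, so a rollout loop can be sketched as below. This is a minimal illustration, not DACBench code: `run_episode` and `CountdownEnv` are hypothetical names, and the stub only stands in for any environment that exposes reset() and step(action) as documented here.

```python
from typing import Any, Callable, Tuple


class CountdownEnv:
    """Hypothetical stand-in with the documented reset()/step() interface."""

    def __init__(self, n_steps: int = 3):
        self.n_steps = n_steps
        self.remaining = n_steps

    def reset(self):
        self.remaining = self.n_steps
        return [self.remaining]

    def step(self, action: int) -> Tuple[list, float, bool, dict]:
        self.remaining -= 1
        reward = float(action)  # toy reward: simply echo the action
        done = self.remaining == 0
        return [self.remaining], reward, done, {}


def run_episode(env: Any, policy: Callable[[Any], int]) -> float:
    """Roll out one episode and return the summed reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done, _info = env.step(policy(state))
        total += reward
    return total


total_reward = run_episode(CountdownEnv(n_steps=3), policy=lambda state: 1)
```

With a real environment, the stub would be replaced by an instance built from a benchmark config; the loop itself stays the same.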

reset(self) → List[int]

Resets env

Returns: Environment state
Return type: numpy.array

get_default_reward(self, _)

get_default_state(self, _)

close(self) → bool

Close Env

Returns: Closing confirmation
Return type: bool

render(self, mode: str = 'human') → None

Render env in human mode

Parameters: mode (str) – Execution mode

dacbench.envs.luby_gen(i): Generator for the Luby Sequence
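
For orientation, the Luby sequence itself (1, 1, 2, 1, 1, 2, 4, ...) can be generated as sketched below. This is a standalone illustration of the standard recursive definition, not the body of `luby_gen`; the names `luby_term` and `luby_sequence` are hypothetical, and how `luby_gen` uses its argument `i` is not specified on this page.

```python
from itertools import islice


def luby_term(i: int) -> int:
    """Return the i-th term (1-indexed) of the Luby sequence."""
    k = 1
    # Find the smallest k with i <= 2^k - 1.
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        # i closes a block: the term is 2^(k-1).
        return 1 << (k - 1)
    # Otherwise the sequence repeats an earlier prefix.
    return luby_term(i - (1 << (k - 1)) + 1)


def luby_sequence():
    """Yield the Luby sequence indefinitely."""
    i = 1
    while True:
        yield luby_term(i)
        i += 1


first_terms = list(islice(luby_sequence(), 15))
```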

class dacbench.envs.SigmoidEnv(config)

Bases: dacbench.AbstractEnv

Environment for tracing sigmoid curves

_sig(self, x, scaling, inflection): Simple sigmoid function
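
A sigmoid parameterized by a scaling factor and an inflection point is commonly written as the logistic function below. This is a sketch assuming that standard form; the exact expression used inside `_sig` is not given on this page, and the free function `sigmoid` is a hypothetical name.

```python
import math


def sigmoid(x: float, scaling: float, inflection: float) -> float:
    """Logistic curve that crosses 0.5 at `inflection`; `scaling` sets the slope."""
    return 1.0 / (1.0 + math.exp(-scaling * (x - inflection)))


midpoint = sigmoid(5.0, scaling=1.0, inflection=5.0)
```

A negative scaling mirrors the curve, so the value falls from 1 to 0 as x increases.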

step(self, action: int)

Execute environment step

Parameters: action (int) – action to execute
Returns: state, reward, done, info
Return type: np.array, float, bool, dict

reset(self) → List[int]

Resets env

Returns: Environment state
Return type: numpy.array

get_default_reward(self, _)

get_default_state(self, _)

close(self) → bool

Close Env

Returns: Closing confirmation
Return type: bool

render(self, mode: str) → None

Render env in human mode

Parameters: mode (str) – Execution mode

class dacbench.envs.FastDownwardEnv(config)

Bases: dacbench.AbstractEnv

Environment to control Solver Heuristics of FastDownward

property port(self)

property argstring(self)

static _save_div(a, b)

Helper method for safe division

Parameters

a (list or np.array) – values to be divided
b (list or np.array) – values to divide by

Returns

Division result

Return type

np.array
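
One way to implement such an element-wise safe division with NumPy is sketched below. It assumes that entries with a zero divisor should come out as 0, which this page does not state; `safe_divide` is a hypothetical name, not the library helper.

```python
import numpy as np


def safe_divide(a, b):
    """Divide a by b element-wise, writing 0 wherever b is 0 (assumed behavior)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.zeros(np.broadcast(a, b).shape, dtype=float)
    # `where` skips the zero divisors, so those entries keep the initial 0.
    np.divide(a, b, out=out, where=b != 0)
    return out


result = safe_divide([1.0, 2.0, 3.0], [2.0, 0.0, 4.0])
```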

send_msg(self, msg: bytes)

Send message and prepend the message size

Based on a Stack Overflow answer, see [1].

[1] https://stackoverflow.com/a/17668009

Parameters: msg (bytes) – The message as bytes

recv_msg(self)

Receive a whole message. The message has to be prepended with its total size. Based on a Stack Overflow answer, see [1].

Returns: The message as bytes
Return type: bytes

recvall(self, n: int)

Given that we know the size we want to receive, we can receive exactly that number of bytes. Based on a Stack Overflow answer, see [1].

Parameters: n (int) – Number of bytes to expect in the data
Returns: The message as bytes
Return type: bytes
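
The three methods above describe length-prefixed framing over a stream socket. The sketch below shows the general technique with free functions and a fixed 4-byte big-endian length header, as in the referenced answer. The header encoding is an assumption; the wire format actually used by FastDownwardEnv may differ.

```python
import socket
import struct


def send_msg(sock: socket.socket, msg: bytes) -> None:
    """Send msg preceded by its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack(">I", len(msg)) + msg)


def recvall(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return fewer."""
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            raise ConnectionError("socket closed before the full message arrived")
        data.extend(packet)
    return bytes(data)


def recv_msg(sock: socket.socket) -> bytes:
    """Read the length header, then the payload of that length."""
    (length,) = struct.unpack(">I", recvall(sock, 4))
    return recvall(sock, length)


# Demonstrate the round trip over a connected socket pair.
left, right = socket.socketpair()
try:
    send_msg(left, b'{"reward": -1.0}')
    received = recv_msg(right)
finally:
    left.close()
    right.close()
```

Reading in a loop matters because a single recv call on a stream socket may deliver only part of a message.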

_process_data(self)

Split received JSON into state, reward, and done

Returns: state, reward, done
Return type: np.array, float, bool
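
As a rough illustration of splitting a JSON payload into the three return values, consider the sketch below. The keys `reward` and `done`, the treatment of all remaining entries as state features, and the name `split_payload` are assumptions made for this example; the actual message layout is not documented on this page.

```python
import json

import numpy as np


def split_payload(raw: bytes):
    """Split a JSON message into (state, reward, done) under assumed key names."""
    payload = json.loads(raw.decode("utf-8"))
    reward = float(payload.pop("reward"))  # assumed key
    done = bool(payload.pop("done"))       # assumed key
    # Everything left is treated as a state feature, in sorted key order.
    state = np.array([payload[key] for key in sorted(payload)], dtype=float)
    return state, reward, done


state, reward, done = split_payload(
    b'{"a": 1.5, "b": 2.0, "reward": -1.0, "done": 0}'
)
```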

step(self, action: Union[int, List[int]])

Environment step

Parameters: action (Union[int, List[int]]) – Parameter(s) to apply
Returns: state, reward, done, info
Return type: np.array, float, bool, dict

reset(self)

Reset environment

Returns: State after reset
Return type: np.array

kill_connection(self): Kill the connection

close(self)

Close Env

Returns: Closing confirmation
Return type: bool

render(self, mode: str = 'human') → None

Required by gym.Env but not implemented

Parameters: mode (str) – Rendering mode

class dacbench.envs.ToySGDEnv(config)

Bases: dacbench.AbstractEnv

Optimize toy functions with SGD + Momentum.

Action: [log_learning_rate, log_momentum] (log base 10)
State: Dict with entries remaining_budget, gradient, learning_rate, momentum
Reward: negative log regret of current and true function value

An instance can look as follows:

ID	0
family	polynomial
order	2
low	-2
high	2
coefficients	[ 1.40501053 -0.59899755 1.43337392]

build_objective_function(self)

get_initial_position(self)

step(self, action: Union[float, Tuple[float, float]]) → Tuple[Dict[str, float], float, bool, Dict]

Take one step with SGD

Parameters

action (Union[float, Tuple[float, float]]) – If scalar, action = log_learning_rate. If tuple, action = (log_learning_rate, log_momentum).

Returns

state : Dict[str, float]
State with entries "remaining_budget", "gradient", "learning_rate", "momentum"
reward : float
done : bool
info : Dict

Return type

Tuple[Dict[str, float], float, bool, Dict]
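
To make the action encoding concrete, the sketch below runs SGD with momentum on the example polynomial instance, converting log10-scaled actions into a learning rate and a momentum. It is an illustration under assumptions, not the environment's code: the coefficients are read lowest order first, the heavy-ball update form is one common variant, and `sgd_momentum_step` is a hypothetical name.

```python
from numpy.polynomial import Polynomial


def sgd_momentum_step(x, velocity, grad_fn, log_learning_rate, log_momentum):
    """Apply one heavy-ball SGD update using log10-scaled hyperparameters."""
    learning_rate = 10.0 ** log_learning_rate
    momentum = 10.0 ** log_momentum
    velocity = momentum * velocity + grad_fn(x)
    x = x - learning_rate * velocity
    return x, velocity


# Example instance from above; ascending coefficient order is an assumption.
coefficients = [1.40501053, -0.59899755, 1.43337392]
objective = Polynomial(coefficients)
gradient = objective.deriv()

x, velocity = 2.0, 0.0  # start at the upper bound "high" of the instance
for _ in range(50):
    x, velocity = sgd_momentum_step(
        x, velocity, gradient, log_learning_rate=-1.0, log_momentum=-1.0
    )

# Analytic minimizer of the quadratic, for comparison.
x_star = -coefficients[1] / (2.0 * coefficients[2])
```

An action of (-1.0, -1.0) therefore corresponds to a learning rate of 0.1 and a momentum of 0.1.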

reset(self)

Reset environment

Returns: Environment state
Return type: np.array

render(self, **kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:

Make sure that the 'render.modes' key of your class's metadata includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

    import numpy as np
    from gym import Env

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

close(self)

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

class dacbench.envs.GeometricEnv(config)

Bases: dacbench.AbstractEnv

Environment for tracing different curves that are orthogonal to each other.

Uses a product approach: f(t,x,y,z) = X(t,x) * Y(t,y) * Z(t,z).

The function value is normalized to a scale between 0 and 1, using the min and max value over all timesteps.
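
The product form and the min-max normalization can be illustrated as follows. The component functions here are made up for the example, and the names `product_value` and `min_max_normalize` are hypothetical; the curves that GeometricEnv actually uses come from its config and are not listed on this page.

```python
import numpy as np


def product_value(t, coords, components):
    """Evaluate f(t, x, y, z) = X(t, x) * Y(t, y) * Z(t, z) for given coordinates."""
    return float(np.prod([g(t, c) for g, c in zip(components, coords)]))


def min_max_normalize(values):
    """Rescale values to [0, 1] using the min and max over all timesteps."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)  # constant series: nothing to scale
    return (values - lo) / (hi - lo)


# Hypothetical component functions X, Y, Z, chosen only for illustration.
components = [
    lambda t, x: x + t,
    lambda t, y: y * (t + 1),
    lambda t, z: z ** 2 + t,
]
raw = [product_value(t, (1.0, 2.0, 0.5), components) for t in range(5)]
normalized = min_max_normalize(raw)
```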

get_optimal_policy(self, instance: List = None, vector_action: bool = True) → List[numpy.array]

Calculates the optimal policy for an instance

Parameters

instance (List, optional) – instance with information about function config.
vector_action (bool, optional) – if True, return multidimensional actions; otherwise return a one-dimensional action. By default True.

Returns

List with entry for each timestep that holds all optimal values in an array or as int

Return type

List[np.array]

step(self, action: int)

Execute environment step

Parameters: action (int) – action to execute
Returns: state, reward, done, info
Return type: np.array, float, bool, dict

reset(self) → List[int]

Resets env

Returns: Environment state
Return type: numpy.array

get_default_reward(self, _) → float

Calculate the Euclidean distance between the action vector and the real position of the curve.

Parameters: _ – ignored
Returns: Euclidean distance
Return type: float
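
The distance computation can be sketched with NumPy as below. Only the Euclidean norm is shown; whether the environment negates or rescales this value before returning it as a reward is not stated on this page, and `euclidean_distance` is a hypothetical name.

```python
import numpy as np


def euclidean_distance(action, target) -> float:
    """Return the Euclidean (L2) distance between two equally sized vectors."""
    diff = np.asarray(action, dtype=float) - np.asarray(target, dtype=float)
    return float(np.linalg.norm(diff))


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```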

get_default_state(self, _) → numpy.array

Gather state information.

Parameters: _ – ignored parameter
Returns: numpy array with state information
Return type: np.array

close(self) → bool

Close Env

Returns: Closing confirmation
Return type: bool

render(self, dimensions: List, absolute_path: str)

Multiplot for specific dimensions of benchmark with policy actions.

Parameters: dimensions (List) – List of dimensions that get plotted

render_3d_dimensions(self, dimensions: List, absolute_path: str)

Plot two dimensions in 3D space

Parameters: dimensions (List) – List of dimensions that get plotted (at most 2)

_pre_reward(self) → Tuple[numpy.ndarray, List]

Prepare actions and coordinates for reward calculation.

Returns: prepared actions and coordinates
Return type: Tuple[np.ndarray, List]
