dacbench.envs
Package Contents
Classes
LubyEnv | Environment to learn Luby Sequence
SigmoidEnv | Environment for tracing sigmoid curves
FastDownwardEnv | Environment to control Solver Heuristics of FastDownward
ToySGDEnv | Optimize toy functions with SGD + Momentum.
GeometricEnv | Environment for tracing different curves that are orthogonal to each other
Functions
luby_gen(i) | Generator for the Luby Sequence
- class dacbench.envs.LubyEnv(config)
Bases: dacbench.AbstractEnv
Environment to learn Luby Sequence
- step(self, action: int)
Execute environment step
- Parameters
action (int) – action to execute
- Returns
state, reward, done, info
- Return type
np.array, float, bool, dict
- reset(self) → List[int]
Resets env
- Returns
Environment state
- Return type
numpy.array
- get_default_reward(self, _)
- get_default_state(self, _)
- close(self) → bool
Close Env
- Returns
Closing confirmation
- Return type
bool
- render(self, mode: str = 'human') → None
Render env in human mode
- Parameters
mode (str) – Execution mode
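The step/reset contract documented above (reset returns a state; step returns state, reward, done, info) can be sketched with a small stand-in class. StandInEnv, its target list, and its 0 / -1 reward scheme are hypothetical and chosen only for illustration; this is not the LubyEnv implementation and it does not import dacbench.

```python
class StandInEnv:
    """Hypothetical stand-in that mimics the documented reset/step interface."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial state.
        self.t = 0
        return [self.t]

    def step(self, action):
        # Assumed reward scheme: 0 for a correct guess, -1 otherwise.
        reward = 0.0 if action == self.targets[self.t] else -1.0
        self.t += 1
        done = self.t >= len(self.targets)
        return [self.t], reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the accumulated reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done, info = env.step(policy(state))
        total += reward
    return total


env = StandInEnv(targets=[1, 1, 2, 1])
perfect = run_episode(env, lambda s: [1, 1, 2, 1][s[0]])  # always guesses right
constant = run_episode(env, lambda s: 1)                  # misses the single 2
```

The same loop shape should apply to any environment that follows the four-tuple return convention listed in this section.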
- dacbench.envs.luby_gen(i)
Generator for the Luby Sequence
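For orientation, the Luby sequence itself (1, 1, 2, 1, 1, 2, 4, ...) can be produced with the standard recursive definition. The names luby and luby_sequence below are illustrative, and the sketch is not claimed to match the signature or internals of dacbench.envs.luby_gen.

```python
from itertools import islice


def luby(i):
    """Return the i-th element (1-indexed) of the Luby sequence."""
    k = 1
    # Find the smallest k with i <= 2^k - 1.
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    # Otherwise the value repeats an earlier part of the sequence.
    return luby(i - (1 << (k - 1)) + 1)


def luby_sequence():
    """Yield the Luby sequence indefinitely."""
    i = 1
    while True:
        yield luby(i)
        i += 1


first_15 = list(islice(luby_sequence(), 15))
```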
- class dacbench.envs.SigmoidEnv(config)
Bases: dacbench.AbstractEnv
Environment for tracing sigmoid curves
- _sig(self, x, scaling, inflection)
Simple sigmoid function
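A minimal sketch of a sigmoid with the parameters named in the signature, assuming the common logistic form in which scaling controls the slope and inflection shifts the midpoint. The exact formula used by SigmoidEnv is not stated here, so treat this form as an assumption.

```python
import math


def sig(x, scaling, inflection):
    """Logistic curve with slope `scaling`, centred at `inflection` (assumed form)."""
    return 1.0 / (1.0 + math.exp(-scaling * (x - inflection)))


midpoint = sig(2.0, 1.5, 2.0)   # at the inflection point
below = sig(0.0, 1.5, 2.0)
above = sig(4.0, 1.5, 2.0)
```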
- step(self, action: int)
Execute environment step
- Parameters
action (int) – action to execute
- Returns
state, reward, done, info
- Return type
np.array, float, bool, dict
- reset(self) → List[int]
Resets env
- Returns
Environment state
- Return type
numpy.array
- get_default_reward(self, _)
- get_default_state(self, _)
- close(self) → bool
Close Env
- Returns
Closing confirmation
- Return type
bool
- render(self, mode: str) → None
Render env in human mode
- Parameters
mode (str) – Execution mode
- class dacbench.envs.FastDownwardEnv(config)
Bases: dacbench.AbstractEnv
Environment to control Solver Heuristics of FastDownward
- property port(self)
- property argstring(self)
- static _save_div(a, b)
Helper method for safe division
- Parameters
a (list or np.array) – values to be divided
b (list or np.array) – values to divide by
- Returns
Division result
- Return type
np.array
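One way to read "safe division" is element-wise division that yields 0 wherever the divisor is 0. That zero-fill behaviour is an assumption, and the plain-Python sketch below returns a list rather than the np.array documented above.

```python
def safe_div(a, b):
    """Divide a by b element-wise, returning 0.0 where b is zero (assumed behaviour)."""
    return [x / y if y != 0 else 0.0 for x, y in zip(a, b)]


result = safe_div([1, 2, 3], [2, 0, 4])
```

With NumPy arrays, the same idea can be expressed through the `out` and `where` arguments of `np.divide`.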
- send_msg(self, msg: bytes)
Send message and prepend the message size
Based on a Stack Overflow answer; see [1].
[1] https://stackoverflow.com/a/17668009
- Parameters
msg (bytes) – The message as bytes
- recv_msg(self)
Receive a whole message. The message has to be prepended with its total size. Based on a Stack Overflow answer; see [1].
- Returns
The message as bytes
- Return type
bytes
- recvall(self, n: int)
Given that we know the size we want to receive, we can receive exactly that number of bytes. Based on a Stack Overflow answer; see [1].
- Parameters
n (int) – Number of bytes to expect in the data
- Returns
The message as bytes
- Return type
bytes
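The three messaging methods above describe length-prefixed framing. A self-contained sketch of that pattern follows, assuming a 4-byte big-endian length header. The free functions take an explicit socket, unlike the methods, and the header format actually used by FastDownwardEnv is not specified in this section.

```python
import socket
import struct


def send_msg(sock, msg: bytes):
    # Prepend the payload length as a 4-byte big-endian unsigned int (assumed format).
    sock.sendall(struct.pack(">I", len(msg)) + msg)


def recvall(sock, n: int):
    # Keep reading until exactly n bytes have arrived, or the peer closes.
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return bytes(data)


def recv_msg(sock):
    # Read the length header first, then the payload of that size.
    header = recvall(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    return recvall(sock, length)


# Round trip over a connected socket pair.
left, right = socket.socketpair()
send_msg(left, b"hello")
received = recv_msg(right)
left.close()
right.close()
```

Because the receiver knows the payload size up front, a message split across several recv calls is still reassembled whole.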
- _process_data(self)
Split received JSON into state, reward, and done
- Returns
state, reward, done
- Return type
np.array, float, bool
- step(self, action: Union[int, List[int]])
Environment step
- Parameters
action (Union[int, List[int]]) – Parameter(s) to apply
- Returns
state, reward, done, info
- Return type
np.array, float, bool, dict
- reset(self)
Reset environment
- Returns
State after reset
- Return type
np.array
- kill_connection(self)
Kill the connection
- close(self)
Close Env
- Returns
Closing confirmation
- Return type
bool
- render(self, mode: str = 'human') → None
Required by gym.Env but not implemented
- Parameters
mode (str) – Rendering mode
- class dacbench.envs.ToySGDEnv(config)
Bases: dacbench.AbstractEnv
Optimize toy functions with SGD + Momentum.
Action: [log_learning_rate, log_momentum] (log base 10)
State: Dict with entries remaining_budget, gradient, learning_rate, momentum
Reward: negative log regret of current and true function value
An instance can look as follows:
ID 0
family polynomial
order 2
low -2
high 2
coefficients [ 1.40501053 -0.59899755 1.43337392]
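To make the action and reward description concrete, here is a rough sketch of SGD with momentum on the example polynomial. Several details are assumptions rather than documented behaviour: the coefficients are read in ascending order (c0 + c1*x + c2*x^2), the momentum update uses the heavy-ball form, the log in the reward is base 10, and the small eps guard against log(0) is added here.

```python
import math

# Example instance coefficients; ascending order is an assumption.
COEFFS = [1.40501053, -0.59899755, 1.43337392]


def f(x):
    return sum(c * x ** k for k, c in enumerate(COEFFS))


def grad(x):
    return sum(k * c * x ** (k - 1) for k, c in enumerate(COEFFS) if k > 0)


def sgd_momentum_step(x, velocity, log_lr, log_momentum):
    """Apply one heavy-ball update; actions are given as log10 values."""
    lr = 10.0 ** log_lr
    momentum = 10.0 ** log_momentum
    velocity = momentum * velocity - lr * grad(x)
    return x + velocity, velocity


# Closed-form minimiser of the quadratic, used to measure regret.
x_opt = -COEFFS[1] / (2.0 * COEFFS[2])
f_opt = f(x_opt)


def neg_log_regret(x, eps=1e-12):
    # eps is a guard added in this sketch so that log10(0) never occurs.
    return -math.log10(max(f(x) - f_opt, eps))


x, v = 2.0, 0.0
for _ in range(50):
    x, v = sgd_momentum_step(x, v, log_lr=-1.0, log_momentum=-1.0)
```

With log_lr = log_momentum = -1 (a learning rate and momentum of 0.1 each), the iterate settles near the minimiser, so the negative log regret increases as optimisation progresses.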
- build_objective_function(self)
- get_initial_position(self)
- step(self, action: Union[float, Tuple[float, float]]) → Tuple[Dict[str, float], float, bool, Dict]
Take one step with SGD
- Parameters
action (Union[float, Tuple[float, float]]) – If scalar, action = (log_learning_rate). If tuple, action = (log_learning_rate, log_momentum).
- Returns
state : Dict[str, float]
State with entries “remaining_budget”, “gradient”, “learning_rate”, “momentum”
reward : float
done : bool
info : Dict
- Return type
Tuple[Dict[str, float], float, bool, Dict]
- reset(self)
Reset environment
- Returns
Environment state
- Return type
np.array
- render(self, **kwargs)
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
- Note
Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.
- Parameters
mode (str) – the mode to render with
Example:
    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception
- close(self)
Override close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
- class dacbench.envs.GeometricEnv(config)
Bases: dacbench.AbstractEnv
Environment for tracing different curves that are orthogonal to each other.
Use product approach: f(t,x,y,z) = X(t,x) * Y(t,y) * Z(t,z)
Normalize the function value to a scale between 0 and 1 (min and max values for normalization are taken over all timesteps).
- get_optimal_policy(self, instance: List = None, vector_action: bool = True) → List[numpy.array]
Calculates the optimal policy for an instance
- Parameters
instance (List, optional) – instance with information about function config.
vector_action (bool, optional) – if True, return multidimensional actions; otherwise return a one-dimensional action. Defaults to True.
- Returns
List with entry for each timestep that holds all optimal values in an array or as int
- Return type
List[np.array]
- step(self, action: int)
Execute environment step
- Parameters
action (int) – action to execute
- Returns
state, reward, done, info
- Return type
np.array, float, bool, dict
- reset(self) → List[int]
Resets env
- Returns
Environment state
- Return type
numpy.array
- get_default_reward(self, _) → float
Calculate the Euclidean distance between the action vector and the real position of the curve.
- Parameters
_ (self) – ignore
- Returns
Euclidean distance
- Return type
float
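The distance computation described for get_default_reward can be illustrated as follows. The function name and the example vectors are hypothetical, and any further processing the environment applies to the distance is not covered.

```python
import math


def euclidean_distance(action, position):
    """Euclidean distance between an action vector and a target position."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(action, position)))


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```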
- get_default_state(self, _) → numpy.array
Gather state information.
- Parameters
_ – ignore param
- Returns
numpy array with state information
- Return type
np.array
- close(self) → bool
Close Env
- Returns
Closing confirmation
- Return type
bool
- render(self, dimensions: List, absolute_path: str)
Multiplot for specific dimensions of benchmark with policy actions.
- Parameters
dimensions (List) – List of dimensions that get plotted
- render_3d_dimensions(self, dimensions: List, absolute_path: str)
Plot 2 Dimensions in 3D space
- Parameters
dimensions (List) – List of dimensions that get plotted. Max 2
- _pre_reward(self) → Tuple[numpy.ndarray, List]
Prepare actions and coordinates for reward calculation.
- Returns
[description]
- Return type
Tuple[np.ndarray, List]