dacbench.envs.toysgd

Module Contents

Classes

ToySGDEnv

Optimize toy functions with SGD + Momentum.

Functions

create_polynomial_instance_set(out_fname: str, n_samples: int = 100, order: int = 2, low: float = -10, high: float = 10)

sample_coefficients(order: int = 2, low: float = -10, high: float = 10)

dacbench.envs.toysgd.create_polynomial_instance_set(out_fname: str, n_samples: int = 100, order: int = 2, low: float = -10, high: float = 10)
dacbench.envs.toysgd.sample_coefficients(order: int = 2, low: float = -10, high: float = 10)
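
The page does not describe what these two functions do beyond their signatures. As a rough sketch only, the following assumes that coefficients are drawn uniformly from [low, high] and that a polynomial of a given order has order + 1 coefficients. The names sample_coefficients_sketch and build_instance_rows are hypothetical and are not part of dacbench. Writing the rows to out_fname is left out.

```python
import random
from typing import Dict, List, Optional


def sample_coefficients_sketch(
    order: int = 2,
    low: float = -10,
    high: float = 10,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw order + 1 coefficients uniformly from [low, high] (assumed distribution)."""
    rng = rng or random.Random()
    return [rng.uniform(low, high) for _ in range(order + 1)]


def build_instance_rows(
    n_samples: int = 100,
    order: int = 2,
    low: float = -10,
    high: float = 10,
    seed: int = 0,
) -> List[Dict]:
    """Build one row per instance, using the field names of the documented example instance."""
    rng = random.Random(seed)
    rows = []
    for instance_id in range(n_samples):
        rows.append(
            {
                "ID": instance_id,
                "family": "polynomial",
                "order": order,
                "low": low,
                "high": high,
                "coefficients": sample_coefficients_sketch(order, low, high, rng),
            }
        )
    return rows


rows = build_instance_rows(n_samples=3, order=2, low=-2, high=2)
```

Each row could then be written to a file with the standard csv module. The actual file layout used by dacbench is not stated on this page.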
class dacbench.envs.toysgd.ToySGDEnv(config)

Bases: dacbench.AbstractEnv

Optimize toy functions with SGD + Momentum.

Action: [log_learning_rate, log_momentum] (log base 10)

State: Dict with entries remaining_budget, gradient, learning_rate, momentum

Reward: negative log regret of current and true function value

An instance can look as follows:

ID              0
family          polynomial
order           2
low             -2
high            2
coefficients    [ 1.40501053 -0.59899755 1.43337392]
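
To make the reward concrete, here is a minimal sketch that evaluates the example instance and computes a negative log regret. It rests on several assumptions not stated on this page: the coefficients are in ascending order (constant term first), the regret is the gap between the current function value and the minimum value, the logarithm is base 10, and a small eps guards against log(0). The helper names are hypothetical.

```python
import math
from typing import List

# Coefficients from the example instance, assumed to be in ascending order:
# f(x) = c0 + c1 * x + c2 * x**2
coeffs = [1.40501053, -0.59899755, 1.43337392]


def poly_value(coefficients: List[float], x: float) -> float:
    """Evaluate the polynomial at x with Horner's rule."""
    result = 0.0
    for c in reversed(coefficients):
        result = result * x + c
    return result


def poly_gradient(coefficients: List[float], x: float) -> float:
    """Evaluate the first derivative of the polynomial at x."""
    derivative = [i * c for i, c in enumerate(coefficients)][1:]
    return poly_value(derivative, x) if derivative else 0.0


# For a quadratic with c2 > 0 the minimizer has a closed form.
x_star = -coeffs[1] / (2 * coeffs[2])
f_star = poly_value(coeffs, x_star)


def neg_log_regret(x: float, eps: float = 1e-12) -> float:
    """Negative log10 of the gap to the minimum value (assumed reward form)."""
    regret = poly_value(coeffs, x) - f_star
    return -math.log10(max(regret, eps))
```

Under these assumptions the reward grows as the iterate approaches the minimizer, so neg_log_regret(x_star + 0.001) is larger than neg_log_regret(x_star + 1.0).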

build_objective_function(self)
get_initial_position(self)
step(self, action: Union[float, Tuple[float, float]]) → Tuple[Dict[str, float], float, bool, Dict]

Take one step with SGD

Parameters

action (Union[float, Tuple[float, float]]) –

  • If scalar, action = (log_learning_rate)

  • If tuple, action = (log_learning_rate, log_momentum)

Returns

  • state : Dict[str, float]

    State with entries “remaining_budget”, “gradient”, “learning_rate”, “momentum”

  • reward : float

  • done : bool

  • info : Dict

Return type

Tuple[Dict[str, float], float, bool, Dict]
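
The sketch below illustrates how a log-scale action might be decoded and applied in one SGD step with momentum. It is not the dacbench implementation. It assumes a classical momentum update (velocity = momentum * velocity + learning_rate * gradient, then x = x - velocity), a momentum of 0 when the action is a scalar, and the fixed objective f(x) = x**2. The names decode_action and toy_step are hypothetical.

```python
from typing import Dict, Tuple, Union

Action = Union[float, Tuple[float, float]]


def decode_action(action: Action) -> Tuple[float, float]:
    """Map a log10 action to (learning_rate, momentum)."""
    if isinstance(action, (int, float)):
        # Scalar action: only the learning rate is set; momentum 0 is an assumption.
        return 10.0 ** action, 0.0
    log_learning_rate, log_momentum = action
    return 10.0 ** log_learning_rate, 10.0 ** log_momentum


def toy_step(
    x: float, velocity: float, remaining_budget: int, action: Action
) -> Tuple[float, float, Dict[str, float], bool]:
    """Apply one momentum-SGD update to f(x) = x**2 and build a state dict."""
    learning_rate, momentum = decode_action(action)
    gradient = 2.0 * x  # derivative of x**2 at the current position
    velocity = momentum * velocity + learning_rate * gradient
    x = x - velocity
    remaining_budget -= 1
    state = {
        "remaining_budget": remaining_budget,
        "gradient": 2.0 * x,  # gradient at the new position
        "learning_rate": learning_rate,
        "momentum": momentum,
    }
    done = remaining_budget <= 0
    return x, velocity, state, done


# One step from x = 1 with learning rate 10**-1 and momentum 10**-1.
x, velocity, state, done = toy_step(1.0, 0.0, 2, (-1.0, -1.0))
```

The state dictionary uses the four keys listed under Returns. The reward and info values of the real environment are omitted here.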

reset(self)

Reset environment

Returns

Environment state

Return type

np.array

render(self, **kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:

Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

close(self)

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.