The ToySGD Benchmark

Task: control the learning rate and momentum of SGD in simple function approximation
Cost: log regret
Number of hyperparameters to control: two floats
State Information: remaining budget, gradient, current learning rate, current momentum
Noise Level: fairly small
Instance space: target function specification

This artificial benchmark uses simple functions such as polynomials to test a DAC controller’s ability to control both the learning rate and the momentum of SGD. At each step until the cutoff, both hyperparameters are updated and one optimization step is taken. Because the global optimum of the function is known, the cost can be measured directly as the current regret (reported as log regret).
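The control loop described above can be sketched in a few lines. This is a minimal standalone illustration, not the benchmark's own code: the function `run_sgd`, the constant schedule, and the noise-free gradient are assumptions made for the example.

```python
from numpy.polynomial import Polynomial


def run_sgd(poly, x0, schedule, f_min):
    """Run SGD with momentum on `poly`, using one (lr, momentum) pair per step."""
    grad = poly.deriv()
    x, velocity = x0, 0.0
    regrets = []
    for lr, momentum in schedule:
        # Classic momentum update: v <- m * v - lr * grad(x); x <- x + v
        velocity = momentum * velocity - lr * grad(x)
        x = x + velocity
        # Regret: distance between the current and the optimal function value
        regrets.append(abs(poly(x) - f_min))
    return x, regrets


# f(x) = (x - 1)^2 = 1 - 2x + x^2 has its global minimum f(1) = 0
poly = Polynomial([1.0, -2.0, 1.0])
schedule = [(0.1, 0.5)] * 50  # hypothetical constant schedule; a controller would vary it
x_final, regrets = run_sgd(poly, x0=-2.0, schedule=schedule, f_min=0.0)
print(x_final, regrets[-1])
```

A DAC controller would replace the fixed `schedule` with per-step choices based on the observed state.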

Because it relies on function approximation, this benchmark is computationally cheap and is therefore likely a good entry point before tackling the full-sized SGD or CMA-ES step size benchmarks. It can also serve as a first test of whether a DAC method can handle multiple hyperparameters at the same time.

class dacbench.benchmarks.toysgd_benchmark.ToySGDBenchmark(config_path=None, config=None)

Bases: AbstractBenchmark

get_environment()

Return ToySGDEnv env with current configuration

Returns

ToySGD environment

Return type

ToySGDEnv

read_instance_set(test=False)

Read path of instances from config into list

class dacbench.envs.toysgd.ToySGDEnv(config)

Bases: AbstractEnv

Optimize toy functions with SGD + Momentum.

Action: [log_learning_rate, log_momentum] (log base 10)

State: Dict with entries remaining_budget, gradient, learning_rate, momentum

Reward: negative log regret of current and true function value
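The reward described above can be sketched as follows. The helper name `negative_log_regret`, the use of base 10 for the logarithm, and the small floor `eps` that avoids taking the log of zero are all assumptions made for this illustration; the source does not specify them.

```python
import math


def negative_log_regret(f_current, f_optimum, eps=1e-12):
    """Reward sketch: minus the log10 of the gap to the optimal function value."""
    regret = abs(f_current - f_optimum)
    # eps is a hypothetical floor so that a regret of exactly zero stays finite
    return -math.log10(max(regret, eps))


# A smaller regret gives a larger reward
print(negative_log_regret(1.01, 1.0))
print(negative_log_regret(1.0001, 1.0))
```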

An instance can look as follows:

ID  family      order  low  high  coefficients
0   polynomial  2      -2   2     [ 1.40501053 -0.59899755 1.43337392]
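To make the instance fields concrete, the sketch below turns the example instance into a function and its derivative. Two points are assumptions, since the source does not state them: the coefficients are read in ascending order of degree (the convention of `numpy.polynomial.Polynomial`), and `low` and `high` are read as the ends of an x-range.

```python
import numpy as np
from numpy.polynomial import Polynomial

instance = {
    "ID": 0,
    "family": "polynomial",
    "order": 2,
    "low": -2,
    "high": 2,
    "coefficients": [1.40501053, -0.59899755, 1.43337392],
}

# Assumption: coefficients are ordered c0 + c1 * x + c2 * x^2
objective = Polynomial(instance["coefficients"])
gradient = objective.deriv()

# For an order-2 polynomial the derivative is linear, so it has a single root
x_stationary = float(gradient.roots()[0].real)

# Assumption: low/high delimit the x-range of interest
xs = np.linspace(instance["low"], instance["high"], 5)
values = objective(xs)
print(x_stationary, objective(x_stationary))
print(values)
```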

close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:
Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reset()

Reset environment

Returns

Environment state

Return type

Dict[str, float]

step(action: Union[float, Tuple[float, float]]) → Tuple[Dict[str, float], float, bool, Dict]

Take one step with SGD

Parameters

action (Union[float, Tuple[float, float]]) – If scalar, action = log_learning_rate. If tuple, action = (log_learning_rate, log_momentum).
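The two accepted action shapes can be decoded as sketched below. The helper `decode_action` is hypothetical and not part of the environment; in particular, returning `None` for the momentum when only a scalar is given is an assumption, since the source does not say what happens to the momentum in that case.

```python
from typing import Optional, Tuple, Union

Action = Union[float, Tuple[float, float]]


def decode_action(action: Action) -> Tuple[float, Optional[float]]:
    """Map a log10-scaled action to (learning_rate, momentum or None)."""
    if isinstance(action, (int, float)):
        # Scalar action: only the learning rate is specified
        return 10.0 ** action, None
    log_lr, log_momentum = action
    return 10.0 ** log_lr, 10.0 ** log_momentum


print(decode_action(-3.0))          # scalar form
print(decode_action((-2.0, -1.0)))  # tuple form
```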

Returns

  • state : Dict[str, float]

    State with entries “remaining_budget”, “gradient”, “learning_rate”, “momentum”

  • reward : float

  • done : bool

  • info : Dict

Return type

Tuple[Dict[str, float], float, bool, Dict]
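As an illustration of how a controller might consume the state returned by step, the sketch below maps the state dictionary to a tuple action. The policy `linear_lr_policy`, its default values, and the `total_budget` argument are hypothetical and chosen only for the example; no environment is constructed here.

```python
def linear_lr_policy(state, total_budget,
                     log_lr_start=-1.0, log_lr_end=-3.0, log_momentum=-0.05):
    """Hypothetical hand-written controller: decay log10(lr) linearly over the budget."""
    progress = 1.0 - state["remaining_budget"] / total_budget
    log_lr = log_lr_start + progress * (log_lr_end - log_lr_start)
    # Tuple form of the action: (log_learning_rate, log_momentum)
    return log_lr, log_momentum


# Example state with the four documented entries (values are made up)
state = {"remaining_budget": 50, "gradient": 0.2, "learning_rate": 0.1, "momentum": 0.9}
action = linear_lr_policy(state, total_budget=100)
print(action)

# With an environment instance `env` (not created in this sketch), the loop would be:
#     state, reward, done, info = env.step(action)
```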