The ToySGD Benchmark

Task: control the learning rate and momentum of SGD in simple function approximation
Cost: log regret
Number of hyperparameters to control: two floats
State Information: remaining budget, gradient, current learning rate, current momentum
Noise Level: fairly small
Instance space: target function specification

This artificial benchmark uses functions such as polynomials to test DAC controllers’ ability to control both the learning rate and the momentum of SGD. At each step until the cutoff, both hyperparameters are updated and one optimization step is taken. Because the global optimum of the function is known, the cost is measured as the current regret.

Because it relies on simple function approximation, this benchmark is computationally cheap, which makes it a good entry point before tackling the full-size SGD or CMA-ES step size benchmarks. It can also serve as a first test of whether a DAC method can handle multiple hyperparameters at the same time.
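The control loop described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's implementation: the function name run_toy_sgd, the starting point x0, the heavy-ball form of the momentum update, and the ascending coefficient order are all assumptions made for the example, and the closed-form optimum only holds for a convex quadratic.

```python
import numpy as np
from numpy.polynomial import Polynomial


def run_toy_sgd(coefficients, schedule, x0=1.5):
    """Minimize a convex quadratic with SGD + momentum under a given schedule.

    coefficients: (c0, c1, c2) in ascending order of degree, with c2 > 0.
    schedule: iterable of (learning_rate, momentum) pairs, one per step.
    Returns the final x and the log10 regret after every step.
    """
    f = Polynomial(coefficients)
    gradient = f.deriv()
    _, c1, c2 = coefficients
    f_min = f(-c1 / (2.0 * c2))  # known global optimum of the quadratic

    x, velocity = x0, 0.0
    log_regrets = []
    for learning_rate, momentum in schedule:
        # Heavy-ball update (assumed form): set hyperparameters, take one step.
        velocity = momentum * velocity + learning_rate * gradient(x)
        x = x - velocity
        regret = abs(f(x) - f_min)
        # Clip so that hitting the optimum exactly does not produce log10(0).
        log_regrets.append(float(np.log10(max(regret, np.finfo(float).tiny))))
    return x, log_regrets


# A constant schedule stands in for a DAC policy here.
coefficients = (1.40501053, -0.59899755, 1.43337392)
final_x, log_regrets = run_toy_sgd(coefficients, [(0.1, 0.5)] * 50)
```

A DAC controller would replace the constant schedule with a pair chosen at every step from the observed state (remaining budget, gradient, current learning rate, current momentum).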

class dacbench.benchmarks.toysgd_benchmark.ToySGDBenchmark(config_path=None, config=None)

Bases: AbstractBenchmark

get_environment()

Return ToySGDEnv env with current configuration

Returns: ToySGD environment
Return type: ToySGDEnv

read_instance_set(test=False)

Read path of instances from config into list

class dacbench.envs.toysgd.ToySGDEnv(config)

Bases: AbstractEnv

Optimize toy functions with SGD + Momentum.

Action: [log_learning_rate, log_momentum] (log base 10)
State: Dict with entries remaining_budget, gradient, learning_rate, momentum
Reward: negative log regret of current and true function value

An instance can look as follows:

ID            0
family        polynomial
order         2
low           -2
high          2
coefficients  [ 1.40501053 -0.59899755 1.43337392]
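To make the instance fields concrete, the sketch below turns the example instance into a callable objective and locates its minimum. It rests on assumptions that the text above does not confirm: that the coefficients are listed in ascending order of degree, and that low and high bound the interval of interest. The helper names build_objective and minimum_on_interval are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

# The example instance from above, written as a plain dict.
instance = {
    "family": "polynomial",
    "order": 2,
    "low": -2.0,
    "high": 2.0,
    "coefficients": [1.40501053, -0.59899755, 1.43337392],
}


def build_objective(instance):
    """Build the target function (assumes ascending coefficient order)."""
    return Polynomial(instance["coefficients"])


def minimum_on_interval(f, low, high):
    """Return (argmin, min) of polynomial f on the closed interval [low, high]."""
    # Candidates are the interval endpoints plus real critical points inside it.
    candidates = [low, high]
    for root in f.deriv().roots():
        if abs(root.imag) < 1e-12 and low <= root.real <= high:
            candidates.append(float(root.real))
    values = [float(f(x)) for x in candidates]
    best = int(np.argmin(values))
    return candidates[best], values[best]


f = build_objective(instance)
x_star, f_min = minimum_on_interval(f, instance["low"], instance["high"])
```

Knowing f_min in advance is what allows the regret of the current iterate to be computed at every step.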

close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:

Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception

reset()

Reset environment

Returns: Environment state
Return type: np.array

step(action: Union[float, Tuple[float, float]]) → Tuple[Dict[str, float], float, bool, Dict]

Take one step with SGD

Parameters

action (Union[float, Tuple[float, float]]) – If scalar, action = (log_learning_rate). If tuple, action = (log_learning_rate, log_momentum).

Returns

state : Dict[str, float]
State with entries “remaining_budget”, “gradient”, “learning_rate”, “momentum”
reward : float
done : bool
info : Dict

Return type

Tuple[Dict[str, float], float, bool, Dict]
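Because actions are given in log base 10, it can help to see how a raw action maps to the actual hyperparameters. The helper below is a hypothetical sketch of that decoding, not the environment's code. In particular, keeping the current momentum unchanged when a scalar action is passed is an assumption.

```python
import numpy as np


def decode_action(action, current_momentum=0.0):
    """Map a log10-scaled action to (learning_rate, momentum)."""
    if np.isscalar(action):
        # Scalar action: only the learning rate is set.
        # Assumption: the momentum keeps its current value.
        return 10.0 ** float(action), current_momentum
    # Tuple action: both hyperparameters are set.
    log_learning_rate, log_momentum = action
    return 10.0 ** float(log_learning_rate), 10.0 ** float(log_momentum)


# Scalar action: learning rate 10^-2, momentum left at its default.
lr_only = decode_action(-2.0)

# Tuple action: learning rate 10^-1, momentum 10^-0.5.
lr_and_momentum = decode_action((-1.0, -0.5))
```

Working in log space lets a controller cover several orders of magnitude with a small action range; for example, actions between -5 and 0 span learning rates from 1e-5 to 1.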