
Adam

An optimizer that is more efficient than SGD but tends to generalise slightly less well

Adam solvers are the hassle-free standard among optimizers.

Hyper-parameter tuning usually yields 1-3% marginal gains in performance. Fixing your data is usually more effective.

Intuition

The intuition behind Adam solvers is similar to the one behind SGD. The main difference is that Adam solvers are adaptive optimizers: besides keeping a running average of the gradients (as momentum does), Adam adjusts each parameter's learning rate based on the magnitude of its recent gradients using **Root Mean Square Propagation (RMSProp)**. This follows a logic similar to using momentum and dampening with SGD, and it makes Adam robust on the non-convex optimization landscapes of neural networks.
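The update described above can be sketched in plain Python for a single scalar parameter. This is a minimal illustration, not how PyTorch or TensorFlow implement it internally. The names `adam_step` and `minimize_quadratic` and the toy loss are assumptions made for this example; the defaults for `beta1`, `beta2` and `eps` are the values suggested in the original Adam paper.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (hypothetical helper).

    m and v are the running moment estimates; t is the 1-based step count.
    """
    # Running average of gradients (the momentum-like part).
    m = beta1 * m + (1 - beta1) * grad
    # Running average of squared gradients (the RMSProp-like part).
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: m and v start at zero and are biased towards it early on.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The step is scaled down where recent gradients have been large.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def minimize_quadratic(start=10.0, steps=200, lr=0.1):
    """Run Adam on the toy loss x**2 / 2, whose gradient is simply x."""
    x, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = x
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x
```

Because the gradient is divided by the square root of its own running squared average, the size of the first step depends on the learning rate and not on the gradient's scale: calling `adam_step(10.0, 10.0, 0.0, 0.0, 1, lr=0.1)` and `adam_step(10.0, 1000.0, 0.0, 0.0, 1, lr=0.1)` both move the parameter by roughly `0.1`.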

Code implementation

PyTorch

import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the Tensors it will update (which are the learnable weights
    # of the model)
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()

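TensorFlow

# importing the library
import tensorflow as tf

opt = tf.keras.optimizers.Adam(learning_rate=0.1)
var1 = tf.Variable(10.0)

# Take one optimization step on loss = var1**2 / 2, so d(loss)/d(var1) == var1.
# GradientTape + apply_gradients is used here because the behaviour of
# `opt.minimize` differs between TensorFlow versions.
with tf.GradientTape() as tape:
    loss = (var1 ** 2) / 2.0
grads = tape.gradient(loss, [var1])
opt.apply_gradients(zip(grads, [var1]))

# The first step is approximately `-learning_rate*sign(grad)`
print(var1.numpy())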
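The comment about the first step in the TensorFlow snippet can be checked by hand. The following is a short derivation, assuming the standard Adam update with moment estimates initialised to zero ($m_0 = v_0 = 0$) and bias correction. The symbols $\alpha$ (learning rate), $\beta_1$, $\beta_2$, $\epsilon$ and $g_1$ (the first gradient) are notation introduced here for illustration and do not appear in the snippet.

```latex
\begin{aligned}
m_1 &= (1-\beta_1)\, g_1, & \hat{m}_1 &= \frac{m_1}{1-\beta_1} = g_1,\\
v_1 &= (1-\beta_2)\, g_1^2, & \hat{v}_1 &= \frac{v_1}{1-\beta_2} = g_1^2,\\
\Delta\theta_1 &= -\alpha\,\frac{\hat{m}_1}{\sqrt{\hat{v}_1}+\epsilon}
  = -\alpha\,\frac{g_1}{|g_1|+\epsilon} \approx -\alpha\,\operatorname{sign}(g_1).
\end{aligned}
```

With a learning rate of 0.1 and `var1` starting at 10.0, the gradient is positive, so `var1` moves to roughly 9.9 after the first step.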
