In the previous sections we've discussed the static parts of a Neural Network: how we can set up the network connectivity, the data, and the loss function. This section is devoted to the dynamics, or in other words, the process of learning the parameters and finding good hyperparameters. Among the topics covered are:

- Activation/Gradient distributions per layer.
- First-order (SGD), momentum, Nesterov momentum.
- Per-parameter adaptive learning rates (Adagrad, RMSProp) (a sketch of these update rules follows the gradient-check example below).

In theory, performing a gradient check is as simple as comparing the analytic gradient to the numerical gradient. In practice, the process is much more involved and error prone. Here are some tips, tricks, and issues to watch out for:
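As a concrete starting point, here is a minimal sketch of the basic comparison, assuming a centered-difference numerical gradient and a max-normalized relative error (both common choices; the function names and the `1e-7` rule of thumb are illustrative, not prescribed by this text):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Estimate df/dx at x with a centered difference, one coordinate at a time."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)                      # f(x + h)
        x[ix] = old - h
        fxmh = f(x)                      # f(x - h)
        x[ix] = old                      # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

def relative_error(analytic, numerical, eps=1e-8):
    """Elementwise |a - n| / max(|a|, |n|, eps); values below ~1e-7 usually indicate agreement."""
    denom = np.maximum(np.maximum(np.abs(analytic), np.abs(numerical)), eps)
    return np.abs(analytic - numerical) / denom

# Example: f(x) = sum(x**2) has analytic gradient 2*x.
x = np.random.randn(3, 4)
num = numerical_gradient(lambda v: np.sum(v ** 2), x)
print(relative_error(2 * x, num).max())  # expect a very small number
```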
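For reference, here are minimal sketches of the update rules listed above, in their common textbook forms; the hyperparameter names and defaults (`learning_rate`, `mu`, `decay_rate`, `eps`) are illustrative assumptions, not recommendations from this text:

```python
import numpy as np

def sgd(w, dw, learning_rate=1e-2):
    """Vanilla first-order update: step against the gradient."""
    return w - learning_rate * dw

def momentum(w, dw, v, learning_rate=1e-2, mu=0.9):
    """Integrate a velocity v, then step along it."""
    v = mu * v - learning_rate * dw
    return w + v, v

def nesterov_momentum(w, dw, v, learning_rate=1e-2, mu=0.9):
    """'Lookahead' form of the Nesterov update."""
    v_prev = v
    v = mu * v - learning_rate * dw
    return w - mu * v_prev + (1 + mu) * v, v

def adagrad(w, dw, cache, learning_rate=1e-2, eps=1e-8):
    """Per-parameter rate: scale each step by accumulated squared gradients."""
    cache = cache + dw ** 2
    return w - learning_rate * dw / (np.sqrt(cache) + eps), cache

def rmsprop(w, dw, cache, learning_rate=1e-2, decay_rate=0.99, eps=1e-8):
    """Like Adagrad, but with a leaky (exponentially decayed) cache."""
    cache = decay_rate * cache + (1 - decay_rate) * dw ** 2
    return w - learning_rate * dw / (np.sqrt(cache) + eps), cache
```

The stateful rules (momentum, Nesterov, Adagrad, RMSProp) return their updated state alongside the weights, so the caller threads `v` or `cache` through successive steps.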