DL4J: Add Padam adaptive gradient updater #6253
For some extra reference: #5843 (comment) and a few comments downstream.
I'd like to give this a shot! As it's my first time contributing to DL4J, do you have any advice/suggestions for me?
Are we sure about creating new classes for Padam?
We can extend AmsGrad, of course. That's what OOP is for :)
Yeah, I'm ok with a separate class (extending AMSGrad if that makes sense).
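For illustration, a configuration-side class along those lines might look like the sketch below. The Padam class name, the p field, and the four-argument AmsGrad constructor it assumes are all part of the proposal under discussion, not a settled ND4J API:

```java
// Hypothetical sketch of a Padam config extending the existing AmsGrad config.
// The class name, the p field, and the assumed AmsGrad constructor are the
// proposal under discussion, not an existing ND4J API.
package org.nd4j.linalg.learning.config;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.learning.GradientUpdater;

public class Padam extends AmsGrad {

    private double p = 0.125; // partial-adaptivity exponent; the paper uses p = 1/8

    public Padam(double learningRate, double beta1, double beta2, double epsilon, double p) {
        super(learningRate, beta1, beta2, epsilon);
        this.p = p;
    }

    public double getP() {
        return p;
    }

    @Override
    public GradientUpdater instantiate(INDArray viewArray, boolean initializeViewArray) {
        // Would construct and return the implementation-side updater (e.g. a
        // PadamUpdater) here; left unimplemented in this sketch.
        throw new UnsupportedOperationException("PadamUpdater not implemented in this sketch");
    }
}
```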
Hi, I added the required classes. It is safe to merge: I did not add the predicate for the range of the p parameter; instead, I logged a warning. I will need your help with the predicates. I haven't opened a pull request yet as I haven't tested the code. I couldn't build the project (tried a lot of things) using IntelliJ on macOS. Can someone point me to a thorough readme/guide for that?
@achalagarwal Its name is "Preconditions" actually, just do something like this:
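For example, a range check for the p parameter using ND4J's Preconditions utility could look like this (the package location is an assumption, as it has moved between ND4J versions):

```java
import org.nd4j.base.Preconditions; // package location varies across ND4J versions

// Fail fast if the exponent is outside the range the Padam paper allows,
// instead of only logging a warning.
Preconditions.checkArgument(p > 0.0 && p <= 0.5,
        "Padam p parameter must be in range (0, 0.5], got %s", p);
```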
Use Maven on the command line instead.
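For instance, a typical full build from the repository root might look like the following; the exact flags here are an assumption, not necessarily what was originally suggested:

```bash
# Build everything, skipping tests and javadoc generation to save time
# (both are standard Maven options)
mvn clean install -DskipTests -Dmaven.javadoc.skip=true
```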
The build was successful, but I had to skip a couple of projects due to network issues (HTTP requests failed) on Ubuntu. Now, how do you suggest I validate the correctness of Padam? Do you want me to build a model and replicate results from the publication? That would take a lot of time. Are there relevant tests for the linalg/learning modules? I could not find any. cc: @saudet
@achalagarwal We have updater tests here; adding to those would be good. We'll carefully review the implementation too once you've opened a pull request. That should be good enough, I think.
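As a rough idea of what such a test could check, here is a sketch that computes one Padam step manually with ND4J ops; the PadamUpdater mentioned in the comments is the class proposed in this thread, so the comparison step is hypothetical:

```java
// Sketch of a manual reference computation for one Padam step, in the spirit
// of the existing updater tests.
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.ops.transforms.Transforms;

public class PadamManualCheck {
    public static void main(String[] args) {
        double lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8, p = 0.125;

        INDArray grad = Nd4j.rand(1, 5);
        INDArray m = Nd4j.zeros(1, 5);    // first moment
        INDArray v = Nd4j.zeros(1, 5);    // second moment
        INDArray vHat = Nd4j.zeros(1, 5); // running max of second moments (AMSGrad)

        // One Padam step, computed by hand (no bias correction, matching the paper):
        m = m.mul(beta1).add(grad.mul(1 - beta1));
        v = v.mul(beta2).add(grad.mul(grad).mul(1 - beta2));
        vHat = Transforms.max(vHat, v);

        // Key difference from AMSGrad: vHat^p instead of sqrt(vHat); eps is added
        // purely for numerical stability.
        INDArray update = m.mul(lr).div(Transforms.pow(vHat, p).add(eps));

        // A real test would assert that the proposed PadamUpdater produces the
        // same values as `update` for the same gradient and state.
        System.out.println(update);
    }
}
```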
The Padam updater was recently described here: https://arxiv.org/pdf/1806.06763.pdf
It is an extension of Adam/AMSGrad that claims the generalization performance (test accuracy) of SGD while maintaining the fast convergence of Adam/AMSGrad. Mathematically, it is essentially a blend of SGD and AMSGrad.
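Concretely, the update rule from the paper keeps AMSGrad's moment estimates but replaces the square root of the second moment with a tunable exponent p:

```latex
% Padam update rule (Chen & Gu, 2018); g_t is the gradient at step t
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{v}_t &= \max(\hat{v}_{t-1},\, v_t) \\
\theta_{t+1} &= \theta_t - \alpha_t\, m_t / \hat{v}_t^{\,p}, \qquad p \in [0, \tfrac{1}{2}]
\end{aligned}
```

Setting p = 1/2 recovers AMSGrad exactly, while p = 0 reduces the update to SGD with momentum; intermediate values interpolate between the two, which is the blending referred to above.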
Implementing this isn't a high priority for the core team. If anyone wants to tackle this, there are configuration and implementation classes here (we'll need one of each for Padam):
https://github.com/deeplearning4j/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config
https://github.com/deeplearning4j/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning