Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
On small datasets, L-BFGS tends to converge faster and reach better solutions. Adam, on the other hand, is very robust on relatively large datasets: it usually converges quickly and performs well. SGD with momentum …
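To make the mechanics of these optimizers concrete, here is a minimal pure-Python sketch of the three update rules. It runs on a hypothetical toy problem, an ill-conditioned 2-D quadratic, not a real dataset. The objective, the function names (`sgd_momentum`, `adam`, `lbfgs`) and all hyperparameters (`lr`, `beta`, `history`, ...) are assumptions chosen for illustration. The L-BFGS version uses the standard two-loop recursion with a simple backtracking (Armijo) line search. The sketch only shows how each method steps; it does not reproduce the dataset-size behaviour described above.

```python
import math

# Hypothetical toy objective (an assumption, not from the notes):
# an ill-conditioned 2-D quadratic with its minimum at (3, -1).
def f(p):
    x, y = p
    return (x - 3) ** 2 + 10 * (y + 1) ** 2

def grad(p):
    x, y = p
    return [2 * (x - 3), 20 * (y + 1)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sgd_momentum(p, lr=0.01, beta=0.9, steps=500):
    v = [0.0] * len(p)
    for _ in range(steps):
        g = grad(p)
        # Velocity: exponentially decaying sum of past gradients.
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        p = [pi - lr * vi for pi, vi in zip(p, v)]
    return p

def adam(p, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = [0.0] * len(p)  # first-moment (mean) estimate
    s = [0.0] * len(p)  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(p)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        s = [beta2 * si + (1 - beta2) * gi * gi for si, gi in zip(s, g)]
        # Bias correction for the zero-initialised moment estimates.
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        s_hat = [si / (1 - beta2 ** t) for si in s]
        # Per-coordinate step, scaled by the running gradient magnitude.
        p = [pi - lr * mh / (math.sqrt(sh) + eps)
             for pi, mh, sh in zip(p, m_hat, s_hat)]
    return p

def lbfgs(p, history=5, steps=100, tol=1e-8):
    s_hist, y_hist = [], []
    g = grad(p)
    for _ in range(steps):
        if math.sqrt(dot(g, g)) < tol:
            break
        # Two-loop recursion: r approximates (inverse Hessian) @ g
        # using only the most recent (s, y) pairs.
        q = list(g)
        coeffs = []  # (rho, alpha), newest pair first
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            rho = 1.0 / dot(y, s)
            alpha = rho * dot(s, q)
            coeffs.append((rho, alpha))
            q = [qi - alpha * yi for qi, yi in zip(q, y)]
        gamma = (dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
                 if s_hist else 1.0)
        r = [gamma * qi for qi in q]
        for s, y, (rho, alpha) in zip(s_hist, y_hist, reversed(coeffs)):
            b = rho * dot(y, r)
            r = [ri + (alpha - b) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]
        # Backtracking (Armijo) line search along the direction d.
        t, fx, gd = 1.0, f(p), dot(g, d)
        while f([pi + t * di for pi, di in zip(p, d)]) > fx + 1e-4 * t * gd:
            t *= 0.5
        p_new = [pi + t * di for pi, di in zip(p, d)]
        g_new = grad(p_new)
        s = [pn - po for pn, po in zip(p_new, p)]
        y = [gn - go for gn, go in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # keep only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > history:
                s_hist.pop(0)
                y_hist.pop(0)
        p, g = p_new, g_new
    return p

for name, opt in [("SGD+momentum", sgd_momentum), ("Adam", adam), ("L-BFGS", lbfgs)]:
    print(name, opt([0.0, 0.0]))
```

On a smooth, deterministic objective like this one, L-BFGS typically needs far fewer iterations than the first-order methods, because it builds a curvature estimate from past steps. Adam and momentum SGD only need a gradient per step, which is what makes them practical with noisy mini-batch gradients on large datasets.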