In this section, we are building a digit classifier. I have taken the famous ML course from Andrew Ng on Coursera, so I thought, why not build a neural network model for classifying digits from scratch?
It took me days to build the feedforward and backpropagation functions that compute the cost and gradient. Just when I thought everything was done and plugged my functions into scipy.optimize.fmin_cg, it didn’t work as I expected. I think my cost function is fine, because when I don’t pass the cost gradient function to fmin_cg(), in which case the gradient is approximated numerically, it does work. I set maxiter = 100 in this notebook because otherwise it takes too long in the DQ env. As you can see, the accuracy is pretty low.
If I pass the cost gradient function in as fprime, it only iterates once and the cost function is not optimized. I would love to start a discussion with anyone who has used the
This project is not successful, but I still want to share it and maybe get some feedback on how to fix it. That aside, I’ve learned a lot of quirky stuff about numpy with this project and gained a much better understanding of neural networks.
It’s unfinished business, but for now, I just need to take a long break.
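Since the optimizer works with the numerically approximated gradient but stalls with the hand-written one, a gradient check is a natural first diagnostic. The sketch below is not the notebook's code: it uses a small linear least-squares cost as a hypothetical stand-in for the network cost, and the names cost, grad, and buggy_grad are made up for illustration. It relies on scipy.optimize.check_grad, which compares an analytic gradient against a finite-difference approximation and returns the norm of the difference.

```python
import numpy as np
from scipy.optimize import check_grad, fmin_cg

def cost(theta, X, y):
    """Half mean squared error of a linear model (stand-in for the network cost)."""
    residual = X @ theta - y
    return 0.5 * np.mean(residual ** 2)

def grad(theta, X, y):
    """Analytic gradient of cost with respect to theta, as a 1-D array."""
    residual = X @ theta - y
    return X.T @ residual / len(y)

def buggy_grad(theta, X, y):
    """Deliberately wrong gradient: flipped sign and missing 1/m factor."""
    return -X.T @ (X @ theta - y)

# Synthetic data, assumed only for this example.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta
theta0 = np.zeros(3)

# A value near zero means the analytic gradient agrees with finite differences.
good_error = check_grad(cost, grad, theta0, X, y)
bad_error = check_grad(cost, buggy_grad, theta0, X, y)
print("correct gradient error:", good_error)
print("buggy gradient error:  ", bad_error)

# With a gradient that passes the check, fmin_cg can use it through fprime.
theta_opt = fmin_cg(cost, theta0, fprime=grad, args=(X, y), disp=False)
print("recovered parameters:", theta_opt)
```

If the same check on the real cost and gradient functions reports a large error, the backpropagation code (or the way its output is flattened into a 1-D vector) is the likely culprit rather than fmin_cg itself.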
Building a neural network model from scratch.ipynb (73.1 KB)
I am currently taking a look. There is a problem with your LaTeX formatting in the “Randomly initialize the parameters for symmetry breaking” section.
I am not familiar with the scipy.optimize library, so I don’t believe I can help a lot. But maybe you are pretty close to the solution when you say you cannot plug the cost gradient function into fmin_cg? Since it should iterate more than once and you have identified the problem, I’m sure you will find some help on StackOverflow or elsewhere from people who faced a similar issue.
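One way to see why the optimizer stops early is to ask fmin_cg for its full output, which includes the number of function and gradient evaluations and a warning flag. This is only a sketch on a toy quadratic; cost and grad here are hypothetical stand-ins, not the functions from the notebook.

```python
import numpy as np
from scipy.optimize import fmin_cg

def cost(theta):
    """Toy quadratic cost with its minimum at theta = 3 in every coordinate."""
    return np.sum((theta - 3.0) ** 2)

def grad(theta):
    """Analytic gradient of the toy cost, same shape as theta."""
    return 2.0 * (theta - 3.0)

theta0 = np.zeros(4)

# With full_output=True (and retall left False) fmin_cg returns five values.
xopt, fopt, func_calls, grad_calls, warnflag = fmin_cg(
    cost, theta0, fprime=grad, full_output=True, disp=False
)

print("solution:", xopt)
print("final cost:", fopt)
print("function calls:", func_calls, "gradient calls:", grad_calls)
# warnflag meanings per the SciPy docs: 0 = success, 1 = maxiter exceeded,
# 2 = gradient and/or function calls not changing (possible precision loss),
# 3 = NaN result encountered.
print("warnflag:", warnflag)
```

A run that ends after a single iteration with a nonzero warnflag often points to a cost and gradient that disagree with each other, which makes the line search fail.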
I found this about scipy.optimize.minimize, which seems related to the topic: How to return cost, grad as tuple for scipy’s fmin_cg function
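For reference, scipy.optimize.minimize accepts a single function that returns the cost and gradient together when jac=True is passed. The sketch below illustrates that pattern with a hypothetical least-squares cost (cost_and_grad is a made-up name, and the data is synthetic); it is not the notebook's network, just a minimal illustration of the calling convention.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Return (cost, gradient) in one call, sharing the intermediate residual."""
    residual = X @ theta - y
    cost = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)  # 1-D array, same shape as theta
    return cost, grad

# Synthetic data, assumed only for this example.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0])

# jac=True tells minimize that the objective returns a (cost, gradient) tuple.
result = minimize(
    cost_and_grad,
    np.zeros(2),
    args=(X, y),
    method="CG",
    jac=True,
    options={"maxiter": 100},
)

print("converged:", result.success)
print("parameters:", result.x)
print("iterations:", result.nit)
```

Returning both values from one function suits a neural network, where the forward pass needed for the cost is also needed for backpropagation, so nothing is computed twice.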
@WilfriedF Thank you so much for the StackOverflow link! I believe it could help. I’m gonna give it a try and will let you know if it works. I think the problem lies in the layering of my cost and gradient functions and how fmin_cg computes under the hood. I was tempted to recreate the cost and gradient functions separately using parts of the
And thanks for spotting the LaTeX formatting. I will fix that.
Again, thank you so much for taking a look at this project.