Descent into madness

This post is about the basics of the "gradient descent" method for finding the minimum of a function. I started writing it mainly to review the optimization material from lectures that Sébastien Bubeck gave in Seattle. All of the material can be found elsewhere (for example, in Sébastien's book), but I can assure you that in