‘Large-scale Deep Unsupervised Learning using Graphics Processors’ (2009) by Raina, Madhavan and Andrew Ng is probably the first truly influential paper to bring GPUs to the training of large neural networks.
Shortly after (2010), Dan Ciresan (from Schmidhuber’s lab, with Schmidhuber as the paper’s last author) set a new record on MNIST using a large, deep neural network trained on GPUs.
Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton (2012) used a similar (but larger and much better) network, also trained on GPUs, to classify ImageNet.