For two weeks last July, I cocooned myself in a hotel in Portland, OR, living and breathing probabilistic programming as a “student” in the probabilistic programming summer school run by DARPA. The school is part of the broader DARPA program on Probabilistic Programming for Advanced Machine Learning (PPAML), which has resulted in a great infusion of energy (and funding) into the probabilistic programming space. Last year's summer school was the inaugural one; it is meant to introduce the languages and tools being developed to the broader scientific and technology communities. The school was graciously hosted by Galois Inc., which did a terrific job of organizing the event. Thankfully, they’re hosting the summer school again this year (there’s still time to apply!), which made me think that now is a good time to reflect on last year’s program and provide a snapshot of the state of the field. I will also take some liberty in prognosticating on the future of this space. Note that I am by no means a probabilistic programming expert, merely a curious outsider with a problem or two to solve.
There has been a lot of renewed interest lately in neural networks (NNs) due to their popularity as a model for deep learning architectures (there are also deep learning approaches that do not use NNs, built on sum-product networks and support vector machines with deep kernels, among others). Perhaps due to their loose analogy with biological brains, the behavior of neural networks has acquired an almost mystical status. This is compounded by the fact that theoretical analysis of multilayer perceptrons (one of the most common architectures) remains very limited, although the situation is gradually improving. To gain an intuitive understanding of what a learning algorithm does, I usually like to think about its representational power, as this provides insight into what can, if not necessarily what does, happen inside the algorithm to solve a given problem. I will do this here for the case of multilayer perceptrons. By the end of this informal discussion I hope to provide an intuitive picture of the surprisingly simple representations that NNs encode.
It is tempting to assume that, with an appropriate choice of weights for the edges connecting the second and third layers of the NN discussed in this post, it would be possible to create classifiers that output 1 over any composite region defined by unions and intersections of the 7 regions shown below.
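One way to probe this assumption is to try it on a toy case. The sketch below is not the network from this post: it assumes a hypothetical hidden layer of three threshold units whose decision lines (x = 0, y = 0, x + y = 1) happen to cut the plane into seven regions, and a single threshold output unit. The names `hidden_code` and `find_output_unit`, and the small integer search ranges, are illustrative choices only. The search asks whether some output weights fire on exactly a chosen set of regions.

```python
from itertools import product

# Hypothetical hidden layer (an assumption, not the post's network):
# three threshold units, one per line in the plane.
def hidden_code(x, y):
    return (int(x > 0), int(y > 0), int(x + y > 1))

# Sample the square [-3, 3] x [-3, 3] to collect the hidden codes that occur.
# Each distinct code corresponds to one region of the plane.
grid = [i / 4 for i in range(-12, 13)]
codes = sorted({hidden_code(x, y) for x in grid for y in grid})

def find_output_unit(positive, codes, w_range=range(-3, 4), b_range=range(-6, 7)):
    """Brute-force small integer weights (w1, w2, w3, b) for an output unit
    that fires (w.h + b > 0) on exactly the codes in `positive`.
    Returns the first match, or None if the searched range contains none."""
    for w1, w2, w3, b in product(w_range, w_range, w_range, b_range):
        if all((w1 * h1 + w2 * h2 + w3 * h3 + b > 0) == ((h1, h2, h3) in positive)
               for h1, h2, h3 in codes):
            return (w1, w2, w3, b)
    return None

# A single region (the triangle bounded by all three lines).
triangle = {(1, 1, 0)}
# A union of two regions that is XOR-like in the first two hidden units.
xor_union = {(1, 0, 0), (0, 1, 0)}

print(len(codes))                          # number of regions found
print(find_output_unit(triangle, codes))   # some weight tuple
print(find_output_unit(xor_union, codes))  # None
```

In this toy setup the triangle is picked out by a single output unit, but the XOR-like union is not. That failure is not an artifact of the limited search range: with the third hidden unit at 0, the output unit would need w1 + b > 0, w2 + b > 0, b ≤ 0 and w1 + w2 + b ≤ 0, and adding the first two inequalities contradicts adding the last two. Under these assumptions, at least, not every union of regions is reachable by choosing output weights alone.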