Aside

It is tempting to assume that, with an appropriate choice of weights for the edges connecting the second and third layers of the NN discussed in this post, it would be possible to create classifiers that output 1 over any composite region formed by unions and intersections of the 7 regions shown below.

Figure 6

This is untrue, a fact that can be shown for the three-edge case by brute-force enumeration of all distinct NNs of this architecture (assuming fixed weights for the edges connecting the 1st and 2nd layers). Because edge weights can vary continuously, this may seem like an impossible task, but in fact we can restrict our attention to a small set of integer weights.

Consider the input to a_1^{(3)}, called the pre-activation. By construction, this quantity is constant within each of the 7 regions. For a given threshold value (set by the bias term b_1^{(2)}), all regions whose pre-activation exceeds that threshold will cause a_1^{(3)} to fire a 1, and all others will cause it to fire a 0. For fixed weights w_{11}^{(2)} through w_{13}^{(2)}, changing the value of b_1^{(2)} can therefore result in at most 8 distinct NNs: there are 7 regions and hence at most 7 distinct pre-activation values, and the threshold can fall below all of them, between two consecutive values, or above all of them. Furthermore, the actual pre-activation values do not matter, only their relative rank order, which allows us to consider only integer weights. By enumerating all permissible rank orderings and all distinct biases, we can characterize the full set of distinct NNs.
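
To make the enumeration concrete, here is a minimal brute-force sketch in Python. It encodes each of the 7 regions by the subset of half-planes covering it, and sweeps the three output weights and the bias over a small integer range; the variable names and the range [-4, 4] are my own choices, justified only by the rank-order argument above.

```python
from itertools import product

# Each of the 7 regions is identified by which of the three half-planes
# cover it: the non-zero patterns in {0,1}^3. (The all-zero pattern does
# not occur when the three boundary lines form a triangle oriented so
# that all three half-planes contain it.)
regions = [p for p in product((0, 1), repeat=3) if any(p)]

# Assumed integer range: by the rank-order argument a small range should
# suffice, and widening it is an easy check.
R = range(-4, 5)

achievable = set()
for w1, w2, w3, bias in product(R, repeat=4):
    # a_1^{(3)} fires over a region iff its pre-activation is positive.
    out = tuple(int(w1 * p1 + w2 * p2 + w3 * p3 + bias > 0)
                for (p1, p2, p3) in regions)
    achievable.add(out)

print(len(achievable), "achievable configurations out of", 2 ** len(regions))
```

Running this prints the number of distinct configurations, which should match the count shown in Figure 12.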

Below is such an enumeration of all distinct NNs with the architecture under consideration. Orange regions correspond to areas of \mathbb{R}^2 where the NN will output 1.

Figure 12

Below is the complement: the set of configurations that are impossible to achieve using this architecture.

Figure 13

It is easy to see why some configurations are impossible. Consider the fourth example above (counting from the top-left). Denote the regions covered by exactly one half-space by r_1, r_2, and r_3. In this configuration all three are inactive, while all regions covered by exactly two half-spaces are active. Moving from any r_j to an adjacent region covered by two half-spaces adds a single weight to the pre-activation and crosses the threshold, so each of the three weights must be positive. But this implies that the pre-activation over the central region, covered by all three half-spaces simultaneously, must be higher still, hence their intersection cannot be inactive.
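
To spell the argument out in inequalities (a sketch assuming the firing threshold is 0, and writing w_i for w_{1i}^{(2)} and b for b_1^{(2)}): the inactive single-half-space regions give w_j + b \le 0 for j = 1, 2, 3, while the active pairwise regions give w_i + w_j + b > 0 for all i \ne j. Combining the two, w_i > -(w_j + b) \ge 0 for every i, and therefore

w_1 + w_2 + w_3 + b = (w_1 + w_2 + b) + w_3 > 0,

so the central region is forced to be active, contradicting the configuration shown.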



2 comments

  1. Pingback: What Does a Neural Network Actually Do? « Some Thoughts on a Mysterious Universe

  2. Given the characterization above, the obvious question is: what would be needed to realize all possible configurations (i.e. including the complements not accessible with this NN architecture)?

