Deep neural networks (DNNs) come in many sizes and structures. The chosen architecture, together with the dataset and learning algorithm, is known to influence the neural patterns that are learned. A major current challenge in deep learning theory is scalability. Exact solutions for learning dynamics exist for simpler networks, but changing even a small part of the architecture often requires significant changes to the analysis. Moreover, state-of-the-art models are too complex for practical analytical solutions. The systems of interest include complex machine learning models and even the brain, which poses challenges for theoretical study.

The paper discusses four lines of related work. The first is exact solutions in simple architectures: many advances have been made in the theoretical analysis of deep linear neural networks, e.g. the loss landscape is well understood and exact solutions have been obtained for specific initial conditions. The second is the neural tangent kernel, a notable exception in terms of universality, since it provides exact solutions applicable to a wide range of models. The third is implicit bias in gradient descent, which has been investigated as a source of generalization performance in DNNs. The last is local elasticity: a model has this property if an update on one feature vector only minimally affects dissimilar feature vectors.

Researchers from University College London have proposed a method for modeling universality in representation learning, which aims to explain common phenomena observed across learning systems. They develop an effective theory of how two similar data points interact during training when the neural network is large and complex enough that it is not strongly constrained by its parameterization. They further demonstrate universal behavior in representation learning dynamics by showing that the derived theory explains the dynamics of various deep networks with different activation functions and architectures.

The proposed theory looks at the representational dynamics at “some intermediate layer H.” Since DNNs have many layers at which representations can be observed, this raises the question of how these dynamics depend on the depth of the chosen intermediate layer. Answering it requires determining at which layers the effective theory remains valid. For the linear approximation to be accurate, the representations must start close to each other. If the initial weights are small, each layer has an average activation gain factor that is a constant G less than 1. The initial representation distance can then be expressed as a function of the depth n, shrinking by roughly a factor of G with each additional layer.
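
Assuming a constant per-layer gain G as described, one plausible way to write this scaling is the following. The notation h_n(x) for the layer-n representation of an input x is introduced here for illustration and may differ from the paper's:

$$
\lVert h_n(x_1) - h_n(x_2) \rVert \;\sim\; G^{n} \, \lVert x_1 - x_2 \rVert, \qquad 0 < G < 1 .
$$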

This function decreases, so the theory should be more accurate in the later layers of the network.
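
The shrinking of representation distance with depth can be checked numerically. The following is a minimal sketch, not the paper's experiment: it assumes a fully connected tanh network with random Gaussian weights, and the function name `layer_distances`, the width, the depth, and the `weight_scale` value are all hypothetical choices made for illustration.

```python
import numpy as np


def layer_distances(depth=8, width=256, weight_scale=0.5, seed=0):
    """Return ||h_n(x1) - h_n(x2)|| for n = 0..depth in a random tanh MLP."""
    rng = np.random.default_rng(seed)
    # Two random inputs of roughly unit norm (layer 0 representations).
    h1 = rng.normal(size=width) / np.sqrt(width)
    h2 = rng.normal(size=width) / np.sqrt(width)
    distances = [float(np.linalg.norm(h1 - h2))]
    for _ in range(depth):
        # Small weights: entries have std weight_scale / sqrt(width),
        # so each layer scales distances by roughly weight_scale < 1.
        W = rng.normal(size=(width, width)) * weight_scale / np.sqrt(width)
        h1, h2 = np.tanh(W @ h1), np.tanh(W @ h2)
        distances.append(float(np.linalg.norm(h1 - h2)))
    return distances


distances = layer_distances()
# Per-layer shrink factor, an empirical stand-in for the gain G.
ratios = [b / a for a, b in zip(distances, distances[1:])]
for n, d in enumerate(distances):
    print(f"layer {n}: distance {d:.6f}")
```

In this setup the per-layer ratios stay close to `weight_scale`, so the distance decays roughly geometrically with depth, which is the regime in which the linear approximation is expected to hold best.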

The effective learning rates are expected to differ across hidden layers. In standard gradient descent, the update sums over the parameters, so changes are proportional to the number of parameters. At deeper hidden layers, the number of parameters in the encoder map increases while the number in the decoder map decreases. As a result, the effective learning rate of the encoder increases with depth and that of the decoder decreases with depth. This relationship holds in the later layers of the network, where the theory is accurate, but in the earlier layers the effective learning rate of the decoder appears to increase.
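
The parameter-counting argument can be made concrete with a small sketch. It assumes a plain fully connected network and uses the raw parameter count only as a proxy for the effective learning rate, following the proportionality stated above; the helper names and the layer widths are hypothetical.

```python
def count_parameters(widths):
    """Weights plus biases of a fully connected map with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def encoder_decoder_counts(widths):
    """For each hidden layer index k, split the network at layer k.

    The encoder maps the input to layer k; the decoder maps layer k to the output.
    """
    counts = []
    for k in range(1, len(widths) - 1):
        encoder = count_parameters(widths[: k + 1])
        decoder = count_parameters(widths[k:])
        counts.append((k, encoder, decoder))
    return counts


# Hypothetical architecture: input, four hidden layers, output.
widths = [10, 64, 64, 64, 64, 3]
counts = encoder_decoder_counts(widths)
for k, encoder, decoder in counts:
    print(f"hidden layer {k}: encoder params {encoder}, decoder params {decoder}")
```

As the split point moves deeper, the encoder count grows and the decoder count shrinks while their sum stays fixed, matching the direction of the trend described for the later layers. The sketch does not capture the deviation reported for the earlier layers.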

In summary, the University College London researchers have introduced a new theory of how neural networks learn, focusing on learning patterns common to different architectures. It shows that these networks naturally learn structured representations, especially when starting from small weights. Rather than presenting the theory as a definitive universal model, the researchers emphasize that gradient descent, the basic method used to train neural networks, can support aspects of representation learning. However, the approach faces challenges when applied to larger datasets, and further research is needed to handle more complex data.

Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a technology enthusiast, he immerses himself in the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He strives to articulate complex AI concepts in a clear and accessible way.