Many recent innovations have been made possible by advances in artificial intelligence and deep learning. Complex tasks such as text and image summarization, segmentation, and classification are now handled successfully by neural networks. However, training a neural network can take days or even weeks because of its computational requirements, and inference with pre-trained models can also be slow, especially for complex architectures.
Parallelization techniques accelerate training and inference in deep neural networks. Yet even though these methods are widely used, some operations in neural networks are still performed sequentially: diffusion models generate outputs through a succession of denoising steps, and forward and backward passes proceed layer by layer. As the number of steps L grows, executing these processes one step at a time becomes a computational bottleneck.
To address this problem, a team of Apple researchers introduced DeepPCR, a novel algorithm that accelerates both training and inference of neural networks. DeepPCR works by casting a sequence of L sequential steps as the solution to a system of equations, which it then recovers using the parallel cyclic reduction (PCR) algorithm. The main advantage of DeepPCR is that it reduces the computational cost of these sequential processes from O(L) to O(log2 L). This reduction in complexity translates into speedups, particularly for large values of L.
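To build intuition for how a chain of L dependent steps can be solved in O(log2 L) parallel sweeps, here is a toy sketch in NumPy. It parallelizes the simple linear recurrence x_t = a_t * x_{t-1} + b_t using recursive doubling, a parallel-prefix technique closely related to cyclic reduction. This is an illustration of the underlying idea only, not the paper's actual formulation (DeepPCR handles the nonlinear systems arising in real networks); the function names are my own.

```python
import numpy as np

def sequential_scan(a, b, x0):
    # Baseline: L sequential steps of x_t = a_t * x_{t-1} + b_t.
    x, out = x0, []
    for at, bt in zip(a, b):
        x = at * x + bt
        out.append(x)
    return np.array(out)

def parallel_scan(a, b, x0):
    # Recursive doubling: maintain, for every t, a relation
    # x_t = A_t * x_{t-d} + B_t. Each sweep composes relations at
    # distance d, doubling d, so ceil(log2(L)) sweeps suffice.
    A = np.asarray(a, dtype=float).copy()
    B = np.asarray(b, dtype=float).copy()
    L, shift = len(A), 1
    while shift < L:
        # Pad with the identity relation (A=1, B=0) for positions
        # whose chain has already reached the initial value x0.
        A_prev = np.concatenate([np.ones(shift), A[:-shift]])
        B_prev = np.concatenate([np.zeros(shift), B[:-shift]])
        A, B = A * A_prev, A * B_prev + B
        shift *= 2
    return A * x0 + B  # every x_t now depends directly on x0
```

Each `while` iteration is a handful of elementwise array operations, so on parallel hardware the depth of the computation is logarithmic in L even though the total work is not reduced.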
The team conducted experiments to verify DeepPCR's theoretical complexity reduction and to identify the conditions under which it yields speedups. By applying DeepPCR to parallelize the forward and backward passes of multilayer perceptrons, they achieved speedups of up to 30× for the forward pass and 200× for the backward pass.
The team also demonstrated the adaptability of DeepPCR by using it to train ResNets with 1,024 layers, completing training up to 7 times faster. Applied to the generation phase of diffusion models, the technique produced samples 11 times faster than the sequential approach.
The team summarized its main contributions as follows.
- The team introduced DeepPCR, a novel approach to parallelizing sequential processes in neural network training and inference. Its key feature is reducing the computational complexity from O(L) to O(log2 L), where L is the length of the sequence.
- DeepPCR was used to parallelize the forward and backward passes in multilayer perceptrons (MLPs). An in-depth performance analysis was conducted to identify the regimes in which the method performs well as a function of basic design parameters, and to examine the trade-offs between speed, solution accuracy, and memory usage.
- DeepPCR was used to accelerate training of deep ResNets on MNIST and generation with diffusion models trained on the MNIST, CIFAR-10, and CelebA datasets. The results showed that DeepPCR delivers significant speedups, up to 7× for ResNet training and 11× for diffusion-model generation, while producing results comparable to the sequential approach.
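For a sense of why the backward pass is amenable to this kind of parallelization, consider the special case of a purely linear chain of layers, where backpropagation reduces to repeated multiplication by transposed weight matrices: g_{l-1} = W_l^T g_l. That is again a first-order recurrence, so the cumulative matrix products can be formed with a parallel prefix scan in O(log2 L) sweeps. The sketch below uses a Hillis–Steele-style scan over matrix products; it is a simplified illustration under that linear-chain assumption, not the paper's Newton-plus-PCR formulation, and the function names are hypothetical.

```python
import numpy as np

def backward_sequential(Ws, g_out):
    # Backprop through a linear chain: apply W_l^T one layer at a
    # time, i.e. L strictly sequential steps.
    gs = [g_out]
    for W in reversed(Ws):
        gs.append(W.T @ gs[-1])
    return gs  # gs[k] is the gradient after k transpose applications

def backward_parallel(Ws, g_out):
    # Prefix scan over matrix products: Q[k] ends up equal to
    # M_k @ ... @ M_0, built in ceil(log2(L)) sweeps. Each sweep's
    # matrix multiplies are independent, so the parallel depth is
    # logarithmic (total work is not reduced).
    Ms = [W.T for W in reversed(Ws)]   # in order of application
    Q = [M.copy() for M in Ms]
    shift = 1
    while shift < len(Q):
        Q = [Q[k] @ Q[k - shift] if k >= shift else Q[k]
             for k in range(len(Q))]
        shift *= 2
    return [g_out] + [q @ g_out for q in Q]
```

With nonlinear layers the backward recurrence has state-dependent coefficients, which is where DeepPCR's system-of-equations view comes in, but the logarithmic-depth structure of the solve is the same idea.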
Check out the Paper. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final year undergraduate from University of Petroleum and Energy Studies, Dehradun, pursuing BTech in Computer Engineering with specialization in Artificial Intelligence and Machine Learning.
She is passionate about data science, with strong analytical and critical thinking skills, and a keen interest in learning new skills, leading groups, and managing work in an organized manner.