Schmidhuber – A Name Lost in Alleys of AI Research

Prof. Jürgen Schmidhuber was always early. His contributions to neural network research extend far beyond his most celebrated one, the Long Short-Term Memory (LSTM) network. LSTMs went on to form the groundwork for progress in vision and speech models – around 2015, Google found that LSTMs cut transcription errors in its speech recognition system by roughly 49%, a huge leap after years of slow progress.

With names like Yann LeCun assuming stardom amid the rise of deep learning, Schmidhuber’s pioneering work has often been overlooked – and he has made his discontent known. In 1990, his research on gradient-based artificial neural networks for long-term planning and on reinforcement learning through artificial curiosity introduced several new concepts, including a system of two recurrent neural networks (RNNs): a controller and a world model.

This year, after LeCun published ‘A Path Towards Autonomous Machine Intelligence’, Schmidhuber accused him of rehashing older ideas and presenting them without credit. Schmidhuber says the artificial curiosity he introduced covers much of what LeCun’s abstract focuses on. He also claims that the 2014 work on generative adversarial networks (GANs), which LeCun has championed, was a simplified version of his own work from as far back as 1990.

Several other now-prominent ideas that Schmidhuber wrote about in the 1990s have found their way back into the current landscape, and it is easy to understand why he would want recognition.

In 1992, Schmidhuber published a paper titled ‘Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks’ that proposed an alternative to RNNs. In it, a slow feedforward neural network – the first and simplest kind of neural network – learns via gradient descent to program the changes of the fast weights of another neural network, a setup now referred to as a Fast Weight Programmer (FWP).

Conceptually, fast weights work like this: every connection between neurons carries a weight with two components – a standard slow weight that represents long-term memory (it learns slowly and decays slowly) and a fast weight that represents short-term memory (it learns quickly and decays quickly).

Fast weights store short-term memory in a weight matrix rather than in neural activity patterns, which gives them a much higher capacity. That short-term memory can hold relevant information from the history of the current sequence, so the information is readily available to the ongoing computation.
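As a rough illustration, here is a minimal NumPy sketch of that two-component idea; the layer size, decay rate and update rate below are illustrative assumptions rather than values from any of the papers:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Every connection carries two weight components:
W_slow = rng.normal(scale=0.1, size=(n, n))  # long-term memory: learns slowly, decays slowly
W_fast = np.zeros((n, n))                    # short-term memory: learns quickly, decays quickly

decay, eta = 0.9, 0.5  # illustrative values, not taken from the original papers

h = np.zeros(n)
for t in range(5):
    x = rng.normal(size=n)                  # input at this time step
    h = np.tanh((W_slow + W_fast) @ x + h)  # the effective weight is the sum of both components
    # The fast component decays, then stores the current activity pattern
    # as an additive outer product: a quick, short-lived memory write.
    W_fast = decay * W_fast + eta * np.outer(h, h)
    # W_slow would only change through ordinary gradient descent during training.
```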

In 1991, one of these FWPs computed its weight changes as additive outer products of activation patterns. Those patterns correspond to what are now called the keys and values of self-attention, the central idea behind today’s Transformers. The core idea of linearised self-attention Transformers is, in essence, already present in that 1991 work – and the reach of Schmidhuber’s paper can be gauged from the influence Transformer architectures now have across natural language processing and computer vision applications.
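The correspondence is easiest to see in code. Below is a minimal sketch comparing a fast weight matrix updated by additive outer products with unnormalised linearised self-attention; the projection matrices Wk, Wv, Wq and the toy dimensions are illustrative stand-ins, not taken from either line of work:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 4, 6

# Illustrative projections standing in for the slow network that generates
# keys, values and queries from each input (assumed names, not from the papers).
Wk, Wv, Wq = (rng.normal(size=(d, d)) for _ in range(3))
xs = rng.normal(size=(T, d))

# Fast-weight view: accumulate additive outer products of value and key
# patterns, then read the stored associations back out with a query.
W_fast = np.zeros((d, d))
fast_out = []
for x in xs:
    k, v, q = Wk @ x, Wv @ x, Wq @ x
    W_fast = W_fast + np.outer(v, k)  # write: additive outer product
    fast_out.append(W_fast @ q)       # read: query the fast weight matrix

# Linearised self-attention view (no softmax): the output at step t is the
# sum of values seen so far, weighted by the dot product of their keys with q_t.
ks, vs, attn_out = [], [], []
for x in xs:
    k, v, q = Wk @ x, Wv @ x, Wq @ x
    ks.append(k)
    vs.append(v)
    attn_out.append(sum(vi * (ki @ q) for ki, vi in zip(ks, vs)))

# The two formulations give identical outputs.
assert np.allclose(fast_out, attn_out)
```

In both views, the whole history is compressed into the same fixed-size matrix; the softmax normalisation of standard Transformers is the main ingredient this sketch leaves out.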

In 1993, Schmidhuber also introduced the term ‘attention’ in his paper ‘Reducing the Ratio between Learning Complexity and Number of Time Varying Variables in Fully Recurrent Nets’; the term has since gained wide acceptance.

Fast Weight Programmers slowly crept back into the conversation after big names like Geoffrey Hinton revived interest in them. In a 2017 lecture at the University of Toronto, Hinton spoke about the idea of fast weights and their realisation. “Fast weights provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights, we can avoid the need to store copies of neural activity patterns,” he said.

Schmidhuber’s paper has now reached its 30-year anniversary, forcing a reckoning with the origins of some of the concepts behind the massive headway AI research is currently making.

Some researchers reason that Schmidhuber’s name was lost in the ebbs and flows of AI research, while others point out that a long list of contributors helped the field reach the point where it is today.

In a profile for The New York Times, Gary Bradski, chief scientific officer at OpenCV.ai, said of Schmidhuber, “He’s done a lot of seminal stuff. But he wasn’t the one who made it popular. It’s kind of like the Vikings discovering America; Columbus made it real.”

But as Schmidhuber’s older ideas see a resurgence, there is growing outrage within the community over the neglect of his past accomplishments. His fight for recognition isn’t only for himself, he insists: Schmidhuber believes researchers going all the way back to the 1960s have been regularly written out of the story by today’s giants.
