In a groundbreaking development, researchers at Sungkyunkwan University in South Korea have unveiled a novel memory system inspired by the principles of Hebbian theory. This cutting-edge innovation promises to significantly enhance the performance of transformer-based machine learning models, particularly in tasks involving the processing of long data sequences. The research, published on the arXiv preprint server, introduces “Memoria,” a general memory network that leverages Hebbian theory to improve long-term dependencies in neural networks.
While transformers have proven to be formidable tools for processing sequential data, they struggle with longer sequences. A transformer's fixed context window limits how much of an extended input it can attend to at once, so performance degrades as sequences grow. Unlike humans, who selectively remember and draw on the relevant parts of what they perceive, transformers process every token of the raw input from start to finish, which makes them inefficient on lengthy sequences.
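As a concrete illustration of that constraint, the short sketch below (an assumption for illustration, not code from the paper) uses the Hugging Face tokenizer for a stock BERT checkpoint, whose context window is 512 tokens, to show how anything beyond that window is simply truncated before the model ever sees it.

```python
# Illustrative sketch only: a stock BERT encoder attends over at most 512 tokens,
# so longer inputs are cut off at the context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

long_text = "word " * 2000  # a document far longer than the context window

encoded = tokenizer(long_text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # 512 -- everything past the window is dropped
```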
A Hebbian memory for transformers
Researchers Sangjun Park and JinYeong Bak's primary objective was to design a system that could overcome these limitations and give transformer models stronger long-range capabilities. Drawing inspiration from Hebbian theory, which posits that neurons that are repeatedly activated together form stronger connections, a mechanism thought to underlie learning, they introduced Memoria.
“Memoria stores and retrieves information called engram at multiple memory levels of working memory, short-term memory, and long-term memory, using connection weights that change according to Hebb’s rule,”
explain Park and Bak in their paper.
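To give a rough sense of the mechanism the quote describes, here is a minimal, hypothetical sketch of a Hebbian weight update in plain NumPy. It is not the authors' implementation, just the classic "fire together, wire together" rule with a decay term, in which jointly active units strengthen the connection weights between them.

```python
# Minimal, hypothetical sketch of a Hebbian weight update (not Memoria's actual code).
import numpy as np

def hebbian_update(weights, pre, post, lr=0.1, decay=0.01):
    """Strengthen connections between co-activated units (Hebb's rule),
    with a small decay so unused associations gradually fade."""
    # Outer product: weight (i, j) grows when pre-unit j and post-unit i fire together.
    weights += lr * np.outer(post, pre)
    weights -= decay * weights  # forgetting term
    return weights

# Toy example: 3 presynaptic units feeding 3 postsynaptic units.
w = np.zeros((3, 3))
pre = np.array([1.0, 0.0, 1.0])   # units active in the current input
post = np.array([0.0, 1.0, 1.0])  # units they activate downstream
w = hebbian_update(w, pre, post)
print(w)  # co-active pairs now have non-zero association weights
```

In Memoria, the authors apply weights of this Hebbian kind to information units they call engrams across working, short-term, and long-term memory levels; the sketch above only illustrates the underlying update rule.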
In a series of experiments, the researchers evaluated the effectiveness of their Hebbian memory system, and the results were highly promising. Memoria demonstrated its ability to significantly enhance the performance of transformer models across a range of tasks that involve processing lengthy data sequences.
“Through experiments with popular transformer-based models like BERT and GPT, we present that Memoria significantly improves the ability to consider long-term dependencies in various tasks,”
the researchers reported.
“Results show that Memoria outperformed existing methodologies in sorting and language modeling, and long text classification.”
Open-source code for wider adoption
In a move aimed at fostering wider adoption and collaboration, Park and Bak have made their code open source and readily accessible on GitHub. They have also released Memoria as a standalone Python package, making it straightforward for developers around the world to use. This open-source approach is expected to encourage other research groups to explore Memoria's potential for enhancing their own transformer-based models.
The memory architecture developed by the Sungkyunkwan University researchers is poised for further exploration on more complex tasks, and its potential applications are broad. As more research groups worldwide experiment with Memoria, more powerful and efficient transformer-based machine learning models are likely to follow.