In the AI sector, tech giants are competing to create larger and larger language models. However, a new trend has emerged: small language models (SLMs) are gaining popularity. While progress on large language models (LLMs) appears to be stagnating, researchers and developers are turning their attention to SLMs. These compact, efficient, and highly adaptable models challenge the idea that bigger is always better, and they are expected to reshape AI development.
Is LLM progress reaching a plateau?
According to recent performance comparisons carried out by Vellum and HuggingFace, the gap between LLMs is rapidly narrowing. This trend is particularly visible in certain areas, such as multiple-choice questions, reasoning, and math problems, where the differences in performance between the top models are minimal. For example, in multiple-choice questions, Claude 3 Opus, GPT-4, and Gemini Ultra all score above 83%, while in reasoning tasks, Claude 3 Opus, GPT-4, and Gemini 1.5 Pro exceed 92% accuracy.
Interestingly, even smaller models like Mixtral 8x7B and Llama 2 70B show promising results in specific areas, such as reasoning and multiple-choice questions, where they outperform some of their larger counterparts. This suggests that model size may not be the only determinant of performance and that other aspects, such as architecture, training data, and fine-tuning techniques, could play an important role.
The latest research papers announcing new LLMs are all pointing in the same direction: "If you just look empirically, the last dozen papers that are coming out, they're sort of all in the same general territory as GPT-4," says Gary Marcus, former director of Uber AI and author of "Rebooting AI," a book about creating trustworthy AI.
According to Marcus, although some language models are slightly better than GPT-4, there has been no significant leap forward in over a year. This raises the question of whether large language models are approaching their performance limits. If the trend continues, it could have a significant impact on the future development and use of language models: developers may need to focus on more efficient and specialized architectures rather than simply increasing model size.
The LLM approach has limitations that must be taken into account
Large language models (LLMs) are undeniably powerful, but their use comes with significant drawbacks. LLMs require enormous amounts of data for training and contain billions, or even trillions, of parameters. As a result, the training process is extremely resource-intensive, and the computing power and energy consumption required to train and run LLMs are staggering. The resulting costs make it difficult for smaller organizations or individuals to engage in core LLM development. At an event at MIT last year, Sam Altman, CEO of OpenAI, revealed that training GPT-4 cost at least $100 million.
Additionally, the complexity of the tools and techniques required to work with LLMs presents a steep learning curve for developers, further limiting accessibility. The development cycle for machine learning models is also long, from data preparation and training to deployment, which slows down development and experimentation. A recent paper from the University of Cambridge shows that businesses can spend 90 days or more deploying a single machine learning model.
LLMs also have a significant problem: they can generate results that seem plausible but are not factual. This is because they are trained to predict the most likely next word based on patterns in the training data, rather than having a true understanding of the information. As a result, LLMs may confidently produce false statements, invent facts, or combine unrelated concepts in absurd ways. Detecting and mitigating these "hallucinations" remains an ongoing challenge in the development of reliable and trustworthy language models.
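To make the mechanism concrete, here is a toy sketch (not a real LLM; the tokens and probabilities are invented for illustration) of how next-token sampling picks a continuation purely by likelihood, with no step that checks whether the chosen word is factually correct:

```python
import random

# Toy illustration of next-token sampling, not a real language model.
# A model scores candidate next tokens by how likely they were in its
# training data, then samples one; nothing here verifies factual accuracy,
# which is one way confident-sounding errors ("hallucinations") arise.
next_token_probs = {
    "1889": 0.55,   # correct completion year for the Eiffel Tower
    "1875": 0.30,   # plausible-looking but wrong
    "1920": 0.15,   # also wrong
}

def sample_next_token(probs):
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "The Eiffel Tower was completed in"
print(prompt, sample_next_token(next_token_probs))  # sometimes prints a wrong year
```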
According to Marcus, if you're using LLMs for high-stakes problems, you don't want a model that insults your customers, gives bad medical information, or takes risks while driving a car. So this remains a problem.
The scale of LLMs and their black-box nature can also make them difficult to interpret and debug, which is crucial for building confidence in model results. Biases in training data and algorithms can lead to unfair, inaccurate, or even harmful results. Techniques for making LLMs “safe” and reliable can also reduce their effectiveness, as seen with Google Gemini. Additionally, the centralized nature of LLMs raises concerns about the concentration of power and control in the hands of a few large technology companies.
Small Language Models (SLM)
Small language models (SLMs) are simplified versions of large language models, with fewer parameters and simpler designs. They require less data and less training time, on the order of minutes or hours rather than the days needed for LLMs. This makes SLMs more efficient and easier to deploy on-premises or on smaller devices.
One of the main advantages of SLMs is their suitability for specific applications. Because of their targeted scope and smaller data requirements, they can be fine-tuned for particular domains or tasks more easily than large, general-purpose models. This customization allows businesses to create highly effective SLMs for their specific needs, such as sentiment analysis, named entity recognition, or domain-specific question answering. The specialized nature of SLMs can lead to better performance and efficiency for these targeted applications than using a more general model.
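As a minimal sketch of what such fine-tuning can look like in practice, the example below adapts a small model for sentiment analysis with Hugging Face Transformers. The base model (distilbert-base-uncased), the dataset (imdb), and the hyperparameters are illustrative assumptions, not choices prescribed by the article:

```python
# Minimal fine-tuning sketch with Hugging Face Transformers (illustrative only).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # small, widely used base model
dataset = load_dataset("imdb")           # binary sentiment dataset

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-slm",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    # Small subsets keep the sketch quick to run; real training would use more data.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
)
trainer.train()
print(trainer.evaluate())  # loss on the held-out subset
```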
SLMs also have several advantages over larger models in terms of privacy and security. Due to their smaller code base and simpler architecture, SLMs are easier to audit and less likely to have unintended vulnerabilities. This makes them a better choice for applications that handle sensitive data, such as in healthcare or finance, where data breaches could have serious consequences and it is essential to use models that minimize the risk of exposure.
Another advantage of SLMs is their reduced computational requirements. This means they can run on on-premises devices or servers, reducing the need for cloud infrastructure. Local processing can further improve data security and reduce the risk of exposure during data transfer.
Additionally, SLMs are less prone to undetected hallucinations in their specific domain than LLMs. SLMs are typically trained on a smaller, more focused dataset specific to their intended domain or application. This helps the model learn the patterns, vocabulary, and information most relevant to its task. With fewer parameters and a more streamlined architecture, SLMs are also less likely to capture and amplify noise or errors in training data.
According to Clem Delangue, CEO of AI startup HuggingFace, up to 99% of use cases could be solved using SLMs. HuggingFace, whose platform allows developers to build, train, and deploy machine learning models, recently announced a strategic partnership with Google. As a result, HuggingFace was integrated into Google Vertex AI, allowing developers to quickly deploy thousands of models via Google Vertex Model Garden.
Google's Gemma
Google initially fell behind OpenAI in the development of large language models (LLMs), but is now actively pursuing the small language model opportunity. In February, Google launched Gemma, a family of efficient, user-friendly SLMs that can run on everyday devices such as smartphones, tablets, and laptops without requiring special hardware or extensive optimization.
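As an illustration of how lightweight such models are to run locally, here is a minimal inference sketch using Hugging Face Transformers. The checkpoint name (google/gemma-2b-it) is an assumption of this example: it is a gated model on the Hub that requires accepting Google's license, and exact memory requirements will vary with hardware:

```python
# Minimal local-inference sketch (illustrative; assumes access to the gated
# "google/gemma-2b-it" checkpoint and a few GB of free RAM or VRAM).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Explain in two sentences why small language models are useful."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```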
Since its release, Gemma has been downloaded over 400,000 times on HuggingFace, and some exciting projects are already coming to fruition. For example, Cerule is a powerful image and language model that combines Gemma 2B with Google's SigLIP. It was trained on a large dataset of images and text and uses highly efficient data selection techniques to achieve strong performance without requiring large amounts of data or compute. This makes it well suited to emerging edge computing use cases.
Another example is CodeGemma, a specialized version of Gemma that focuses on coding and mathematical reasoning. CodeGemma offers three model variants suited to different coding-related activities, making advanced coding tools more accessible and efficient for developers.
The transformative potential of small language models
As the AI community continues to explore the potential of small language models, the benefits of faster development cycles, improved efficiency, and the ability to customize models for specific needs are becoming increasingly clear. SLMs have the potential to make AI accessible to more people and to drive innovation across industries by delivering cost-effective and targeted solutions. Deploying SLMs at the edge opens new possibilities for real-time, personalized, and secure applications in industries such as finance, entertainment, automotive systems, education, e-commerce, and healthcare.
By processing data locally and reducing reliance on cloud infrastructure, edge computing with SLMs enables faster response times, better data privacy, and an improved user experience. This decentralized approach to AI has the potential to transform the way businesses and consumers interact with technology, creating more personalized and intuitive experiences in the real world. As large language models (LLMs) face computing resource challenges and potentially reach performance plateaus, the rise of SLMs promises to keep the AI ecosystem evolving at an impressive pace.