Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service for building and deploying machine learning (ML) models without writing any code. The ready-to-use foundation models (FMs) available in SageMaker Canvas let customers use generative AI for tasks such as content generation and summarization.
We are excited to announce the latest updates to Amazon SageMaker Canvas, which bring new generative AI capabilities to the platform. With support for Meta Llama 2 and Mistral AI models and the launch of streaming responses, SageMaker Canvas continues to empower anyone who wants to get started with generative AI without writing a single line of code. In this post, we discuss these updates and their benefits.
Overview of the Meta Llama 2 and Mistral models
Llama 2 is a state-of-the-art foundation model from Meta that offers improved scalability and versatility for a wide range of generative AI tasks. Users have reported that Llama 2 can engage in meaningful and coherent conversations, generate new content, and extract answers from existing notes. Llama 2 is among the state-of-the-art large language models (LLMs) available today.
Mistral AI, a leading French AI startup, has developed Mistral 7B, a powerful language model with 7.3 billion parameters. Mistral models have been very well received by the open source community thanks to their use of grouped-query attention (GQA) for faster inference, which makes them highly efficient and comparable in quality to models with two to three times more parameters.
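To see why GQA speeds up inference, consider the key/value (KV) cache that grows with every generated token. The following back-of-the-envelope sketch uses the published Mistral 7B configuration (32 layers, 32 query heads, 8 KV heads, head dimension 128); the numbers are for illustration only.

```python
# Compare per-token KV-cache size for full multi-head attention vs. GQA.
# Mistral 7B figures: 32 layers, 32 query heads, 8 KV heads, head dim 128.

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of K and V cached per generated token (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

# Full multi-head attention would cache one K/V pair per query head (32)...
mha = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)
# ...whereas GQA shares each K/V pair across 4 query heads (8 KV heads).
gqa = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)

print(mha // gqa)  # GQA shrinks the cache 4x, reducing memory traffic
```

The smaller cache means less memory bandwidth per decoding step, which is where much of the faster-inference claim comes from.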
Today, we are excited to announce that SageMaker Canvas now supports three Llama 2 model variants and two Mistral 7B variants.
To try these models, navigate to the SageMaker Canvas Ready-to-use models page and choose Generate, extract and summarize content. This is where you’ll find the SageMaker Canvas GenAI chat experience. Here, you can use any model from Amazon Bedrock or SageMaker JumpStart by selecting it from the model drop-down menu.
In our case, we choose one of the Llama 2 models. Now you can provide your input or query. When you submit the input, SageMaker Canvas forwards it to the model.
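Canvas requires no code, but conceptually the chat experience is doing something similar to calling the model through the Amazon Bedrock runtime API. The following is a sketch of that equivalent SDK call, not Canvas’s actual implementation; the request schema and model ID follow Bedrock’s documented Llama 2 format, and the prompt is made up.

```python
# Sketch: the programmatic equivalent of submitting a query to a
# Llama 2 model, using the Amazon Bedrock runtime API via boto3.
import json

def build_llama2_body(prompt: str, max_gen_len: int = 512,
                      temperature: float = 0.5) -> str:
    """Serialize a request body in the Llama 2 on Bedrock schema."""
    return json.dumps({"prompt": prompt,
                       "max_gen_len": max_gen_len,
                       "temperature": temperature})

if __name__ == "__main__":
    import boto3  # requires AWS credentials with Amazon Bedrock access
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId="meta.llama2-13b-chat-v1",
        body=build_llama2_body("Summarize the key benefits of no-code ML."),
    )
    print(json.loads(response["body"].read())["generation"])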
Choosing which of the models available in SageMaker Canvas best fits your use case requires you to consider information about the models themselves. The Llama-2-70B-chat model is a larger model (70 billion parameters, compared to 13 billion for Llama-2-13B-chat), which means its performance is generally higher than the smaller variant, at the cost of slightly higher latency and an increased cost per token. Mistral-7B offers performance comparable to Llama-2-7B or Llama-2-13B, but it is hosted on Amazon SageMaker. This means the pricing model is different, moving from a dollars-per-token model to a dollars-per-hour model, which can be more cost-effective with a significant number of requests per hour and consistent usage at scale. All of the preceding models can perform well in a variety of use cases. Our suggestion is therefore to evaluate which model best solves your problem, taking into account the trade-offs in performance, throughput, and cost.
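The dollars-per-token vs. dollars-per-hour trade-off can be reduced to a simple break-even calculation. The prices in this sketch are entirely hypothetical; check the current AWS pricing pages for real numbers.

```python
# Hypothetical break-even between per-token (Bedrock-style) and per-hour
# (SageMaker endpoint-style) pricing. Prices below are made up for
# illustration -- consult current AWS pricing pages for real figures.

def breakeven_tokens_per_hour(price_per_1k_tokens: float,
                              price_per_hour: float) -> float:
    """Tokens/hour above which a per-hour endpoint becomes cheaper."""
    return price_per_hour / price_per_1k_tokens * 1000

# Example: $0.002 per 1K tokens vs. an assumed $1.50/hour instance
threshold = breakeven_tokens_per_hour(0.002, 1.50)
print(f"Per-hour hosting wins above ~{threshold:,.0f} tokens/hour")
```

Below the threshold, pay-per-token pricing is cheaper; above it, with consistent usage, a dedicated per-hour endpoint pays off.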
If you’re looking for an easy way to compare model behavior, SageMaker Canvas natively provides this capability in the form of model comparisons. You can select up to three different models and send the same query to all of them at once. SageMaker Canvas then gets the responses from each model and displays them in a side-by-side chat UI. To do this, choose Compare and select the other models to compare against.
Introducing response streaming: Real-time interactions and improved performance
One of the main advancements in this release is the introduction of streaming responses. Streaming responses deliver a richer user experience and better reflect a chat experience: instead of waiting for the full answer to be generated, users receive immediate, incremental feedback. This creates a more natural conversation flow and improves the responsiveness and overall user experience of chatbot applications.
With this feature, you can interact with AI models in real time, receive instant responses, and integrate them seamlessly into a variety of applications and workflows. All models that can be queried in SageMaker Canvas, whether from Amazon Bedrock or SageMaker JumpStart, can stream responses to the user.
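Canvas streams responses in the UI without any code, but the same pattern is available programmatically through the Amazon Bedrock runtime streaming API. The following sketch uses Bedrock’s documented `InvokeModelWithResponseStream` operation and the Llama 2 chunk schema; the `assemble_chunks` helper is our own illustrative addition, kept separate so the decoding logic can be seen without an AWS call.

```python
# Sketch: consuming a streamed response from the Amazon Bedrock runtime.
import json

def assemble_chunks(chunks: list[bytes]) -> str:
    """Decode and concatenate streamed Llama 2 generation chunks."""
    return "".join(json.loads(c)["generation"] for c in chunks)

if __name__ == "__main__":
    import boto3  # requires AWS credentials with Amazon Bedrock access
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model_with_response_stream(
        modelId="meta.llama2-13b-chat-v1",
        body=json.dumps({"prompt": "Write a haiku about streaming."}),
    )
    for event in response["body"]:  # events arrive incrementally
        chunk = json.loads(event["chunk"]["bytes"])
        print(chunk["generation"], end="", flush=True)
```

Each event carries a partial generation, which is why the user sees text appear token by token rather than all at once.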
Whether you’re building a chatbot, recommendation system, or virtual assistant, the Llama 2 and Mistral models combined with streaming responses bring improved performance and interactivity to your projects.
To use the latest features of SageMaker Canvas, be sure to log out of the application and log back in. To do this, choose Log out, then open SageMaker Canvas again; you will see the new models and can take advantage of the latest releases. Logging out of SageMaker Canvas also releases all resources used by the workspace instance, thereby avoiding unintended additional charges.
To get started with the new streaming responses for the Llama 2 and Mistral models in SageMaker Canvas, visit the SageMaker console and explore the intuitive interface. To learn more about how SageMaker Canvas and generative AI can help you achieve your business goals, see Empower your business users to extract insights from business documents using Amazon SageMaker Canvas and Generative AI and Overcome common contact center challenges with generative AI and Amazon SageMaker Canvas.
If you want to learn more about SageMaker Canvas features and dive deeper into other ML use cases, check out the other posts available in the SageMaker Canvas category of the AWS ML Blog. We can’t wait to see the amazing AI applications you will create with these new capabilities!
About the authors
Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML. Based in Brussels, he works closely with customers around the world who are looking to adopt low-code/no-code machine learning technologies and generative AI. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in college and has been in love with it ever since.
Dan Sinnreich is a Senior Product Manager at AWS, helping to democratize low-code/no-code machine learning. Prior to AWS, Dan created and marketed enterprise SaaS platforms and time series models used by institutional investors to manage risk and construct optimal portfolios. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.