Meta’s Llama 3.1 405B for Enterprise Apps in Snowflake Cortex AI

Today, Snowflake is excited to announce that the Llama 3.1 collection of multilingual large language models (LLMs) is now available in Snowflake Cortex AI, providing enterprises with secure, serverless access to Meta’s most advanced open source model. For the Llama 3.1 collection, Snowflake offers the largest context window of any vendor: 128k tokens.

Using Cortex AI, Snowflake customers can securely use their data with Meta’s newest open source models for AI app development. Llama 3.1 405B, along with the 8B and 70B models, is now available for serverless inference, and fine-tuning for the entire Llama 3.1 collection is coming soon. Additionally, the Llama 3 8B and 70B models will continue to be available for both serverless inference and fine-tuning (public preview).
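
As a quick illustration of serverless inference, the snippet below is a minimal sketch: it calls Cortex AI’s SNOWFLAKE.CORTEX.COMPLETE function with the llama3.1-405b model identifier, and the prompt text is purely illustrative.

    -- A minimal sketch of a serverless inference call in Cortex AI.
    -- The prompt is illustrative.
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'llama3.1-405b',
        'Explain the benefits of serverless LLM inference in two sentences.'
    ) AS RESPONSE;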

Built with robust trust and safety features for reliable, secure use, the newest Llama 3.1 collection of multilingual LLMs puts cutting-edge technology within easy reach, backed by broad platform support and a vibrant community of generative AI users.

Llama 3.1 405B at a Glance

Llama is an accessible, open LLM designed for developers, researchers and businesses to build, experiment and responsibly scale their generative AI ideas. Llama 3.1 405B is the first open source model that performs on par with leading proprietary AI models in general knowledge, steerability, math, tool use and multilingual translation, among other capabilities. 

Given these enhanced capabilities, Llama 3.1 405B is worth evaluating for use cases such as:

  • Long-document processing: With the larger context window of 128k tokens, customers can summarize, analyze and run other natural language processing tasks over long documents without having to break them up or chunk them first (see the sketch after this list). 
  • Advanced multilingual apps: The models are optimized for dialogue across 10 languages, including Spanish, Portuguese, Italian and German, providing a robust foundation for apps that span several languages.
  • Synthetic data generation: Customers can generate synthetic data and apply it in post-training to improve smaller Llama models.
  • Distillation recipe: Customers can use distillation to create small, efficient models from large ones like Llama 3.1 405B. Scripts and notebooks, along with detailed methods and examples, make self-distillation possible. These smaller models offer comparable performance at lower cost and reduced latency, ideal for resource-constrained environments.
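
To make the long-document use case concrete, here is a minimal sketch of a single-pass analysis over full documents. The CONTRACTS table and its CONTRACT_ID and CONTRACT_TEXT columns are hypothetical stand-ins; SNOWFLAKE.CORTEX.COMPLETE is Cortex AI’s serverless inference function.

    -- A minimal sketch: CONTRACTS, CONTRACT_ID and CONTRACT_TEXT are
    -- hypothetical names. With the 128k-token context window, a long
    -- document can be passed in one call instead of being chunked.
    SELECT
        CONTRACT_ID,
        SNOWFLAKE.CORTEX.COMPLETE(
            'llama3.1-405b',
            CONCAT('List the termination clauses in the following contract: ',
                   CONTRACT_TEXT)
        ) AS TERMINATION_CLAUSES
    FROM CONTRACTS;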

The new collection of Llama 3.1 models is truly community-focused, with a broad reach across the ecosystem. Llama models have been downloaded hundreds of millions of times and are supported by thousands of community projects. From cloud providers to startups, the world is building with Llama, solving complex enterprise use cases globally.

Minimizing risk with robust safety features

As Snowflake continues to provide AI tools to its enterprise customers, we have learned that making generative AI easy to use is pointless if it isn’t safe. As generative AI applications move into production and user bases expand, the risk of harmful interactions increases, so implementing robust safety features without compromising scalability or increasing costs is essential. We are excited that the Llama 3.1 release is dedicated to trust and safety, shipping models and tools that promote community collaboration and help standardize safety practices in generative AI.
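
As a hedged sketch of what this can look like in practice: Cortex AI’s COMPLETE function accepts an options object, and, where the Cortex Guard guardrails option is available in your account and region, potentially harmful responses can be filtered without any change to application logic. The prompt below is illustrative, and availability of the option is an assumption to verify against current documentation.

    -- A hedged sketch: assumes the Cortex Guard 'guardrails' option is
    -- available in your account/region. When the options argument is used,
    -- the prompt is passed as an array of role/content objects and the
    -- response is returned as a JSON-formatted string.
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'llama3.1-405b',
        [{'role': 'user', 'content': 'Describe our refund policy politely.'}],
        {'guardrails': true}
    ) AS GUARDED_RESPONSE;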

“Harnessing generative AI depends on safety and trust, and the Snowflake platform provides us with the necessary assurances to innovate and leverage industry-leading large language models at scale,” said Ryan Klapper, an AI leader at Hakkoda. “Meta’s Llama models within Snowflake Cortex AI are a powerful combination that will unlock even more opportunities for us to service internal RAG-based applications. Through these applications, stakeholders can more seamlessly interact with internal knowledge centers, ensuring that they have access to accurate and relevant information whenever needed.”

For more details on model benchmarks and other use cases, visit the Llama 3.1 405B announcement from Meta.

Snowflake open sources its optimizations for fine-tuning and long-context inference

To deliver efficient and cost-effective inference and fine-tuning for models with 100+ billion parameters, the Snowflake AI Research team has been building the massive-model inference and fine-tuning stack used in Cortex AI. With today’s launch of Llama 3.1 405B from Meta, this effort has become more relevant than ever. 

Meta’s open source release of the Llama 3.1 collection inspired our Snowflake AI Research team to open source more of our own innovations. To help the AI community leverage and build on top of this powerful but massive model, we are open sourcing our Massive LLM Inference System Optimization and Massive LLM Fine-Tuning System Optimization stacks.

You can learn more about these in Snowflake’s engineering blog on Inference System Optimization and Fine-tuning System Optimization.

Build using Llama 3.1 405B: With Cortex AI, you can run inference on data in your tables using familiar SQL syntax. Upload a sample table (e.g., customer survey results) with the relevant columns, such as Customer ID, Customer Name and Customer Feedback. You can then use the following SQL function to create a summary column from the Customer Feedback.
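
A minimal sketch follows; the CUSTOMER_SURVEY table and its column names are illustrative stand-ins for your own data.

    -- A minimal sketch: CUSTOMER_SURVEY and its columns are illustrative.
    -- Adds a generated summary alongside each feedback row.
    SELECT
        CUSTOMER_ID,
        CUSTOMER_NAME,
        SNOWFLAKE.CORTEX.COMPLETE(
            'llama3.1-405b',
            CONCAT('Summarize this customer feedback in one sentence: ',
                   CUSTOMER_FEEDBACK)
        ) AS FEEDBACK_SUMMARY
    FROM CUSTOMER_SURVEY;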

Availability: The Llama 3.1 405B model is available in the AWS us-west-2 region and will become accessible to more customers in the coming weeks. Check the availability matrix for future updates. For more information on Cortex AI, including available models and usage instructions, please refer to Snowflake’s documentation center. 

We are excited to see how you use Llama 3.1 405B to build AI applications for your specific use cases. Visit the Snowflake Cortex AI homepage to learn more, and try it for yourself with our quickstart guide.