vLLM

How Amazon scaled Rufus by building multi-node inference using AWS Trainium chips and vLLM | Amazon Web Services

At Amazon, our team builds Rufus, a generative AI-powered shopping assistant that serves millions of customers at immense scale. However, deploying Rufus at scale introduces significant challenges that must be carefully navigated. Rufus is powered by a custom-built large language model (LLM). As the model’s complexity increased, we prioritized developing…

Boost cold-start recommendations with vLLM on AWS Trainium | Amazon Web Services

In: Artificial Intelligence

Cold start in recommendation systems goes beyond just new-user or new-item problems: it’s the complete absence of personalized signals at launch. When someone first arrives, or when fresh content appears, there’s no behavioral history to tell the engine what they care about, so everyone ends up in broad generic…

Deploy Meta Llama 3.1-8B on AWS Inferentia using Amazon EKS and vLLM | Amazon Web Services

In: Artificial Intelligence

With the rise of large language models (LLMs) like Meta Llama 3.1, there is an increasing need for scalable, reliable, and cost-effective solutions to deploy and serve these models. AWS Trainium and AWS Inferentia based instances, combined with Amazon Elastic Kubernetes Service (Amazon EKS), provide a performant and low-cost…

Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips | Amazon Web Services

In: Artificial Intelligence

The use of large language models (LLMs) and generative AI has exploded over the last year. With the release of powerful publicly available foundation models, tools for training, fine-tuning, and hosting your own LLM have also become democratized. Using vLLM on AWS Trainium and Inferentia makes it possible to…
