inference (Page 3)

How GoDaddy built a category generation system at scale with batch inference for Amazon Bedrock | Amazon Web Services

This post was co-written with Vishal Singh, Data Engineering Leader at Data & Analytics team of GoDaddy. Generative AI solutions have the potential to transform businesses by boosting productivity and improving customer experiences, and using large language models (LLMs) in these solutions has become increasingly popular. However, inference of LLMs ...

Deploy DeepSeek-R1 distilled models on Amazon SageMaker using a Large Model Inference container | Amazon Web Services

In: Artificial Intelligence

DeepSeek-R1 is a large language model (LLM) developed by DeepSeek AI that uses reinforcement learning to enhance reasoning capabilities through a multi-stage training process from a DeepSeek-V3-Base foundation. A key distinguishing feature is its reinforcement learning step, which was used to refine the model’s responses beyond the standard pre-training and ...

Reduce conversational AI response time through inference at the edge with AWS Local Zones | Amazon Web Services

In: Artificial Intelligence

Recent advances in generative AI have led to the proliferation of a new generation of conversational AI assistants powered by foundation models (FMs). These latency-sensitive applications enable real-time text and voice interactions, responding naturally to human conversations. Their applications span a variety of sectors, including customer service, healthcare, education, personal and ...

Achieve ~2x speed-up in LLM inference with Medusa-1 on Amazon SageMaker AI | Amazon Web Services

In: Artificial Intelligence

This blog post is co-written with Moran Beladev, Manos Stergiadis, and Ilya Gusev from Booking.com. Large language models (LLMs) have revolutionized the field of natural language processing with their ability to understand and generate humanlike text. Trained on broad, generic datasets spanning a wide range of topics and domains, LLMs ...

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS | Amazon Web Services

In: Artificial Intelligence

This is a guest post co-written with Tim Krause, Lead MLOps Architect at CONXAI. CONXAI Technology GmbH is pioneering the development of an advanced AI platform for the Architecture, Engineering, and Construction (AEC) industry. Our platform uses advanced AI to empower construction domain experts to create complex use cases efficiently.

Optimizing AI responsiveness: A practical guide to Amazon Bedrock latency-optimized inference | Amazon Web Services

In: Artificial Intelligence

In production generative AI applications, responsiveness is just as important as the intelligence behind the model. Whether it’s customer service teams handling time-sensitive inquiries or developers needing instant code suggestions, every second of delay, known as latency, can have a significant impact. As businesses increasingly use large language models (LLMs) ...

Create a SageMaker inference endpoint with custom model & extended container | Amazon Web Services

In: Artificial Intelligence

Amazon SageMaker provides a seamless experience for building, training, and deploying machine learning (ML) models at scale. Although SageMaker offers a wide range of built-in algorithms and pre-trained models through Amazon SageMaker JumpStart, there are scenarios where you might need to bring your own custom model or use specific software ...

Unlock cost-effective AI inference using Amazon Bedrock serverless capabilities with an Amazon SageMaker trained model | Amazon Web Services

In: Artificial Intelligence

In this post, I’ll show you how to use Amazon Bedrock (with its fully managed, on-demand API) with your Amazon SageMaker trained or fine-tuned model. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, ...

Amazon SageMaker launches the updated inference optimization toolkit for generative AI | Amazon Web Services

In: Artificial Intelligence

Today, Amazon SageMaker is excited to announce updates to the inference optimization toolkit, providing new functionality and enhancements to help you optimize generative AI models even faster. These updates build on the capabilities introduced in the original launch of the inference optimization toolkit (to learn more, see Achieve up to ...

Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 2 | Amazon Web Services

In: Artificial Intelligence

In Part 1 of this series, we introduced Amazon SageMaker Fast Model Loader, a new capability in Amazon SageMaker that significantly reduces the time required to deploy and scale large language models (LLMs) for inference. We discussed how this innovation addresses one of the major bottlenecks in LLM deployment: the time ...
