inference (Page 2)

Enable Amazon Bedrock cross-Region inference in multi-account environments | Amazon Web Services

Amazon Bedrock cross-Region inference capability that provides organizations with flexibility to access foundation models (FMs) across AWS Regions while maintaining optimal performance and availability. However, some enterprises implement strict Regional access controls through service control policies (SCPs) or AWS Control Tower to adhere to compliance requirements, inadvertently blocking cross-Region inferenceContinue Reading

Enhance deployment guardrails with inference component rolling updates for Amazon SageMaker AI inference | Amazon Web Services

In: Artificial Intelligence

Deploying models efficiently, reliably, and cost-effectively is a critical challenge for organizations of all sizes. As organizations increasingly deploy foundation models (FMs) and other machine learning (ML) models to production, they face challenges related to resource utilization, cost-efficiency, and maintaining high availability during updates. Amazon SageMaker AI introduced inference componentContinue Reading

How GoDaddy built a category generation system at scale with batch inference for Amazon Bedrock | Amazon Web Services

In: Artificial Intelligence

This post was co-written with Vishal Singh, Data Engineering Leader at Data & Analytics team of GoDaddy Generative AI solutions have the potential to transform businesses by boosting productivity and improving customer experiences, and using large language models (LLMs) in these solutions has become increasingly popular. However, inference of LLMsContinue Reading

Deploy DeepSeek-R1 distilled models on Amazon SageMaker using a Large Model Inference container | Amazon Web Services

In: Artificial Intelligence

DeepSeek-R1 is a large language model (LLM) developed by DeepSeek AI that uses reinforcement learning to enhance reasoning capabilities through a multi-stage training process from a DeepSeek-V3-Base foundation. A key distinguishing feature is its reinforcement learning step, which was used to refine the model’s responses beyond the standard pre-training andContinue Reading

Reduce conversational AI response time through inference at the edge with AWS Local Zones | Amazon Web Services

In: Artificial Intelligence

Recent advances in generative AI have led to the proliferation of new generation of conversational AI assistants powered by foundation models (FMs). These latency-sensitive applications enable real-time text and voice interactions, responding naturally to human conversations. Their applications span a variety of sectors, including customer service, healthcare, education, personal andContinue Reading

Achieve ~2x speed-up in LLM inference with Medusa-1 on Amazon SageMaker AI | Amazon Web Services

In: Artificial Intelligence

This blog post is co-written with Moran beladev, Manos Stergiadis, and Ilya Gusev from Booking.com. Large language models (LLMs) have revolutionized the field of natural language processing with their ability to understand and generate humanlike text. Trained on broad, generic datasets spanning a wide range of topics and domains, LLMsContinue Reading

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS | Amazon Web Services

In: Artificial Intelligence

This is a guest post co-written with Tim Krause, Lead MLOps Architect at CONXAI. CONXAI Technology GmbH is pioneering the development of an advanced AI platform for the Architecture, Engineering, and Construction (AEC) industry. Our platform uses advanced AI to empower construction domain experts to create complex use cases efficiently.Continue Reading

Optimizing AI responsiveness: A practical guide to Amazon Bedrock latency-optimized inference | Amazon Web Services

In: Artificial Intelligence

In production generative AI applications, responsiveness is just as important as the intelligence behind the model. Whether it’s customer service teams handling time-sensitive inquiries or developers needing instant code suggestions, every second of delay, known as latency, can have a significant impact. As businesses increasingly use large language models (LLMs)Continue Reading

Create a SageMaker inference endpoint with custom model & extended container | Amazon Web Services

In: Artificial Intelligence

Amazon SageMaker provides a seamless experience for building, training, and deploying machine learning (ML) models at scale. Although SageMaker offers a wide range of built-in algorithms and pre-trained models through Amazon SageMaker JumpStart, there are scenarios where you might need to bring your own custom model or use specific softwareContinue Reading

Unlock cost-effective AI inference using Amazon Bedrock serverless capabilities with an Amazon SageMaker trained model | Amazon Web Services

In: Artificial Intelligence

In this post, I’ll show you how to use Amazon Bedrock—with its fully managed, on-demand API—with your Amazon SageMaker trained or fine-tuned model. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta,Continue Reading

inference (Page 2)

Enable Amazon Bedrock cross-Region inference in multi-account environments | Amazon Web Services

Enhance deployment guardrails with inference component rolling updates for Amazon SageMaker AI inference | Amazon Web Services

How GoDaddy built a category generation system at scale with batch inference for Amazon Bedrock | Amazon Web Services

Deploy DeepSeek-R1 distilled models on Amazon SageMaker using a Large Model Inference container | Amazon Web Services

Reduce conversational AI response time through inference at the edge with AWS Local Zones | Amazon Web Services

Achieve ~2x speed-up in LLM inference with Medusa-1 on Amazon SageMaker AI | Amazon Web Services

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS | Amazon Web Services

Optimizing AI responsiveness: A practical guide to Amazon Bedrock latency-optimized inference | Amazon Web Services

Create a SageMaker inference endpoint with custom model & extended container | Amazon Web Services

Unlock cost-effective AI inference using Amazon Bedrock serverless capabilities with an Amazon SageMaker trained model | Amazon Web Services

Build an intelligent eDiscovery solution using Amazon Bedrock Agents | Amazon Web Services

How PerformLine uses prompt engineering on Amazon Bedrock to detect compliance violations | Amazon Web Services