inference

Amazon Bedrock cross-Region inference is a capability that gives organizations the flexibility to access foundation models (FMs) across AWS Regions while maintaining optimal performance and availability. However, some enterprises implement strict Regional access controls through service control policies (SCPs) or AWS Control Tower to adhere to compliance requirements, inadvertently blocking cross-Region inference.
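Cross-Region inference is consumed through an inference profile rather than a Region-specific model ID. As a minimal sketch, assuming Amazon Bedrock access in us-east-1 and an illustrative inference profile ID, the profile is simply passed to the Converse API in place of a plain model ID, and requests may then be routed to another Region in the same geography:

```python
import boto3

# Sketch only: the inference profile ID below is illustrative; use one that is
# available in your account and geography.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20240620-v1:0",  # cross-Region inference profile
    messages=[
        {"role": "user", "content": [{"text": "Summarize cross-Region inference in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 256},
)
print(response["output"]["message"]["content"][0]["text"])
```

If SCPs deny Regions that the profile can route to, calls like this one fail even though the service itself supports them, which is the conflict the post examines.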

Deploying models efficiently, reliably, and cost-effectively is a critical challenge for organizations of all sizes. As organizations increasingly move foundation models (FMs) and other machine learning (ML) models to production, they face challenges related to resource utilization, cost-efficiency, and maintaining high availability during updates. Amazon SageMaker AI introduced inference components…
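As a rough illustration of the idea, the hedged sketch below attaches an inference component to an existing SageMaker AI endpoint so that several models can share its compute. The endpoint, variant, and model names are placeholders, and the resource requirements are illustrative only.

```python
import boto3

# Sketch: add an inference component to an existing endpoint. All names below
# are placeholders for resources you have already created.
sm = boto3.client("sagemaker")

sm.create_inference_component(
    InferenceComponentName="my-llm-component",       # illustrative component name
    EndpointName="shared-inference-endpoint",        # existing SageMaker AI endpoint
    VariantName="AllTraffic",
    Specification={
        "ModelName": "my-registered-model",          # existing SageMaker model
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 4096,
        },
    },
    RuntimeConfig={"CopyCount": 1},                  # copies scale independently of the endpoint
)
```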

This post was co-written with Vishal Singh, Data Engineering Leader on the Data & Analytics team at GoDaddy. Generative AI solutions have the potential to transform businesses by boosting productivity and improving customer experiences, and using large language models (LLMs) in these solutions has become increasingly popular. However, inference of LLMs…

DeepSeek-R1 is a large language model (LLM) developed by DeepSeek AI that uses reinforcement learning to enhance reasoning capabilities through a multi-stage training process starting from a DeepSeek-V3-Base foundation. A key distinguishing feature is its reinforcement learning step, which was used to refine the model’s responses beyond the standard pre-training and…

Recent advances in generative AI have led to the proliferation of a new generation of conversational AI assistants powered by foundation models (FMs). These latency-sensitive applications enable real-time text and voice interactions, responding naturally to human conversations. Their applications span a variety of sectors, including customer service, healthcare, education, personal and…

This is a guest post co-written with Tim Krause, Lead MLOps Architect at CONXAI. CONXAI Technology GmbH is pioneering the development of an advanced AI platform for the Architecture, Engineering, and Construction (AEC) industry. Our platform uses advanced AI to empower construction domain experts to create complex use cases efficiently.

In production generative AI applications, responsiveness is just as important as the intelligence behind the model. Whether it’s customer service teams handling time-sensitive inquiries or developers needing instant code suggestions, every second of delay, known as latency, can have a significant impact. As businesses increasingly use large language models (LLMs)…
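One common way to quantify the latency users actually perceive is time to first token on a streaming call. The sketch below assumes Amazon Bedrock access in us-east-1 and an illustrative model ID; it times how long the Converse Stream API takes to return its first content delta.

```python
import time
import boto3

# Sketch: measure time to first token for a streaming Bedrock call.
# The model ID is illustrative; substitute one enabled in your account.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

start = time.perf_counter()
response = bedrock_runtime.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Draft a one-line greeting."}]}],
)

first_token_latency = None
for event in response["stream"]:
    # The first contentBlockDelta event carries the first generated tokens.
    if "contentBlockDelta" in event and first_token_latency is None:
        first_token_latency = time.perf_counter() - start

print(f"Time to first token: {first_token_latency:.2f}s")
```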

In this post, I’ll show you how to use Amazon Bedrock, with its fully managed, on-demand API, with your Amazon SageMaker-trained or fine-tuned model. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta…
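Once a SageMaker-trained model has been brought into Amazon Bedrock through Custom Model Import, it is invoked like any other Bedrock model, just addressed by the imported model’s ARN. The sketch below is a hedged illustration: the ARN is a placeholder, and the request body schema (a Llama-style prompt here) depends on the model architecture you imported.

```python
import json
import boto3

# Sketch: invoke a model imported via Amazon Bedrock Custom Model Import.
# The ARN is a placeholder; the body fields assume a Llama-style text model.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

imported_model_arn = "arn:aws:bedrock:us-east-1:111122223333:imported-model/example"

response = bedrock_runtime.invoke_model(
    modelId=imported_model_arn,
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"prompt": "Hello from my fine-tuned model!", "max_gen_len": 128}),
)
print(json.loads(response["body"].read()))
```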