Inference

Organizations are increasingly integrating generative AI capabilities into their applications to enhance customer experiences, streamline operations, and drive innovation. As generative AI workloads continue to grow in scale and importance, organizations face new challenges in maintaining consistent performance, reliability, and availability of their AI-powered applications. Customers are looking to scale …

Fraud continues to cause significant financial damage globally, with U.S. consumers alone losing $12.5 billion in 2024, a 25% increase from the previous year, according to the Federal Trade Commission. This surge stems not from more frequent attacks, but from fraudsters’ increasing sophistication. As fraudulent activities become more complex and interconnected, conventional …

At Amazon, our team builds Rufus, a generative AI-powered shopping assistant that serves millions of customers. Deploying Rufus at this scale introduces significant challenges that must be carefully navigated. Rufus is powered by a custom-built large language model (LLM). As the model’s complexity increased, we prioritized developing …

Organizations serving multiple tenants through AI applications face a common challenge: how to track, analyze, and optimize model usage across different customer segments. Although Amazon Bedrock provides powerful foundation models (FMs) through its Converse API, the true business value emerges when you can connect model interactions to specific tenants, users, …
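
As a rough sketch of that connection, the snippet below tags each Converse API call with tenant and user identifiers via the requestMetadata parameter, so that model invocation logs can later be attributed per tenant. The region, model ID, metadata key names, and the converse_for_tenant helper are illustrative assumptions rather than requirements, and the sketch presumes model invocation logging is enabled on the account.

```python
import boto3

# Bedrock Runtime client; the region here is an assumption for this sketch.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def converse_for_tenant(tenant_id: str, user_id: str, prompt: str) -> str:
    """Call the Converse API, attaching tenant/user metadata.

    The requestMetadata key-value pairs flow into model invocation logs,
    which is one way to attribute usage to a tenant after the fact. The
    key names ("tenant_id", "user_id") are illustrative, not required.
    """
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        requestMetadata={"tenant_id": tenant_id, "user_id": user_id},
    )
    return response["output"]["message"]["content"][0]["text"]

# Example call with hypothetical tenant and user IDs:
# converse_for_tenant("tenant-123", "user-456", "Summarize my last order.")
```

Because the metadata travels with the request itself, usage analysis can be run directly against the invocation log stream instead of being stitched together from application-side logs.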

Successful generative AI software as a service (SaaS) systems require a balance between service scalability and cost management. This becomes critical when building a multi-tenant generative AI service designed to serve a large, diverse customer base while maintaining rigorous cost controls and comprehensive usage monitoring. Traditional cost management approaches for …
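
One building block for that kind of cost control is metering token usage per tenant from the usage field the Converse API returns. The minimal sketch below keeps the meter in memory; a real service would persist it (for example, to Amazon DynamoDB or CloudWatch), and the per-token prices are placeholders, not actual rates.

```python
from collections import defaultdict

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder per-1K-token prices; look up the real rates for your model
# and region before using numbers like these.
INPUT_PRICE_PER_1K = 0.00025
OUTPUT_PRICE_PER_1K = 0.00125

# In-memory meter keyed by tenant; swap for a durable store in production.
usage_by_tenant = defaultdict(
    lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
)

def metered_converse(tenant_id: str, prompt: str) -> str:
    """Invoke the model and accumulate the tenant's token usage and cost."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    usage = response["usage"]  # Converse reports inputTokens/outputTokens
    meter = usage_by_tenant[tenant_id]
    meter["input_tokens"] += usage["inputTokens"]
    meter["output_tokens"] += usage["outputTokens"]
    meter["cost_usd"] += (
        usage["inputTokens"] / 1000 * INPUT_PRICE_PER_1K
        + usage["outputTokens"] / 1000 * OUTPUT_PRICE_PER_1K
    )
    return response["output"]["message"]["content"][0]["text"]
```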

This post is co-written with Kshitiz Gupta, Wenhan Tan, Arun Raman, Jiahong Liu, and Eiluth Triana Isaza from NVIDIA. As large language models (LLMs) and generative AI applications become increasingly prevalent, the demand for efficient, scalable, and low-latency inference solutions has grown. Traditional inference systems often struggle to meet these …

Amazon SageMaker Inference has been a popular tool for deploying advanced machine learning (ML) and generative AI models at scale. As AI applications become increasingly complex, customers want to deploy multiple models in a coordinated group that collectively process inference requests for an application. In addition, with the evolution of …
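
One mechanism SageMaker offers for placing several models behind a single endpoint is inference components. The sketch below shows the general shape of a create_inference_component call for one model copy; the endpoint name, model name, and resource numbers are placeholders, and it assumes the endpoint and model have already been created.

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Attach an existing SageMaker model to an existing endpoint as an
# inference component. All names and resource values are placeholders.
sm.create_inference_component(
    InferenceComponentName="summarizer-component",
    EndpointName="shared-genai-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "summarizer-model",  # model already registered in SageMaker
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 4096,
        },
    },
    RuntimeConfig={"CopyCount": 1},  # how many copies of this model to run
)
```

Each component declares its own compute requirements, so several components can share the endpoint's instances while being scaled and updated independently.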