inference (Page 2)

This post is co-written with Abhishek Sawarkar, Eliuth Triana, Jiahong Liu and Kshitiz Gupta from NVIDIA.  At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on Amazon SageMaker. These advancements build upon our collaboration withContinue Reading

The new efficient multi-adapter inference feature of Amazon SageMaker unlocks exciting possibilities for customers using fine-tuned models. This capability integrates with SageMaker inference components to allow you to deploy and manage hundreds of fine-tuned Low-Rank Adaptation (LoRA) adapters through SageMaker APIs. Multi-adapter inference handles the registration of fine-tuned adapters withContinue Reading

Generative AI models have seen tremendous growth, offering cutting-edge solutions for text generation, summarization, code generation, and question answering. Despite their versatility, these models often struggle when applied to niche or domain-specific tasks because their pre-training is typically based on large, generalized datasets. To address these gaps and maximize theirContinue Reading

Real-world applications vary in inference requirements for their artificial intelligence and machine learning (AI/ML) solutions to optimize performance and reduce costs. Examples include financial systems processing transaction data streams, recommendation engines processing user activity data, and computer vision models processing video frames. In these scenarios, customized model monitoring for nearContinue Reading

This post is co-written with Eliuth Triana, Abhishek Sawarkar, Jiahong Liu, Kshitiz Gupta, JR Morgan and Deepika Padmanabhan from NVIDIA.  At the 2024 NVIDIA GTC conference, we announced support for NVIDIA NIM Inference Microservices in Amazon SageMaker Inference. This integration allows you to deploy industry-leading large language models (LLMs) on SageMakerContinue Reading