LLMasajudge

Evaluating generative AI models with Amazon Nova LLM-as-a-Judge on Amazon SageMaker AI | Amazon Web Services

Evaluating the performance of large language models (LLMs) goes beyond statistical metrics like perplexity or bilingual evaluation understudy (BLEU) scores. For most real-world generative AI scenarios, it’s crucial to understand whether a model is producing better outputs than a baseline or an earlier iteration. This is especially important for applications...

Evaluate Amazon Bedrock Agents with Ragas and LLM-as-a-judge | Amazon Web Services

In: Artificial Intelligence

AI agents are quickly becoming an integral part of customer workflows across industries by automating complex tasks, enhancing decision-making, and streamlining operations. However, the adoption of AI agents in production systems requires scalable evaluation pipelines. Robust agent evaluation enables you to gauge how well an agent is performing certain actions...

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS | Amazon Web Services

In: Artificial Intelligence

In our previous blog posts, we explored various techniques such as fine-tuning large language models (LLMs), prompt engineering, and Retrieval Augmented Generation (RAG) using Amazon Bedrock to generate impressions from the findings section in radiology reports using generative AI. Part 1 focused on model fine-tuning. Part 2 introduced RAG, which...

LLM-as-a-judge on Amazon Bedrock Model Evaluation | Amazon Web Services

In: Artificial Intelligence

The evaluation of large language model (LLM) performance, particularly in response to a variety of prompts, is crucial for organizations aiming to harness the full potential of this rapidly evolving technology. The introduction of an LLM-as-a-judge framework represents a significant step forward in simplifying and streamlining the model evaluation process.

Build an intelligent eDiscovery solution using Amazon Bedrock Agents | Amazon Web Services

How PerformLine uses prompt engineering on Amazon Bedrock to detect compliance violations | Amazon Web Services