evaluation

How PropHero built an intelligent property investment advisor with continuous evaluation using Amazon Bedrock | Amazon Web Services

This post was written with Lucas Dahan, Dil Dolkun, and Mathew Ng from PropHero. PropHero is a leading property wealth management service that democratizes access to intelligent property investment advice through big data, AI, and machine learning (ML). For the Spanish and Australian consumer base, PropHero needed an AI-powered advisory...

Effective cross-lingual LLM evaluation with Amazon Bedrock | Amazon Web Services

In: Artificial Intelligence

Evaluating the quality of AI responses across multiple languages presents significant challenges for organizations deploying generative AI solutions globally. How can you maintain consistent performance when human evaluations require substantial resources, especially across diverse languages? Many companies find themselves struggling to scale their evaluation processes without compromising quality or breaking...

Power Your LLM Training and Evaluation with the New SageMaker AI Generative AI Tools | Amazon Web Services

In: Artificial Intelligence

Today we are excited to introduce the Text Ranking and Question and Answer UI templates to SageMaker AI customers. The Text Ranking template enables human annotators to rank multiple responses from a large language model (LLM) based on custom criteria, such as relevance, clarity, or factual accuracy. This ranked feedback...

Elevate marketing intelligence with Amazon Bedrock and LLMs for content creation, sentiment analysis, and campaign performance evaluation | Amazon Web Services

In: Artificial Intelligence

In the media and entertainment industry, understanding and predicting the effectiveness of marketing campaigns is crucial for success. Marketing campaigns are the driving force behind successful businesses, playing a pivotal role in attracting new customers, retaining existing ones, and ultimately boosting revenue. However, launching a campaign isn’t enough; to maximize...

Accuracy evaluation framework for Amazon Q Business – Part 2 | Amazon Web Services

In: Artificial Intelligence

In the first post of this series, we introduced a comprehensive evaluation framework for Amazon Q Business, a fully managed Retrieval Augmented Generation (RAG) solution that uses your company’s proprietary data without the complexity of managing large language models (LLMs). The first post focused on selecting appropriate use cases, preparing...

Build an automated generative AI solution evaluation pipeline with Amazon Nova | Amazon Web Services

In: Artificial Intelligence

Large language models (LLMs) have become integral to numerous applications across industries, ranging from enhanced customer interactions to automated business processes. Deploying these models in real-world scenarios presents significant challenges, particularly in ensuring accuracy, fairness, relevance, and mitigating hallucinations. Thorough evaluation of the performance and outputs of these models is...

Advanced tracing and evaluation of generative AI agents using LangChain and Amazon SageMaker AI MLFlow | Amazon Web Services

In: Artificial Intelligence

Developing generative AI agents that can tackle real-world tasks is complex, and building production-grade agentic applications requires integrating agents with additional tools such as user interfaces, evaluation frameworks, and continuous improvement mechanisms. Developers often find themselves grappling with unpredictable behaviors, intricate workflows, and a web of complex interactions. The experimentation...

Evaluating RAG applications with Amazon Bedrock knowledge base evaluation | Amazon Web Services

In: Artificial Intelligence

Organizations building and deploying AI applications, particularly those using large language models (LLMs) with Retrieval Augmented Generation (RAG) systems, face a significant challenge: how to evaluate AI outputs effectively throughout the application lifecycle. As these AI technologies become more sophisticated and widely adopted, maintaining consistent quality and performance becomes increasingly...

LLM-as-a-judge on Amazon Bedrock Model Evaluation | Amazon Web Services

In: Artificial Intelligence

The evaluation of large language model (LLM) performance, particularly in response to a variety of prompts, is crucial for organizations aiming to harness the full potential of this rapidly evolving technology. The introduction of an LLM-as-a-judge framework represents a significant step forward in simplifying and streamlining the model evaluation process.

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval | Amazon Web Services

In: Artificial Intelligence

Evaluating large language models (LLMs) is crucial as LLM-based systems become increasingly powerful and relevant in our society. Rigorous testing allows us to understand an LLM’s capabilities, limitations, and potential biases, and provide actionable feedback to identify and mitigate risk. Furthermore, evaluation processes are important not only for LLMs, but...
