Faster LLMs with speculative decoding and AWS Inferentia2 | Amazon Web Services
2024-08-05
In recent years, we have seen a significant increase in the size of large language models (LLMs) used to solve natural language processing (NLP) tasks such as question answering and text summarization. Larger models, with parameter counts on the order of hundreds of billions at the time of writing, tend to produce better results, but they also make inference slower and more expensive.
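Speculative decoding, named in the title, addresses exactly this cost: a small, cheap draft model proposes several tokens ahead, and the large target model verifies them in a single pass, so multiple tokens can be accepted per expensive forward step. The sketch below illustrates the idea under simplifying assumptions: both "models" are hypothetical lookup tables rather than real LLMs, decoding is greedy, and acceptance is exact-match (production implementations use probabilistic acceptance and batched verification).

```python
# A minimal sketch of speculative decoding, assuming greedy decoding and
# toy lookup-table "models" (hypothetical stand-ins for real LLMs).
# A cheap draft model proposes `gamma` tokens; the larger target model
# verifies them, keeping the longest agreeing prefix. In a real system the
# verification is one batched forward pass, which is where the speedup
# comes from.

def greedy_next(model, context):
    """Next token for a context (sequence of tokens); "<eos>" if unknown."""
    return model.get(tuple(context), "<eos>")

def speculative_decode(target, draft, prompt, gamma=4, max_new=16):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new and tokens[-1:] != ["<eos>"]:
        # 1) The draft model proposes gamma tokens autoregressively (cheap).
        ctx, proposal = list(tokens), []
        for _ in range(gamma):
            t = greedy_next(draft, ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) The target model verifies the proposal token by token.
        accepted = 0
        for t in proposal:
            expect = greedy_next(target, tokens)
            if expect != t:
                tokens.append(expect)  # reject: emit the target's token instead
                break
            tokens.append(t)           # accept the drafted token
            accepted += 1
            if t == "<eos>":
                break
        # 3) If every drafted token was accepted, the target's verification
        #    pass yields one extra token "for free".
        if accepted == gamma and tokens[-1] != "<eos>":
            tokens.append(greedy_next(target, tokens))
    return tokens[len(prompt):]

# Toy models: the draft agrees with the target for two tokens, then diverges.
target = {("the",): "cat", ("the", "cat"): "sat",
          ("the", "cat", "sat"): "down", ("the", "cat", "sat", "down"): "<eos>"}
draft = {("the",): "cat", ("the", "cat"): "sat", ("the", "cat", "sat"): "up"}

print(speculative_decode(target, draft, ("the",), gamma=3))
# → ['cat', 'sat', 'down', '<eos>']
```

Because accepted drafts cost only a verification pass, the speedup grows with how often the draft model agrees with the target; the output is identical to what the target model alone would have produced.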