In this tutorial, the Data Science In Everyday Life channel demonstrates how to deploy an open-source large language model (LLM) as an API using Hugging Face and AWS. The video focuses on the Dolly model, released by Databricks and hosted on Hugging Face, and deploys it with Amazon SageMaker, AWS's cloud-based machine learning platform. The tutorial begins by creating a SageMaker notebook instance and adjusting the instance type and container startup settings. After the deployment code runs, the endpoint is created and verified in the SageMaker console.

Next, the tutorial walks through creating a serverless AWS Lambda function that calls the deployed endpoint. The function code is copied in and configured, including raising the timeout setting, and the function is then tested to confirm it works.

Finally, the tutorial shows how to create an API Gateway that invokes the Lambda function, exposing the deployed LLM as an HTTP API. The video concludes by noting that future videos will cover performance, cost, and latency in more detail.
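The SageMaker deployment step can be sketched roughly as below, using the `sagemaker` Python SDK's Hugging Face support. The model id (`databricks/dolly-v2-7b`), endpoint name, instance type, and container versions are assumptions for illustration; the video's exact choices may differ.

```python
# Sketch: deploying a Dolly variant from the Hugging Face Hub to a
# SageMaker real-time endpoint. All names/versions below are assumptions.

# Environment config telling the Hugging Face inference container
# which Hub model to load and which task pipeline to run.
hub_config = {
    "HF_MODEL_ID": "databricks/dolly-v2-7b",  # assumed Dolly variant
    "HF_TASK": "text-generation",
}

def deploy_dolly(role_arn: str):
    """Deploy the model; requires AWS credentials and the sagemaker SDK."""
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=hub_config,
        role=role_arn,                 # SageMaker execution role ARN
        transformers_version="4.26",   # assumed container versions
        pytorch_version="1.13",
        py_version="py39",
    )
    # A GPU instance (e.g. ml.g5.2xlarge) is typically needed for a 7B model.
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",
        endpoint_name="dolly-endpoint",  # assumed endpoint name
    )
```

Running `deploy_dolly(...)` from the notebook instance creates the endpoint, which can then be verified in the SageMaker console as the video describes.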
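The Lambda step can be sketched as a handler that forwards a prompt from the API Gateway event to the SageMaker endpoint via `boto3`. The endpoint name and payload shape are assumptions (the standard Hugging Face inference container accepts an `{"inputs": ...}` JSON body); the video's actual function code may differ.

```python
# Sketch: a Lambda handler that invokes the deployed SageMaker endpoint.
# Endpoint name and payload shape are assumptions for illustration.
# Note: the Lambda timeout must be raised from the 3-second default,
# since LLM generation can take considerably longer.
import json

ENDPOINT_NAME = "dolly-endpoint"  # assumed name from the deploy step

def build_payload(prompt: str) -> str:
    """Serialize the prompt in the JSON shape the HF container expects."""
    return json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 128}})

def lambda_handler(event, context):
    """Parse the prompt from the API Gateway event and call the endpoint."""
    import boto3  # available by default in the AWS Lambda Python runtime

    prompt = json.loads(event.get("body") or "{}").get("prompt", "")
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}
```

Testing the function in the Lambda console with a sample event containing a `body` field exercises the same path API Gateway will use.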
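Once API Gateway is wired to the Lambda function, a client can call the LLM over plain HTTP. A minimal sketch using only the standard library follows; the URL and the `{"prompt": ...}` request shape are hypothetical and must match however the Lambda handler parses its event.

```python
# Sketch: calling the deployed LLM through API Gateway.
# The URL below is a hypothetical placeholder, not a real endpoint.
import json
from urllib import request

API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/generate"

def make_request(prompt: str) -> request.Request:
    """Build a JSON POST request carrying the prompt."""
    body = json.dumps({"prompt": prompt}).encode()
    return request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str) -> dict:
    """Send the request and decode the JSON response (requires network)."""
    with request.urlopen(make_request(prompt)) as resp:
        return json.loads(resp.read())
```

With this in place the deployed model behaves like any other HTTP API, which is the end state the tutorial is building toward.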

Data Science In Everyday Life
July 7, 2024
Blog link: Deploying Open-Source LLMs as APIs
Duration: 9:29