Perplexity’s LLM API: Fast, User-Friendly, Cost-Efficient

Mar 30, 2024

Perplexity’s LLM API (pplx-api) is designed to simplify and accelerate large language model (LLM) deployment and inference. It offers an out-of-the-box API that requires no knowledge of C++/CUDA and no access to GPUs, making it both user-friendly and cost-efficient. Under the hood, the API is built on NVIDIA’s TensorRT-LLM and served on A100 GPUs hosted on AWS, and in comparisons with other LLM inference libraries it achieves lower overall latency and higher throughput.

The API already powers one of Perplexity’s core product features, yielding significant cost savings. It supports models such as Mistral 7B, Llama 2 13B, Code Llama 34B, and Llama 2 70B, and it is compatible with the OpenAI client for easy integration, as sketched below. The team is committed to providing access to the latest state-of-the-art open-source LLMs and plans to support more models over time.
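Because the API is OpenAI client-compatible, an existing OpenAI client can be pointed at Perplexity’s endpoint with little more than a base-URL change. The minimal sketch below assumes the `https://api.perplexity.ai` endpoint and the `mistral-7b-instruct` model name; check Perplexity’s documentation for the current model list and authentication details.

```python
# Minimal sketch: calling pplx-api through the OpenAI Python client.
# Assumes the https://api.perplexity.ai endpoint and the model name
# "mistral-7b-instruct"; consult Perplexity's docs for current values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PPLX_API_KEY",           # Perplexity API key, not an OpenAI key
    base_url="https://api.perplexity.ai",  # point the client at pplx-api
)

response = client.chat.completions.create(
    model="mistral-7b-instruct",
    messages=[
        {"role": "system", "content": "Be precise and concise."},
        {"role": "user", "content": "What is TensorRT-LLM?"},
    ],
)

print(response.choices[0].message.content)
```

Since only the base URL and API key differ from a standard OpenAI setup, existing tooling built on the OpenAI client can typically be reused as-is.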
