Perplexity’s LLM API: Fast, User-Friendly, Cost-Efficient

Mar 30, 2024

Perplexity’s LLM API (pplx-api) is designed to simplify and accelerate large language model (LLM) deployment and inference. It offers an out-of-the-box API that requires no knowledge of C++/CUDA and no access to GPUs, making it both user-friendly and cost-efficient. Under the hood, the API is built on NVIDIA’s TensorRT-LLM and served on A100 GPUs hosted on AWS, and in comparisons with other LLM inference libraries it achieves lower overall latency and higher throughput.

The API already powers one of Perplexity’s core product features, yielding significant cost savings. It supports models such as Mistral 7B, Llama 2 13B, Code Llama 34B, and Llama 2 70B, and it is compatible with the OpenAI client for easy integration, as sketched below. The team is committed to providing access to the latest state-of-the-art open-source LLMs and plans to support more models over time.
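Because the API is OpenAI client-compatible, an existing OpenAI client can be pointed at Perplexity’s endpoint with little more than a base-URL change. The minimal sketch below assumes the `https://api.perplexity.ai` endpoint and the `mistral-7b-instruct` model name; check Perplexity’s documentation for the current model list and authentication details.

```python
# Minimal sketch: calling pplx-api through the OpenAI Python client.
# Assumes the https://api.perplexity.ai endpoint and the model name
# "mistral-7b-instruct"; consult Perplexity's docs for current values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PPLX_API_KEY",           # Perplexity API key, not an OpenAI key
    base_url="https://api.perplexity.ai",  # point the client at pplx-api
)

response = client.chat.completions.create(
    model="mistral-7b-instruct",
    messages=[
        {"role": "system", "content": "Be precise and concise."},
        {"role": "user", "content": "What is TensorRT-LLM?"},
    ],
)

print(response.choices[0].message.content)
```

Since only the base URL and API key differ from a standard OpenAI setup, existing tooling built on the OpenAI client can typically be reused as-is.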
