Model compatibility table

Besides llama-based models, LocalAI is also compatible with many other architectures. The tables below list the available backends, the model families they are compatible with, their capabilities, and the acceleration options they support.

Note: LocalAI will attempt to automatically load models that are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file; see the advanced section for more details.
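
For example, a minimal model configuration that pins a model to a specific backend might look like the sketch below. The model name, file name, and exact backend identifier are placeholders rather than a definitive configuration; the advanced section documents the full set of options.

```yaml
# my-model.yaml - minimal sketch of pinning a model to a backend
name: my-model                  # name used when calling the API
backend: llama-cpp              # a backend identifier (see the tables below)
parameters:
  model: my-model.Q4_K_M.gguf   # model file placed in the models directory
```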

Text Generation & Language Models

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| llama.cpp | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12, ROCm, Intel |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12, ROCm, Intel, CPU |
| exllama2 | GPTQ | yes | GPT only | no | no | CUDA 12 |
| MLX | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| MLX-VLM | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
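
For instance, to serve an embeddings-capable model with one of these backends, a configuration along the following lines could be used. The backend identifier, model id, and the embeddings flag shown here are illustrative assumptions; check the advanced section for the authoritative option names.

```yaml
# embeddings.yaml - illustrative sketch of an embeddings model
name: my-embeddings
backend: transformers           # a backend with embeddings support (see table above)
embeddings: true                # assumed flag enabling the embeddings endpoint
parameters:
  model: sentence-transformers/all-MiniLM-L6-v2   # example Hugging Face model id
```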

Audio & Speech Processing

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| whisper.cpp | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU |
| piper (binding) | Any piper ONNX model | no | Text to voice | no | no | CPU |
| bark | bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel |
| bark-cpp | bark | no | Audio-Only | no | no | CUDA, Metal, CPU |
| coqui | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU |
| kokoro | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU |
| chatterbox | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU |
| kitten-tts | Kitten TTS | no | Text-to-speech | no | no | CPU |
| silero-vad (with Golang bindings) | Silero VAD | no | Voice Activity Detection | no | no | CPU |
| neutts | NeuTTS Air | no | Text-to-speech with voice cloning | no | no | CUDA 12, ROCm, CPU |
| mlx-audio | MLX | no | Text-to-speech | no | no | Metal (Apple Silicon) |
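
As an illustration, a text-to-speech model backed by one of the audio backends above could be declared roughly as follows; the voice file name is a hypothetical placeholder.

```yaml
# tts-voice.yaml - illustrative sketch of a text-to-speech model
name: voice-en-us
backend: piper                      # any TTS-capable backend from the table above
parameters:
  model: en-us-example-voice.onnx   # hypothetical piper ONNX voice file
```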

Image & Video Generation

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| stablediffusion.cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, Intel SYCL, Vulkan, CPU |
| diffusers | SD, various diffusion models, … | no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU |
| transformers-musicgen | MusicGen | no | Audio generation | no | no | CUDA, CPU |
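
For image generation, a diffusers-based model could be configured along these lines; the model id is only an example, and diffusers-specific options (pipelines, schedulers, and so on) are covered in the advanced section.

```yaml
# image-gen.yaml - illustrative sketch of an image generation model
name: my-image-model
backend: diffusers
parameters:
  model: stabilityai/stable-diffusion-2-1   # example Hugging Face model id
```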

Specialized AI Tasks

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| rfdetr | RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU |
| rerankers | Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU |
| local-store | Vector database | no | Vector storage | yes | no | CPU |
| huggingface | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
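
The specialized backends are selected the same way. A reranking model, for instance, might be declared roughly like this; the model id is an example, not a recommendation.

```yaml
# reranker.yaml - illustrative sketch of a reranking model
name: my-reranker
backend: rerankers
parameters:
  model: cross-encoder/ms-marco-MiniLM-L-6-v2   # example cross-encoder model id
```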

Acceleration Support Summary

GPU Acceleration

  • NVIDIA CUDA: CUDA 11.7, CUDA 12.0 support across most backends
  • AMD ROCm: HIP-based acceleration for AMD GPUs
  • Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
  • Vulkan: Cross-platform GPU acceleration
  • Metal: Apple Silicon GPU acceleration (M1/M2/M3+)

Specialized Hardware

  • NVIDIA Jetson (L4T): ARM64 support for embedded AI
  • Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
  • Darwin x86: Intel Mac support

CPU Optimization

  • AVX/AVX2/AVX512: Advanced vector extensions for x86
  • Quantization: 4-bit, 5-bit, 8-bit integer quantization support
  • Mixed Precision: F16/F32 mixed precision support

Note: any backend name listed above can be used in the backend field of the model configuration file (see the advanced section).

  • * Token streaming support in the transformers backend applies only to CUDA and OpenVINO CPU/XPU acceleration.