# Backend Comparison
Weyl supports three inference backends, each trading off precision, speed, and model coverage differently.
## Overview

| Backend | Precision | Speed | Models |
|---|---|---|---|
| nunchaku | FP4 | ⚡⚡⚡ | FLUX, Z-Image |
| torch | FP16 | ⚡⚡ | FLUX, WAN |
| tensorrt | Mixed (INT8 + FP16) | ⚡⚡⚡ | FLUX Dev/Schnell |
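In practice the backend is a construction-time choice. The snippet below is purely illustrative: `weyl.load_pipeline` and its `backend` argument are hypothetical names standing in for whatever entry point Weyl actually exposes.

```python
# Hypothetical sketch -- `weyl.load_pipeline` and `backend=` are assumed
# names, not a documented Weyl API; adjust to the real entry point.
import weyl

# Backend choice follows the table above; nunchaku needs a Blackwell GPU.
pipe = weyl.load_pipeline("flux-schnell", backend="nunchaku")
image = pipe("a watercolor fox", num_inference_steps=4)
```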
## Nunchaku

FP4 quantization on NVIDIA Blackwell (GB200)

- Precision: FP4 (4-bit floating point)
- Speed: 3-4× faster than the FP16 torch backend
- Quality: Minimal loss relative to FP16
Supported Models:
- FLUX Dev2 ✓
- FLUX Dev ✓
- FLUX Schnell ✓
- Z-Image Turbo ✓
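Under the hood, the upstream nunchaku library swaps a quantized transformer into a standard diffusers pipeline. A minimal sketch following nunchaku's published usage; the checkpoint ID is an assumption (repo names vary across nunchaku releases, and the FP4 builds require Blackwell):

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load the 4-bit quantized FLUX transformer (checkpoint ID assumed;
# FP4 variants run on Blackwell, INT4 variants on earlier architectures).
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-fp4-flux.1-schnell"
)

# Drop it into the stock diffusers pipeline in place of the FP16 transformer.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a lighthouse at dusk", num_inference_steps=4, guidance_scale=0.0
).images[0]
```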
## Torch

PyTorch with Hugging Face diffusers on CUDA

- Precision: FP16 (half precision)
- Framework: diffusers + transformers
- Flexibility: Highest of the three; full access to the Python pipeline
Supported Models:
- FLUX Dev2 ✓
- FLUX Dev ✓
- FLUX Schnell ✓
- WAN ✓
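This backend maps onto plain diffusers usage, so the standard FluxPipeline example applies directly (FP16 to match the precision above; FLUX.1 Schnell is the openly downloadable variant):

```python
import torch
from diffusers import FluxPipeline

# Standard diffusers FLUX pipeline in half precision.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.float16
).to("cuda")

# Schnell is step-distilled: a few steps, no classifier-free guidance.
image = pipe(
    "a lighthouse at dusk", num_inference_steps=4, guidance_scale=0.0
).images[0]
image.save("flux_schnell.png")
```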
## TensorRT
NVIDIA TensorRT-LLM with ModelOpt
- Precision: Mixed (INT8 + FP16)
- Optimization: Ahead-of-time compilation
Supported Models:
- FLUX Dev ✓
- FLUX Schnell ✓
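Ahead-of-time here means quantization and engine compilation happen once, offline, before serving. A rough sketch of the ModelOpt calibration step on a toy module, using nvidia-modelopt's documented post-training quantization API (the real build would quantize the FLUX transformer, then compile the exported graph into a TensorRT engine):

```python
import torch
import modelopt.torch.quantization as mtq

# Toy stand-in for the diffusion transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64)
)
calib_batches = [torch.randn(8, 64) for _ in range(16)]

def forward_loop(m):
    # ModelOpt runs this to push calibration data through the model and
    # collect the activation ranges that become INT8 scales.
    for batch in calib_batches:
        m(batch)

model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
# The calibrated model is then exported (e.g., to ONNX) and compiled into
# a TensorRT engine ahead of time, which is what the server loads.
```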
## Performance
FLUX @ 1024×1024:
| Model | Backend | Latency |
|---|---|---|
| schnell | nunchaku | 450ms |
| schnell | tensorrt | 380ms |
| dev | nunchaku | 1.8s |
| dev | tensorrt | 1.5s |
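These latencies are straightforward to reproduce with a wall-clock benchmark. The sketch below assumes a `pipe` callable like the ones above; the warm-up run and median reporting are this sketch's methodology, not necessarily how the table above was measured:

```python
import time
import torch

def bench(pipe, prompt, steps, runs=10):
    pipe(prompt, num_inference_steps=steps)  # warm-up: exclude one-time setup
    samples = []
    for _ in range(runs):
        torch.cuda.synchronize()  # make sure prior GPU work is done
        start = time.perf_counter()
        pipe(prompt, num_inference_steps=steps)
        torch.cuda.synchronize()  # wait for this run to finish
        samples.append(time.perf_counter() - start)
    return sorted(samples)[runs // 2]  # median latency in seconds

# e.g. bench(pipe, "a lighthouse at dusk", steps=4)
```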