Core Concepts

Model Families

Each family is a distinct transformer backbone with its own output modality, default backend, and availability status.

Family	Modality	Default Backend	Status
`flux`	image	nunchaku	active
`zimage`	image	nunchaku	active
`wan`	video	torch	coming soon
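
As a rough illustration of how the table above could be consulted in code, the sketch below stores each row in a small registry and resolves the backend for a family. The names `Family`, `FAMILIES`, and `resolve_backend` are hypothetical and not part of any documented API. Rejecting families that are not yet `active` is likewise an assumption about how "coming soon" might be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Family:
    """One row of the families table (hypothetical structure)."""
    modality: str
    default_backend: str
    status: str


# Values copied from the table; the registry itself is illustrative only.
FAMILIES = {
    "flux": Family("image", "nunchaku", "active"),
    "zimage": Family("image", "nunchaku", "active"),
    "wan": Family("video", "torch", "coming soon"),
}


def resolve_backend(family: str, backend: Optional[str] = None) -> str:
    """Return the explicit backend if one is given, else the family default."""
    info = FAMILIES[family]  # raises KeyError for an unknown family
    if info.status != "active":
        # Assumption: families that are not active cannot be used yet.
        raise ValueError(f"family {family!r} is not available ({info.status})")
    return backend or info.default_backend


print(resolve_backend("flux"))           # family default
print(resolve_backend("flux", "torch"))  # explicit override
```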

Backends

Three inference backends are available, each with different tradeoffs:

Backend	Stack	Notes
`nunchaku`	NVFP4 on Blackwell	Fastest, 4-bit quantized
`torch`	diffusers + CUDA	Flexible, full precision
`tensorrt`	TRT-LLM + ModelOpt	NVIDIA-optimized, production

Formats

Image Formats

Format	Dimensions	Aspect
`1024`	1024×1024	1:1
`512`	512×512	1:1
`portrait`	768×1024	3:4
`landscape`	1024×768	4:3

Video Formats

Format	Dimensions	Aspect
`720p`	1280×720	16:9
`480p`	832×480	~16:9
`square`	640×640	1:1
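
The format names in the two tables above can be treated as lookups from a name to a `(width, height)` pair. The sketch below is a minimal illustration under that assumption: `IMAGE_FORMATS`, `VIDEO_FORMATS`, and `aspect_ratio` are hypothetical names, and only the dimensions come from the tables. Reducing each pair by its greatest common divisor also shows why `480p` is listed as approximately 16:9 rather than exactly 16:9.

```python
from math import gcd

# Dimensions copied from the tables; the dict names are illustrative only.
IMAGE_FORMATS = {
    "1024": (1024, 1024),
    "512": (512, 512),
    "portrait": (768, 1024),
    "landscape": (1024, 768),
}
VIDEO_FORMATS = {
    "720p": (1280, 720),
    "480p": (832, 480),
    "square": (640, 640),
}


def aspect_ratio(width: int, height: int) -> tuple:
    """Reduce width:height to lowest terms."""
    d = gcd(width, height)
    return width // d, height // d


for name, (w, h) in {**IMAGE_FORMATS, **VIDEO_FORMATS}.items():
    rw, rh = aspect_ratio(w, h)
    print(f"{name}: {w}x{h} -> {rw}:{rh}")
```

For `480p`, 832×480 reduces to 26:15 (about 1.733), which is close to but slightly narrower than 16:9 (about 1.778).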

Tasks

Task	Requires	Produces	Description
`t2v`	prompt	video	text to video
`i2v`	prompt + image	video	animate a still
`t2i`	prompt	image	text to image
`i2i`	prompt + image	image	transform / style transfer
`edit`	prompt + image + mask	image	inpaint / outpaint
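
The Requires column above lends itself to a simple pre-flight check before a job is submitted. The following sketch assumes inputs are identified by the names `prompt`, `image`, and `mask`; the `TASKS` mapping and the `missing_inputs` helper are hypothetical and only mirror the table.

```python
# task -> (required inputs, output modality), copied from the table.
# The mapping and helper below are illustrative, not a documented API.
TASKS = {
    "t2v": ({"prompt"}, "video"),
    "i2v": ({"prompt", "image"}, "video"),
    "t2i": ({"prompt"}, "image"),
    "i2i": ({"prompt", "image"}, "image"),
    "edit": ({"prompt", "image", "mask"}, "image"),
}


def missing_inputs(task: str, provided) -> list:
    """Return the required inputs that were not provided, sorted by name."""
    required, _produces = TASKS[task]  # raises KeyError for an unknown task
    return sorted(required - set(provided))


print(missing_inputs("edit", ["prompt"]))
print(missing_inputs("t2i", ["prompt"]))
```

An empty list means the request carries everything the task requires; extra inputs are ignored in this sketch.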