# Hardware guide
Which GPU + how much VRAM + which execution provider you actually need. No marketing fluff.
## The four providers, ranked
| Tier | Provider | What it looks like |
|---|---|---|
| 1 | NVIDIA CUDA + TensorRT | RTX 2060 and up, ≥6 GB VRAM. Best latency, widest model support, TensorRT FP16 paths on Ampere+. |
| 2 | Intel OpenVINO | Arc A-series discrete, Iris Xe iGPU, N100/N305 mini-PCs. Great power/perf for home-lab, needs /dev/dri in container. |
| 3 | AMD ROCm | RDNA2+ (6600/6700/7000 series) on Linux hosts with /dev/kfd. Works, but fewer models have official ROCm kernels. |
| 4 | CPU (ONNX) | AVX2-capable modern x86. Works for FSRCNN/ESPCN real-time, anything ESRGAN+ will be minutes-per-frame. |
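The device requirements in the table translate directly into container flags. As a hedged sketch (the image name `example/ai-upscaler` is a placeholder, and the `:latest-rocm` tag is assumed by analogy with the tags used elsewhere in this guide):

```
# NVIDIA CUDA / TensorRT: requires the NVIDIA Container Toolkit on the host
docker run --gpus all example/ai-upscaler:latest-cuda

# Intel OpenVINO: pass the render node through
docker run --device /dev/dri:/dev/dri example/ai-upscaler:latest-openvino

# AMD ROCm: needs /dev/kfd and /dev/dri, plus video group membership
docker run --device /dev/kfd --device /dev/dri --group-add video \
  example/ai-upscaler:latest-rocm

# CPU: no device passthrough needed
docker run example/ai-upscaler:latest-cpu
```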
## VRAM budget by model
These numbers are measured on CUDA at FP16 with a single 1080p→4K pass. Double them for 4K→8K.
| Model family | Min VRAM (FP16) | Comfort VRAM |
|---|---|---|
| FSRCNN / ESPCN | 500 MB | 1 GB |
| Waifu2x | 1.5 GB | 3 GB |
| Real-ESRGAN / RealESRGAN-Anime | 3 GB | 6 GB |
| SwinIR-M / HAT-S | 4 GB | 8 GB |
| SwinIR-L / HAT-L / AnimeSR-v2 | 7 GB | 12 GB |
| EDVR-M / RealBasicVSR (multi-frame) | 8 GB | 16 GB |
| GFPGAN + Upscaler (face restore pipeline) | +1 GB on top | +2 GB on top |
Headroom matters: if your GPU also serves Jellyfin transcoding (NVENC/QSV), keep ~2 GB free for the transcode session or you will see OOM kills on simultaneous playback.
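The budgeting rules above (table minimums, double for 4K→8K, +1 GB for face restore, ~2 GB transcode reserve) can be sketched as a small helper. This is purely illustrative arithmetic, not part of the plugin; the model keys are hypothetical shorthand for the table rows:

```python
# Rough VRAM budgeting helper mirroring the table above.
# The 2x rule for 4K->8K, the +1 GB GFPGAN add-on, and the ~2 GB
# transcode reserve come from the surrounding text.

MIN_VRAM_GB = {          # FP16, single 1080p->4K pass
    "fsrcnn": 0.5,
    "waifu2x": 1.5,
    "realesrgan": 3.0,
    "swinir-l": 7.0,
    "edvr-m": 8.0,
}

def vram_needed_gb(model: str, *, target_8k: bool = False,
                   face_restore: bool = False, transcode: bool = False) -> float:
    """Estimate minimum VRAM (GB) for one upscale session."""
    need = MIN_VRAM_GB[model]
    if target_8k:        # 4K->8K: double the 1080p->4K figure
        need *= 2
    if face_restore:     # GFPGAN pipeline: +1 GB on top
        need += 1.0
    if transcode:        # keep ~2 GB free for a simultaneous NVENC/QSV session
        need += 2.0
    return need
```

For example, Real-ESRGAN alongside a live transcode needs roughly `vram_needed_gb("realesrgan", transcode=True)` = 5 GB minimum, which is why it is comfortable on an 8 GB card but tight on 6 GB.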
## Recommended setups
### Budget home lab (Intel N100 / Arc A380)
- Provider: OpenVINO (`:latest-openvino` image)
- Good for: FSRCNN real-time, Waifu2x 2×, Real-ESRGAN-anime-x4 batch
- Config: Max Concurrent Streams = 1, cache enabled
- Expect: ~15 fps at 720p → 1440p with anime-compact-x4
### Mid-range (RTX 3060 12GB / RTX 4060)
- Provider: CUDA (`:latest-cuda` image)
- Good for: everything in the catalogue runs comfortably except EDVR-M and HAT-L
- Config: Max Concurrent Streams = 1, Auto-Mode on
- Expect: 4×-realesrgan at ~20 fps on 720p source, 3× on 1080p source
### High-end (RTX 4090 / A100)
- Provider: CUDA + TensorRT conversion
- Good for: SwinIR-L, HAT-L, EDVR-M temporal, GFPGAN pipeline
- Config: Max Concurrent Streams = 2, pre-processing cache on NVMe
- Expect: Near real-time 4× on any model
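The TensorRT conversion is a one-off step per model. As a hedged sketch using `trtexec` (which ships with TensorRT), with placeholder paths since the plugin's actual model directory may differ:

```
# Build an FP16 TensorRT engine from an ONNX model once, then reuse it
trtexec --onnx=/models/swinir-l.onnx \
        --fp16 \
        --saveEngine=/models/swinir-l.engine
```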
### CPU-only NAS (e.g. Synology DS920+)
- Provider: CPU (`:latest-cpu` image)
- Good for: overnight batch upscales of an anime library with `fsrcnn-x2` / `espcn-x4`
- Config: Scan & Upscale Library scheduled at 3 AM, pre-processing cache on the biggest pool you have
- Expect: hours per feature-length file. This is batch, not live.
## Jellyfin-side hardware acceleration
Orthogonal to the AI upscaler — Jellyfin still handles decoding the source file and encoding the final MP4 itself.
- Decoders: NVDEC, QSV, VAAPI all work unchanged.
- Encoders: the plugin exposes 12 codec options with tuned defaults (see Config → Output Codec).
- Tonemap: HDR→SDR via `tonemap_cuda`/`tonemap_opencl` is handled by Jellyfin's stream path and runs before the upscale filter.
## Remote transcoding (optional)
Set Enable Remote Transcoding and the plugin installs a wrapper script that redirects Jellyfin's FFmpeg invocations to the AI service host over SSH. Use this when your Jellyfin server has no GPU but a separate desktop on the LAN does.
```
POST /Upscaler/wrapper/install
# generates on-disk next to the plugin DLL:
# Linux : upscale-wrapper.sh
# Win   : upscale-wrapper.bat + upscale-logic.ps1
```
Prereq: password-less SSH from the Jellyfin host to the GPU host, with `ffmpeg` on the `$PATH` at the remote end.
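A minimal setup sketch for that prerequisite, assuming Jellyfin runs as the `jellyfin` user and `gpu-host` is a placeholder hostname:

```
# Generate a passphrase-less key for the Jellyfin service user
sudo -u jellyfin ssh-keygen -t ed25519 -N ""

# Install the public key on the GPU host
sudo -u jellyfin ssh-copy-id user@gpu-host

# Verify non-interactive login and that ffmpeg is on the remote $PATH
sudo -u jellyfin ssh user@gpu-host 'command -v ffmpeg'
```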