Load balancing
Multiple independent model instances, one per GPU, serve concurrent users; each request stays on a single GPU, so there is no inter-GPU traffic and no PCIe bottleneck. Six GPUs means six simultaneous workers.
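A minimal sketch of the dispatch pattern, with round-robin assignment across per-GPU workers. The names here (`RoundRobinBalancer`, the worker callables) are illustrative, not LocoConvoy's actual API:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand each request to the next independent model instance in turn."""

    def __init__(self, workers):
        self._workers = cycle(workers)

    def dispatch(self, prompt):
        worker = next(self._workers)
        return worker(prompt)

# Each "worker" stands in for a full model pinned to its own GPU;
# no tensors ever cross the PCIe bus between them.
workers = [lambda p, i=i: f"gpu{i}:{p}" for i in range(6)]
lb = RoundRobinBalancer(workers)
print([lb.dispatch("hi") for _ in range(3)])  # → ['gpu0:hi', 'gpu1:hi', 'gpu2:hi']
```

Throughput scales with GPU count because the instances share nothing; the trade-off is that each GPU must hold a complete copy of the model.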
Mixture of agents
Different specialist models collaborate on a single response, each running on its own GPU — without competing for the same VRAM pool.
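A rough sketch of the pattern, assuming the simplest aggregation scheme (specialists answer in parallel, then a final step fuses their drafts). The specialist names and the join-based aggregator are placeholders for real models:

```python
from concurrent.futures import ThreadPoolExecutor

def make_specialist(name):
    # Stand-in for a model instance pinned to one GPU.
    return lambda prompt: f"{name} view of {prompt!r}"

specialists = [make_specialist(n) for n in ("code", "math", "prose")]

def aggregate(drafts):
    # A real aggregator would typically be another model pass;
    # here we just concatenate the specialist drafts.
    return " | ".join(drafts)

def mixture_of_agents(prompt):
    # Specialists run concurrently since each has its own GPU
    # and its own VRAM; nothing is shared between them.
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        drafts = list(pool.map(lambda s: s(prompt), specialists))
    return aggregate(drafts)

print(mixture_of_agents("sort a list"))
```

Because the specialists occupy separate VRAM pools, the slowest model, not the sum of all models, sets the latency of each round.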
Speculative decoding
A small draft model proposes tokens; a larger model verifies them. Trades inter-GPU bandwidth for faster generation by reducing full-model forward passes.
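A toy greedy version of the verify loop: the draft model proposes k tokens, the target accepts the longest agreeing prefix and emits one correction on the first mismatch. Both "models" are deterministic stand-ins, and a real implementation verifies all k positions in a single target forward pass (that single pass is what buys the speedup):

```python
def draft_model(ctx, k):
    # Cheap proposer: emits k tokens from a fixed arithmetic pattern.
    return [(ctx[-1] + 1 + i) % 7 for i in range(k)]

def target_model(ctx):
    # The "large" model's greedy next token for a context.
    return (ctx[-1] + 1) % 5

def speculative_step(ctx, k=4):
    # Accept draft tokens while the target agrees; on the first
    # disagreement, take the target's token and stop. Every accepted
    # draft token is a full-model forward pass saved.
    proposed = draft_model(ctx, k)
    accepted = []
    for tok in proposed:
        if target_model(ctx + accepted) == tok:
            accepted.append(tok)
        else:
            accepted.append(target_model(ctx + accepted))
            break
    return accepted

print(speculative_step([3]))  # → [4, 0]: one draft token accepted, one corrected
```

When draft and target agree often, one verification pass yields several tokens; when they diverge, the step degrades to ordinary one-token decoding, never worse in output quality.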
PCIe reality check
Datacenter multi-GPU systems link cards with NVLink at hundreds of GB/s. Consumer PCIe (4.0 x16) tops out at ~32 GB/s. LocoConvoy measures what that penalty actually costs, and when it doesn't matter.