LocoLab · Multi-GPU Research · No NVLink Required

LocoConvoy

Multi-GPU parallelism on consumer hardware. We're studying what happens when you connect multiple secondhand GPUs over plain PCIe and push them to work together. Load balancing, Mixture of Agents, speculative decoding — no NVLink, no datacenter, no excuses.

Read the Docs · View on GitHub
Research Areas

Three strategies for consumer multi-GPU.

Datacenter GPUs have NVLink. We have PCIe and stubbornness. These are the three parallelism strategies we're measuring on real consumer hardware.

Load Balancing

Run independent model instances across separate GPUs to serve concurrent users. Each card operates alone — no inter-GPU communication needed. Six GPUs means six simultaneous inference workers, ideal for classroom or small-team deployments.
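Because the workers never talk to each other, the balancing layer can be as simple as a round-robin dispatcher. A minimal sketch, assuming each per-GPU model instance is wrapped as a callable (the `workers` here are hypothetical stand-ins, not a real serving API):

```python
import itertools
import threading

class RoundRobinBalancer:
    """Distribute requests across independent per-GPU workers.

    Each worker is any callable that runs inference on its own GPU;
    no inter-GPU communication is involved.
    """

    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)
        self._lock = threading.Lock()  # itertools.cycle is not thread-safe

    def submit(self, prompt):
        with self._lock:
            worker = next(self._cycle)
        return worker(prompt)

# Hypothetical stand-ins for six per-GPU model instances.
workers = [lambda p, i=i: f"gpu{i}: {p}" for i in range(6)]
balancer = RoundRobinBalancer(workers)
print(balancer.submit("hello"))  # served by gpu0
print(balancer.submit("world"))  # served by gpu1
```

A production setup would replace the lambdas with HTTP calls to per-GPU inference servers, but the routing logic stays this small.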



Mixture of Agents

Multiple specialist models collaborate on a single response, each running on its own GPU. Different models or quantisations contribute distinct perspectives without competing for the same VRAM pool. Consensus through diversity.
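One round of this pattern can be sketched as: fan the prompt out to the specialist models in parallel, then hand all the drafts to an aggregator model for synthesis. The `proposers` and `aggregator` below are hypothetical stand-ins for per-GPU model endpoints:

```python
from concurrent.futures import ThreadPoolExecutor

def mixture_of_agents(prompt, proposers, aggregator):
    """One Mixture-of-Agents round: each proposer (a model on its own
    GPU) answers independently; an aggregator synthesizes the drafts
    into a single final response."""
    # Fan out: the proposers run concurrently on separate GPUs,
    # so threads (not processes) are enough on the client side.
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        drafts = list(pool.map(lambda m: m(prompt), proposers))
    # Fan in: present all drafts to the aggregator in one context.
    combined = prompt + "\n\nCandidate answers:\n" + "\n".join(
        f"[{i}] {d}" for i, d in enumerate(drafts))
    return aggregator(combined)

# Hypothetical stand-ins for real model endpoints.
proposers = [lambda p: "draft A", lambda p: "draft B"]
aggregator = lambda p: f"synthesized from {p.count('[')} drafts"
print(mixture_of_agents("Q?", proposers, aggregator))
# -> synthesized from 2 drafts
```

Multi-round variants feed the synthesized answer back in as the next round's prompt; the single-round core is unchanged.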

Speculative Decoding

A small draft model on one GPU proposes a short run of tokens while a larger verification model on another GPU accepts or rejects them. It trades a little inter-GPU traffic for faster generation: the verifier checks several drafted tokens in a single forward pass instead of generating them one at a time.
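The accept/reject loop, stripped to its greedy-decoding core, looks like this. The "models" here are toy next-token functions, and in a real system the verifier scores all drafted positions in one batched forward pass rather than a Python loop:

```python
def speculative_step(prefix, draft_next, verify_next, k=4):
    """One speculative-decoding step: the draft model proposes up to k
    tokens; the verifier keeps the longest prefix it agrees with, then
    emits its own token at the first disagreement."""
    # Draft phase: cheap model proposes k tokens autoregressively.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Verify phase: large model checks each proposal in order.
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if verify_next(ctx) == t:          # greedy match -> accept
            accepted.append(t)
            ctx.append(t)
        else:                              # reject: take verifier's token
            accepted.append(verify_next(ctx))
            break
    return accepted

# Toy models: the draft guesses the alphabet; the verifier wants "abcxyz".
draft = lambda ctx: "abcdefgh"[len(ctx)]
verify = lambda ctx: "abcxyz"[len(ctx)]
print(speculative_step([], draft, verify, k=4))  # ['a', 'b', 'c', 'x']
```

Every accepted token is one the large model never had to generate sequentially, which is where the speedup comes from; sampling-based variants use an acceptance probability instead of an exact greedy match.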

Built from what others discarded.

Our test rigs are assembled from secondhand components — mining motherboards, ex-gaming GPUs, and repurposed chassis. The research question is what this hardware can actually do, measured rigorously rather than assumed.

🐝

Colmena

WEIHO 8-GPU Enclosed Chassis

Eight native PCIe slots in an enclosed mining chassis, currently fitted with three RTX 2060 Supers and an RTX 3060, with an RTX 3090 and RTX 4060 Ti pending. Multi-GPU architecture research: load balancing, Mixture of Agents, vLLM tensor parallelism.

8× PCIe · LGA 1155 · Enclosed

A compact, purpose-built enclosure running up to eight consumer GPUs. Populated with RTX 2060 Supers — 8 GB VRAM and 448 GB/s memory bandwidth per card, sourced secondhand for roughly $150–200 AUD each.

8× GPU slots · RTX 2060 Super · 448 GB/s per card

The value of many cheap GPUs depends entirely on whether you need one big model or many small ones. For pooled VRAM, fewer cards in proper PCIe slots wins. For concurrent independent inference, more cards wins regardless of slot bandwidth.
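That decision reduces to back-of-envelope arithmetic. A rough sketch, where `placement` is a hypothetical helper and the 1.2× overhead factor (KV cache, activations) is an assumption that varies by workload:

```python
def placement(model_gb, cards, vram_gb=8, overhead=1.2):
    """Decide between replicating a model per card and pooling VRAM.

    model_gb: weights footprint in GB; cards: GPUs available;
    vram_gb: VRAM per card (8 GB for an RTX 2060 Super);
    overhead: rough multiplier for KV cache and activations.
    """
    need = model_gb * overhead
    if need <= vram_gb:
        # Fits on one card: replicate for concurrent independent inference.
        return f"replicate: {cards} independent workers"
    n = -(-need // vram_gb)  # ceiling division: cards needed to pool
    if n <= cards:
        # Needs pooled VRAM: tensor parallelism over proper PCIe slots.
        return f"pool {int(n)} cards (tensor parallel)"
    return "does not fit"

print(placement(5, 6))   # replicate: 6 independent workers
print(placement(20, 6))  # pool 3 cards (tensor parallel)
```

A 5 GB quantised model turns six cheap cards into six workers; a 20 GB model forces pooling, where slot bandwidth starts to matter.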

Part of LocoLab.

LocoConvoy is a research project from LocoLab, the applied AI research lab at Curtin University. We study what consumer hardware can do when you stop assuming it can't.

Get in Touch · LocoLab Home · LocoLLM · LocoBench