AI Workstation Glossary: Hardware Terms in Plain English
Buying an AI workstation means wading through acronyms — VRAM, TDP, quantization, NVLink — that mostly hide simple ideas. This is our plain-English glossary of the terms that actually matter when you spec a machine, with no hype and no fake precision. Each definition is one or two sentences, and where a term deserves a deeper answer we link to the guide that explains it.
The engine: GPU, VRAM and speed
The handful of terms that decide what an AI workstation can run, and how fast. If you read only one group, read this one; our GPU and VRAM guide goes deeper.
- VRAM (video memory)
- The GPU's own fast memory. It must be large enough to hold the AI model you run. If it isn't, the model either fails to load or spills into much slower system memory, and everything slows drastically. That is why VRAM, not the headline GPU name, is the spec to size first. See the GPU and VRAM guide.
- GPU (graphics processing unit)
- The chip that does the heavy parallel math for AI, and the single most important part of an AI workstation. Card choice is the biggest decision in the build — our NVIDIA AI workstation page covers the current tiers.
- Tensor Core
- Specialized units inside NVIDIA GPUs that accelerate the matrix math behind AI training and inference. More and newer tensor cores generally mean faster AI work.
- CUDA
- NVIDIA's software platform that lets AI frameworks use the GPU. Almost all local AI tooling expects it, which is a big reason AI workstations are built around NVIDIA cards.
- Memory bandwidth
- How fast data moves in and out of VRAM. It's a major driver of inference speed and is separate from how much VRAM you have — a card can have plenty of memory but still feel slow if its bandwidth is low.
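To make the bandwidth point concrete, here is a minimal sketch of a common back-of-the-envelope estimate. It assumes that generating each token for a single user reads roughly all of the model's weights from VRAM once, so bandwidth divided by model size gives a ceiling on tokens per second. The function name and the example figures are hypothetical, and real speeds land below this ceiling.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on single-user generation speed.

    Assumes every generated token reads all model weights from VRAM once,
    and ignores compute time, KV-cache reads and software overhead.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


# Two hypothetical cards running the same model with different bandwidth.
model_gb = 20.0  # illustrative size of the weights held in VRAM
for name, bandwidth in [("card A", 500.0), ("card B", 1000.0)]:
    ceiling = decode_ceiling_tokens_per_s(bandwidth, model_gb)
    print(f"{name}: at most about {ceiling:.0f} tokens/s")
```

Both cards hold the model equally well; under this simplification the one with double the bandwidth has double the ceiling.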
Workloads: what the machine does
The kinds of AI work you'll run, and the memory-saving tricks that decide which card you need.
- Inference
- Running a trained model to get answers — for example, chatting with a local LLM. It's lighter on hardware than training, so most single-user inference fits comfortably on one card.
- Fine-tuning
- Adapting an existing model to your own data. It needs more memory than inference but far less than training a model from scratch, which keeps it within reach of a desk-side machine.
- LoRA
- A memory-efficient fine-tuning method that trains small add-on layers instead of the whole model. It lets you customize a model without the VRAM a full fine-tune would demand.
- QLoRA
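- LoRA combined with quantization: the base model is held in a compressed form, typically 4-bit, while the small LoRA layers train on top. That cuts the memory for the base weights to roughly a quarter of their 16-bit size, making fine-tuning possible on smaller cards. It's the reason a single high-VRAM card can fine-tune models you'd expect to need a server.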
- Quantization
- Compressing a model's numbers — for example to 4-bit — so it uses less VRAM, with a small quality trade-off. It's how a large model squeezes onto a card that couldn't hold it at full precision.
- Parameters (e.g. 7B, 70B)
- The number of weights in a model, usually quoted in billions. Bigger models are more capable but need more memory, so parameter count is the first thing to map against your VRAM.
- Token / tokens per second
- A token is a chunk of text — roughly a short word or part of one. Tokens per second measures how fast a model generates output, and it depends heavily on the model, quantization and context length, so treat any single number as conditional.
- Context window
- How much text a model can consider at once. Larger contexts let a model see more of a document or conversation, but they use more VRAM during inference.
- KV cache
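- Memory that holds the model's intermediate results (attention keys and values) for the text it has already processed, so it doesn't recompute them for every new token. It grows with context length and adds to VRAM use, which is why a busy machine with long chats needs headroom beyond the model weights alone.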
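The way parameter count, quantization and context length add up to a VRAM figure can be sketched with simple arithmetic. This is a rough estimate under stated assumptions, not a sizing tool: it ignores quantization metadata, activations and framework overhead, the function names are our own, and the layer and head counts are hypothetical values chosen for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Memory for the weights alone: parameter count times bytes per weight."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9  # decimal gigabytes


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache for one conversation: one key and one value vector
    per layer for every token in the context."""
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


# A 70B-parameter model at 16-bit versus 4-bit.
print(weight_memory_gb(70, 16))  # 140.0
print(weight_memory_gb(70, 4))   # 35.0

# Hypothetical architecture: 80 layers, 8 KV heads of size 128, 16-bit cache.
print(round(kv_cache_gb(80, 8, 128, context_tokens=32_000), 2))  # 10.49
```

Under these assumptions the 4-bit weights plus a long-context cache come to roughly 45 GB, which shows why headroom beyond the weights matters.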
Platform: lanes, memory and storage
The supporting parts that keep the GPU fed — and that matter most once you consider a second card.
- PCIe lanes
- High-speed connections between the CPU and devices like GPUs and NVMe drives. More lanes let multiple GPUs run at full speed, which is why a multi-GPU build often needs a higher-end platform — see the multi-GPU guide.
- ECC memory
- Error-correcting RAM or VRAM that catches and fixes bit errors. It's valued for long, unattended AI runs where a single flipped bit could quietly corrupt a result, and it's one of the things that separates pro-grade cards from consumer ones.
- NVMe SSD
- Very fast storage that connects over PCIe. It's ideal scratch space for datasets and model checkpoints, so storage is far less likely to be the thing your GPU is waiting on.
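A quick lane budget shows why a second GPU can push a build onto a higher-end platform. The sketch below is a simplification: the lane counts are illustrative and not the specs of any particular CPU, the device names are hypothetical, and real motherboards can also run GPUs at x8 or route drives through the chipset.

```python
def lane_budget(cpu_lanes: int, devices: dict[str, int]) -> int:
    """Return spare CPU lanes; a negative result means the devices
    want more lanes than the platform provides."""
    return cpu_lanes - sum(devices.values())


# Hypothetical build: two GPUs at x16 each plus two NVMe drives at x4 each.
wanted = {"gpu0": 16, "gpu1": 16, "nvme0": 4, "nvme1": 4}

# Illustrative lane counts for two classes of platform.
for platform, lanes in [("mainstream desktop", 24), ("workstation", 64)]:
    spare = lane_budget(lanes, wanted)
    status = "fits" if spare >= 0 else f"short by {-spare} lanes"
    print(f"{platform}: {status}")
```

With these numbers the mainstream platform comes up 16 lanes short, while the workstation platform has lanes to spare.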
Multiple GPUs
The terms that come up when you weigh a second card — and where the common myths live. The multi-GPU guide sets honest expectations.
- NVLink
- A fast direct link between certain NVIDIA GPUs that can pool their memory so two cards act more like one. It's largely absent from current consumer cards, which is why two modern GPUs usually don't simply combine their VRAM.
- Tensor parallelism
- Splitting the work inside each layer of a model across multiple GPUs, so the model runs even when it won't fit on a single card. With NVLink largely absent from consumer cards, this kind of software splitting over PCIe is how most multi-GPU workstations run a model bigger than one card holds.
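The idea behind tensor parallelism can be shown with a toy example in plain Python. In this minimal sketch, one layer's weight matrix is split by columns into two shards, each standing in for a GPU; each shard computes its slice of the output independently and the slices are joined at the end. The helper names are our own, and a real framework would do this on actual devices with communication between them.

```python
def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split a matrix into `parts` column shards of equal width.
    Assumes the column count divides evenly by `parts`."""
    width = len(w[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in w] for p in range(parts)]


x = [1.0, 2.0, 3.0]            # layer input, copied to every device
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0],
     [2.0, 1.0, 0.0, 1.0]]     # full weight matrix (3 rows, 4 columns)

shards = split_columns(w, parts=2)                  # each "GPU" holds half the columns
partials = [matmul(x, shard) for shard in shards]   # computed independently
combined = [value for part in partials for value in part]  # join the slices

print(combined)                  # [7.0, 5.0, 4.0, 10.0]
print(combined == matmul(x, w))  # True
```

Each shard stores only half of the weights, which is what lets a model larger than one card's VRAM run at all. The joining step is the traffic that has to cross PCIe or NVLink.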
Power, heat and cooling
AI load runs a GPU flat-out for minutes or hours, so these terms decide whether a machine stays cool and quiet at your desk — more in the cooling and noise guide.
- TDP / TGP (watts)
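- Thermal design power and total graphics power: roughly the power a chip or a whole graphics card draws, and the heat it must shed, under sustained load. It drives power-supply sizing, heat and noise. A 600W card, for instance, needs both real PSU headroom and a healthy circuit to run it.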
- Thermal throttling
- When a hot GPU automatically slows itself down to avoid damage. It's a sign of inadequate cooling. A hot card is a slow card, which is why sustained-load thermals matter more for AI than for gaming.
- Blower vs. flow-through cooler
- Two GPU cooler styles. Blower cards exhaust heat out the back, which is good for packing several together. Flow-through dual-fan cards cool better on their own but dump heat into the case. The right choice depends on how many cards share the chassis.
- Max-Q
- A lower-power version of a pro GPU designed to fit several cards in one chassis without overheating. It trades some single-card speed for the density a multi-GPU build needs.
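The PSU and circuit headroom mentioned above comes down to a short sum. The sketch below is illustrative only: the component wattages are hypothetical, the 80% figures are common rules of thumb that we are assuming here, and it ignores PSU efficiency losses, which make the draw at the wall somewhat higher.

```python
def psu_rating_w(component_watts: dict[str, float], max_load_fraction: float = 0.8) -> float:
    """Smallest PSU rating that keeps sustained draw at or below the
    chosen fraction of its capacity (0.8 is an assumed rule of thumb)."""
    return sum(component_watts.values()) / max_load_fraction


def circuit_limit_w(volts: float, amps: float, continuous_fraction: float = 0.8) -> float:
    """Sustained wattage a circuit supports if continuous loads are held
    to a fraction of the breaker rating (0.8 is an assumed rule of thumb)."""
    return volts * amps * continuous_fraction


# Hypothetical build with one 600 W GPU.
build = {"gpu": 600.0, "cpu": 250.0, "everything else": 150.0}
total = sum(build.values())

print(f"sustained draw: {total:.0f} W")
print(f"suggested PSU rating: {psu_rating_w(build):.0f} W")
print(f"15 A / 120 V circuit limit: {circuit_limit_w(120, 15):.0f} W")
print("circuit has headroom:", total <= circuit_limit_w(120, 15))
```

Adding a second 600 W card to the same hypothetical build would push the sustained draw past that circuit limit, which is when the wall outlet becomes part of the spec.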
Workstation basics and build quality
The terms that define what a workstation is — and how a good one is proven before it reaches your desk.
- Workstation
- A single-user, desk-side computer built for heavy work, quiet enough for an office — unlike a rack server, which is shared by a team and lives in a closet. The AI workstations overview walks through what that means in practice.
- Burn-in test
- Running a finished build under heavy load before delivery to catch faults early and confirm thermals. It's how we make sure a machine that passes on the bench keeps passing on your desk.
Where to go next
Now that the terms make sense, these guides put them to work:
- Start with the GPU and VRAM guide — how to size the most important spec to your largest model.
- Weighing a second card? Read the multi-GPU guide on when it helps and when it doesn't.
- Worried about heat or noise at a desk? See the cooling and noise guide.
- Ready for hardware? Configure an NVIDIA AI workstation or browse the AI workstations overview.
- Not sure a desktop is enough? When a team needs shared compute, that's an AI server conversation.
We translate the jargon into a machine you own
You don't need to master every term on this page — that's our job. Tell us the work, and we'll spec the VRAM, the cooling and the platform, then hand-build and bench-test the machine here in Texas and set it up in person from Katy to Fulshear. See our Texas service areas.
Still decoding the spec sheet?
Skip the acronyms — tell us what you want to run and we'll spec a Texas-built workstation you own outright, in plain English.