I work with Metal on Apple Silicon in my day job, and the experience of Unified Memory has been revealing. No explicit transfers between CPU and GPU. No SetData / GetData choreography. Just… shared memory.
This made me think: what if this became the norm everywhere? And how does it relate to data-oriented design patterns like ECS?
## My Experience with Metal
Working with Metal on Apple Silicon, I’ve come to appreciate what Unified Memory enables:
```swift
// Metal - storageModeShared means no transfer
let buffer = device.makeBuffer(
    bytes: &states,
    length: bufferLength,
    options: .storageModeShared
)
```
The development experience feels quite smooth. No thinking about transfers, no synchronization headaches. CPU writes, GPU reads — same memory.
This reduced a lot of the cognitive overhead I used to deal with. And it made me wonder: what if all platforms worked this way?
## The Current Landscape
Looking at today’s gaming platforms:
| Platform | Memory | Notes |
|---|---|---|
| Apple Silicon | 16-128GB LPDDR5X | True Unified Memory (WWDC20) |
| PS5 | 16GB GDDR6 | Shared pool, 448 GB/s bandwidth |
| Xbox Series X | 16GB GDDR6 | Split bandwidth (10GB + 6GB) |
| Switch 2 | 12GB LPDDR5X | NVIDIA T239 SoC |
| PC (Gaming) | Separated | dGPU with dedicated VRAM |
Apple Silicon is the clearest example of true unified memory — officially documented as CPU and GPU sharing the same memory pool.
For consoles, the situation is less clear-cut. They use shared GDDR6 pools, which eliminates PCIe transfer overhead, but whether this qualifies as “true” unified memory depends on how strictly you define the term. Either way, they’re closer to unified than traditional PC architecture.
## The Traditional PC Architecture
```mermaid
flowchart LR
    CPU[CPU Memory] <-->|PCIe Transfer| GPU[GPU Memory]
```
This separation creates friction:
- Explicit data transfers required
- Transfer can become a bottleneck
- Programming complexity increases
Anyone who’s worked with compute shaders knows this — you’re constantly thinking about when and what to transfer.
## Why This Matters for Architecture Design
Whether you’re using:
- Unity DOTS (Archetype storage)
- EnTT (Sparse Set)
- Entitas (Group-based)
They all share one thing: contiguous memory layout.
```mermaid
flowchart TB
    subgraph Scattered[Scattered]
        direction LR
        O1[Obj 1] ~~~ O2[Obj 2] ~~~ O3[Obj 3]
    end
    subgraph Contiguous[Contiguous]
        direction LR
        Data[D1, D2, D3, ...]
    end
```
In a Unified Memory world:
- Contiguous data can be accessed efficiently by both CPU and GPU
- No marshalling or conversion needed
- Cache-friendly for CPU, transfer-friendly (or transfer-free) for GPU
This isn’t a new insight — it’s why ECS architectures exist. But Unified Memory could make these patterns even more valuable.
## What I’d Like to See
It would be nice if Unified Memory became more widespread across platforms.
Currently:
- Apple Silicon has true unified memory
- Consoles have shared memory pools (which helps)
- PC gaming still relies on separated memory
If unified memory spread further — whether through better APUs, new interconnect technologies, or something else — the benefits of data-oriented design would apply even more broadly. That’s a future I’d find exciting.
Some principles that seem relevant regardless:
- Prefer contiguous data layouts — Whether Archetype or Sparse Set, packed data wins
- Use unmanaged types — struct, no references, blittable data
- Think about access patterns — Sequential access is still faster than random
These aren’t new ideas — they’re the same principles that make DOTS fast. They’re valuable today for cache efficiency, and could become even more valuable if unified memory spreads further.
## Summary
The gaming landscape varies:
- Apple Silicon — True unified memory, officially documented
- Consoles (PS5, Xbox, Switch) — Shared memory pools, details vary
- PC Gaming — Still separated, likely to remain so for a while
It would be nice to see unified memory become more common. My experience with Metal has shown me how much simpler development can be when you don’t have to think about CPU/GPU transfers. Whether that future arrives more broadly remains to be seen.
In the meantime, designing with contiguous memory layouts is valuable regardless — for cache efficiency today, and potentially for unified memory tomorrow.
## References
- WWDC20: Apple Silicon Architecture — Official Apple documentation on unified memory
- MLX Unified Memory — Documentation for Apple's MLX framework, built on the same unified memory model