Postmortem

How a 1200-byte ceiling on an NGO Unreliable RPC quietly capped my Wave 3

  • Unity
  • Netcode for GameObjects
  • NGO
  • Multiplayer
  • Networking
  • Game Development
  • Failure

The symptom

On the third wave of a co-op enemy survival prototype, the client suddenly stopped seeing enemies. The player still got knocked around. Damage still landed. But the screen was empty.

Host: fully populated. Client: ghosts attacking thin air.

This was the second Wave-3-flavoured failure on the same project. The first one was a full client hang at Wave 3, traced back to a stale-state bug in my SceneTransitionTracker and fixed in a different commit. This one looked different on the surface, but the Wave 3 threshold was suspiciously identical.

Setup

  • Unity 6
  • Netcode for GameObjects 2.11
  • Server-authoritative
  • Enemies stored in a Sparse Set–style ReactiveEntitySetSO on the server
  • Periodic state synced to clients at 20 Hz via a ClientRpc

The recent change in this area: I had split the snapshot path. Event-driven snapshots (late-join, end-of-wave) stayed on a reliable ClientRpc. The 20 Hz periodic stream moved to Unreliable:

[ClientRpc(Delivery = RpcDelivery.Unreliable)]
private void SyncPeriodicSnapshotClientRpc(
    EnemyState[] states, int[] entityIds, int count) { ... }

The motivation was sound: reliable retransmits at 20 Hz under packet loss can pile up into head-of-line blocking. Unreliable was the textbook choice for periodic snapshots.

It worked perfectly through Wave 2.

What the log finally told me

The host console had this, repeated:

OverflowException: RPC parameters are too large for unreliable delivery.
  Unity.Netcode.NetworkBehaviour.__endSendClientRpc (...)
  Game.Enemy.EnemyStateNetworkSync.SyncPeriodicSnapshotClientRpc

Not “your RPC failed silently.” Not “client dropped a packet.” The send itself was being refused on the server, before the data ever went out.

Why this happens

Unity’s NGO does not fragment unreliable RPCs. The behavior is documented plainly enough once you know to look for it:

Unreliable RPCs throw OverflowException if parameters exceed the non-fragmented message maximum size.

If your serialised payload doesn’t fit in a single unfragmented packet (about 1400 bytes on the wire, roughly 1200 bytes of safe payload after NGO framing), the send throws.

Reliable RPCs in NGO 1.0.1+ do fragment automatically. That’s why the same data shape happily flowed through a reliable channel on earlier waves of testing.

The math I should have done at design time

My EnemyState struct:

public struct EnemyState : INetworkSerializeByMemcpy
{
    public Vector3 Position;   // 12
    public float   RotationY;  //  4
    public short   Health;     //  2
    public short   MaxHealth;  //  2
    public byte    Flags;      //  1
    public byte    EnemyTypeId;//  1
}
// Sequential layout with 4-byte alignment → 24 bytes per enemy
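The 24-byte figure can be checked mechanically instead of by adding up comments. This is a minimal sketch, assuming plain .NET rather than Unity: System.Numerics.Vector3 stands in for UnityEngine.Vector3 (both are three floats), and EnemyStateLayout and EnemyStateSize are hypothetical names for a copy of the struct without the NGO marker interface.

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for EnemyState: same fields, same order.
[StructLayout(LayoutKind.Sequential)]
public struct EnemyStateLayout
{
    public Vector3 Position;    // 12 bytes (three floats)
    public float   RotationY;   //  4
    public short   Health;      //  2
    public short   MaxHealth;   //  2
    public byte    Flags;       //  1
    public byte    EnemyTypeId; //  1
}

public static class EnemyStateSize
{
    // Sum of the field sizes, ignoring padding.
    public const int FieldBytes = 12 + 4 + 2 + 2 + 1 + 1;

    // Size the runtime actually uses for the struct, tail padding included.
    public static int PaddedBytes() => Unsafe.SizeOf<EnemyStateLayout>();
}
```

The fields add up to 22 bytes; the struct rounds up to 24 because its widest member is 4-byte aligned. Since a memcpy-style serializer copies the struct's memory as-is, it is the padded size that counts against the budget.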

Wave progression (baseEnemyCount: 17, enemiesPerWaveIncrease: 15):

Wave   Enemies   Approx payload (incl. ids + framing)   Unreliable MTU
1      17        ~520 B                                 under
2      32        ~940 B                                 under
3      47        ~1360 B                                over

Wave 3 was the first wave to cross the threshold. Earlier playtests passed because they only reached Waves 1 and 2, so Wave 3 was the first time this code path carried the full payload.
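The same arithmetic fits in a few lines of code that could sit next to the wave config. This is a sketch under stated assumptions: SnapshotBudget and its members are hypothetical names, the 44-byte framing constant is back-derived from the approximate payload figures above rather than measured, and the 1200-byte ceiling is the safe-payload estimate used in this post, not a constant read from NGO.

```csharp
public static class SnapshotBudget
{
    public const int BytesPerState = 24; // EnemyState, padded
    public const int BytesPerId = 4;     // one int per entity id

    // Assumption: combined framing overhead, back-derived from the
    // approximate per-wave payload figures in this post.
    public const int FramingBytes = 44;

    // Assumption: the ~1200 B safe-payload estimate, not an NGO constant.
    public const int UnreliableCeilingBytes = 1200;

    // Enemy count for a 1-based wave number.
    public static int EnemiesInWave(int wave, int baseCount, int perWaveIncrease)
        => baseCount + (wave - 1) * perWaveIncrease;

    // Estimated serialized size of one periodic snapshot call.
    public static int EstimateBytes(int enemyCount)
        => enemyCount * (BytesPerState + BytesPerId) + FramingBytes;

    public static bool FitsUnreliable(int enemyCount)
        => EstimateBytes(enemyCount) <= UnreliableCeilingBytes;
}
```

With baseEnemyCount 17 and enemiesPerWaveIncrease 15, EnemiesInWave(3, 17, 15) gives 47 enemies, EstimateBytes(47) gives 1360, and FitsUnreliable(47) is false, while the Wave 2 count of 32 still fits.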

What the industry actually does

Once the cause was clear, the interesting question was: what’s the right design? As it turned out, my prototype was running headlong into a problem that every server-authoritative shooter has already solved.

DOTS Netcode for Entities — Ghost System

Unity’s other netcode product solves this natively:

  • Per-tick priority queue of ghosts (entities)
  • Server fills each outgoing packet to MTU with the highest-importance ghosts
  • Anything that didn’t fit carries over to the next tick (with bumped importance)
  • Delta compression against the last acknowledged baseline
  • Quantization on top of that (e.g. positions stored as a few bits per component)

The key architectural insight is: you don’t try to fit all entities into one packet. You design the wire format so MTU overflow is impossible by construction.
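The fill-to-budget idea can be sketched in plain C#. This is not the Netcode for Entities implementation or its API; GhostEntry, PriorityPacker, and FillPacket are hypothetical names, and the priority rule (add a fixed importance every tick, reset on send) is a simplified assumption that only illustrates the carry-over behaviour.

```csharp
using System.Collections.Generic;

public sealed class GhostEntry
{
    public int EntityId;
    public int SizeBytes;      // serialized size of this entity's state
    public int BaseImportance; // added to Priority on every tick
    public int Priority;       // importance accumulated since the last send
}

public static class PriorityPacker
{
    // Picks the entities to send this tick without exceeding budgetBytes.
    // Skipped entities keep their accumulated priority, so they move up
    // the queue on later ticks.
    public static List<int> FillPacket(List<GhostEntry> ghosts, int budgetBytes)
    {
        foreach (var g in ghosts) g.Priority += g.BaseImportance;

        // Highest priority first; entity id breaks ties deterministically.
        var ordered = new List<GhostEntry>(ghosts);
        ordered.Sort((a, b) =>
        {
            int byPriority = b.Priority.CompareTo(a.Priority);
            return byPriority != 0 ? byPriority : a.EntityId.CompareTo(b.EntityId);
        });

        var sent = new List<int>();
        int used = 0;
        foreach (var g in ordered)
        {
            if (used + g.SizeBytes > budgetBytes) continue; // try smaller entries
            used += g.SizeBytes;
            g.Priority = 0; // sent, so it starts accumulating again
            sent.Add(g.EntityId);
        }
        return sent;
    }
}
```

With three 28-byte entities of equal importance and a 56-byte budget, the first call sends entities 1 and 2, and the second call puts entity 3 at the front. The packet size is bounded by the budget no matter how many entities exist, which is the "impossible by construction" property.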

Glenn Fiedler / Gaffer on Games

In his “Snapshot Compression” series, Fiedler took a 901-cube physics scene from 17.37 Mbps toward a 256 kbps budget. The toolkit:

  • Bound and quantize each numeric field (positions to ~mm precision, velocities to ~32 values per m/s)
  • Pack quaternions as “smallest three” (largest component index + 3 quantized fractions)
  • Encode “no change since baseline” as 1 bit
  • Index entities relatively, not absolutely
  • Combine with Huffman / range coding for final compression
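The first item, bounding and quantizing a field, is the easiest to show. This is a minimal sketch, not Fiedler's code: Quantizer is a hypothetical helper, the ±64 m range in the usage note is an assumed level size, and bits is assumed to be between 1 and 24 so the codes stay exactly representable as floats.

```csharp
using System;

public static class Quantizer
{
    // Maps a value in [min, max] to an integer code in [0, 2^bits - 1].
    // Values outside the range are clamped to the nearest bound.
    public static uint Quantize(float value, float min, float max, int bits)
    {
        uint maxCode = (1u << bits) - 1u;
        float clamped = Math.Clamp(value, min, max);
        float normalized = (clamped - min) / (max - min);
        return (uint)MathF.Round(normalized * maxCode);
    }

    // Maps a code back to the centre of its quantization step.
    public static float Dequantize(uint code, float min, float max, int bits)
    {
        uint maxCode = (1u << bits) - 1u;
        return min + (code / (float)maxCode) * (max - min);
    }
}
```

With 16 bits over a ±64 m range, one step is about 2 mm, so a three-float position (12 bytes) would shrink to three 16-bit codes (6 bytes) at the cost of a round-trip error of roughly a millimetre.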

Overwatch and Quake

Tim Ford’s GDC 2017 talk on Overwatch describes the same shape: ECS state, server-authoritative, delta snapshots sent at a fixed tick rate, client interpolates between snapshots, mispredicts are rolled back and replayed.
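The client half of that shape, interpolating between two received snapshots, can be sketched as follows. This is an illustration only, not Overwatch's code: PositionSnapshot and SnapshotInterpolation are hypothetical names, System.Numerics.Vector3 stands in for the engine's vector type, and prediction, rollback, and replay are left out entirely.

```csharp
using System;
using System.Numerics;

public readonly struct PositionSnapshot
{
    public readonly double Time;      // server time of the snapshot, in seconds
    public readonly Vector3 Position;

    public PositionSnapshot(double time, Vector3 position)
    {
        Time = time;
        Position = position;
    }
}

public static class SnapshotInterpolation
{
    // Position at renderTime, blended linearly between two snapshots.
    // renderTime is clamped to [older.Time, newer.Time], so this never
    // extrapolates past the newest data.
    public static Vector3 Sample(
        PositionSnapshot older, PositionSnapshot newer, double renderTime)
    {
        double span = newer.Time - older.Time;
        if (span <= 0.0) return newer.Position;

        float t = (float)Math.Clamp((renderTime - older.Time) / span, 0.0, 1.0);
        return Vector3.Lerp(older.Position, newer.Position, t);
    }
}
```

Rendering slightly in the past, so that there are usually two snapshots to blend between, is also what lets a periodic stream tolerate a dropped unreliable packet: the next snapshot simply becomes the new interpolation target.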

The takeaway across all three: periodic streams belong on an Unreliable channel, and the wire format is designed around the per-packet budget. Reliable is reserved for events that genuinely require ordered delivery.

Options I weighed

For my prototype I considered three paths:

Option                    What it does                                                  Cost     When it fits
Switch back to Reliable   Let NGO auto-fragment                                         1 line   Small scope only; risks reliable retransmit storms at 20 Hz
Quantize the state        24 B → ~12 B per entity                                       small    Doubles the headroom
Chunk the snapshot        Split across multiple Unreliable RPCs, reassemble on client   medium   Right answer at any scale

Quantize + chunk together is the design Netcode for Entities ships natively, and what I’d reach for if the project needed 100+ entities.
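The chunking option mostly comes down to planning index ranges that each fit under the ceiling. This is a sketch of that planning step only, with hypothetical names (SnapshotChunker, Plan); the per-entity size, framing, and ceiling values in the usage note are the estimates from this post, not values read from NGO.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotChunker
{
    // Returns (offset, count) ranges so that each range fits in one call:
    // count * bytesPerEntity + framingBytes <= ceilingBytes.
    public static List<(int Offset, int Count)> Plan(
        int entityCount, int bytesPerEntity, int framingBytes, int ceilingBytes)
    {
        int perChunk = (ceilingBytes - framingBytes) / bytesPerEntity;
        if (perChunk <= 0)
            throw new ArgumentException("Ceiling is too small for a single entity.");

        var chunks = new List<(int Offset, int Count)>();
        for (int offset = 0; offset < entityCount; offset += perChunk)
            chunks.Add((offset, Math.Min(perChunk, entityCount - offset)));
        return chunks;
    }
}
```

For 47 entities at 28 bytes each, 44 bytes of framing, and a 1200-byte ceiling, Plan returns two ranges: (0, 41) and (41, 6). Each range would go out as its own Unreliable RPC. Because unreliable chunks can be lost independently, one reasonable design is for the client to apply each chunk as it arrives instead of waiting for a complete set.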

What I actually shipped

This is a prototype with a fixed 3-wave structure. Scope-honest decision: I lowered the wave config so the worst case never crosses MTU.

# WaveConfig.asset
baseEnemyCount: 15        # was 17
enemiesPerWaveIncrease: 10 # was 15
maxWaves: 3                # unchanged
maxEnemyCount: 40          # was 200

Wave progression: 15 → 25 → 35.

Worst-case payload at 35 enemies × 24 B + 35 × 4 B + framing ≈ 1020 B. Comfortably under MTU, on Unreliable, no quantization required.

If the project grows — more enemies per wave, more state per enemy, longer matches — quantization and chunking are sitting on the shelf. But shipping the simplest fix that respects the actual scope was the right call here.

Lessons that survive the project

Five things I’m carrying forward:

  1. Compute worst-case payload at design time, not after shipping. Any RPC with an array parameter needs a one-line comment with the max-N byte count.
  2. Unreliable means ≤ ~1200 B per call. No exceptions. Treat it as a hard ceiling, like a stack frame size.
  3. Reliable is for low-frequency events. A 20 Hz reliable RPC is an anti-pattern, even if it currently works.
  4. Periodic state belongs on Unreliable + chunking. That’s what Netcode for Entities does because it’s correct, not because it’s clever.
  5. Caching network state locally is a bug source. The earlier Wave 3 hang on this same project came from cached scene state. The fix was to query the authoritative system on demand. Same lesson, different layer.

Wrap-up

The investigation also pushed me to write down the networking rules as a reusable skill in my personal standards repo — payload size budget, mechanism decision tree, anti-patterns, snapshot streaming pattern. The article-level version of that is essentially this post; the rule-level version is a Claude Code skill that fires whenever I write NGO code in any future project.

The point of writing both: failure once is tuition. Failure twice on the same shape is negligence.

References