Core Bluetooth in production: talking to flaky BLE hardware

The Bluetooth demo on the engineer's desk always works. Phone and device a foot apart, fresh batteries, no microwave running, no elevator. Ship that exact code to ten thousand pockets and it falls apart within a day — because BLE is a radio, and the happy path you tested is the one situation real users almost never sit in.

I've shipped a handful of apps that talk to physical hardware over Bluetooth Low Energy — fitness sensors, a smart lock, a couple of industrial gadgets. They have nothing in common except the thing that breaks: the link. Core Bluetooth is a genuinely good framework, but it hands you a connection-oriented API over a medium that is fundamentally lossy, and if you write to it as though the connection is a socket that stays up, your users will hate you. Everything below is about treating the radio as a thing that will fail, and building so that failure is boring instead of catastrophic.

The lifecycle is a chain of async hops

The first mistake everyone makes (me included, years ago) is to instantiate a CBCentralManager and immediately call scanForPeripherals. Nothing happens, apart from an API MISUSE warning in the console. Bluetooth might be off, the user might not have granted permission yet, the radio might still be spinning up. The manager has a state, and you are not allowed to do anything useful until it is .poweredOn. That transition arrives asynchronously, in a delegate callback, and it's the first link in a chain where every single step is its own async hop:

Create the manager → wait for centralManagerDidUpdateState to report .poweredOn.
Scan for peripherals advertising the service you care about → each discovery arrives in didDiscover.
Connect to the one you want → success in didConnect, or failure in didFailToConnect.
Discover services on the connected peripheral → didDiscoverServices.
Discover characteristics within each service → didDiscoverCharacteristicsFor.
Subscribe (setNotifyValue) or read → values land in didUpdateValueFor; writes with a response are acknowledged in didWriteValueFor.

There is no synchronous version of any of this, and there is no shortcut. You cannot discover characteristics before services, and you cannot read a characteristic you haven't discovered. I model it as an explicit state machine rather than a pile of booleans, because the moment you start tracking "are we connecting OR discovering OR subscribed" with separate flags, you've already lost. One enum, one source of truth, and every delegate callback is a transition.
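Here is roughly what "one enum, one source of truth" can look like. A minimal sketch: LinkState, LinkEvent and transition(_:on:) are hypothetical names, each event stands for one of the delegate callbacks listed above, and it covers only the straight-line path plus drops (restoration would need its own entry point). Keeping the transition logic in a pure function means you can test it without a radio in sight.

```swift
// Hypothetical states for one peripheral link.
enum LinkState: Equatable {
    case waitingForPower             // manager not yet .poweredOn
    case scanning                    // scanForPeripherals issued
    case connecting                  // connect issued, awaiting didConnect
    case discoveringServices
    case discoveringCharacteristics
    case ready                       // able to subscribe, read and write
    case disconnected                // link dropped; a reconnect may be pending
}

// Hypothetical events, one per delegate callback.
enum LinkEvent {
    case poweredOn                   // centralManagerDidUpdateState reports .poweredOn
    case poweredOff                  // any other manager state
    case discovered                  // didDiscover
    case connected                   // didConnect
    case failedToConnect             // didFailToConnect
    case servicesDiscovered          // didDiscoverServices
    case characteristicsDiscovered   // didDiscoverCharacteristicsFor
    case disconnected                // didDisconnectPeripheral
}

/// Returns the next state, or nil if the event makes no sense right now.
func transition(_ state: LinkState, on event: LinkEvent) -> LinkState? {
    switch (state, event) {
    case (_, .poweredOff):
        return .waitingForPower
    case (.waitingForPower, .poweredOn):
        return .scanning
    case (.scanning, .discovered):
        return .connecting
    case (.connecting, .connected), (.disconnected, .connected):
        // A connect re-issued after a drop resolves here as well.
        return .discoveringServices
    case (.discoveringServices, .servicesDiscovered):
        return .discoveringCharacteristics
    case (.discoveringCharacteristics, .characteristicsDiscovered):
        return .ready
    case (.connecting, .failedToConnect),
         (.disconnected, .failedToConnect),
         (.discoveringServices, .disconnected),
         (.discoveringCharacteristics, .disconnected),
         (.ready, .disconnected):
        return .disconnected
    default:
        return nil
    }
}
```

Each delegate method then does one thing: ask for the next state and log anything that comes back nil, because an unexpected callback is exactly the kind of bug a pile of booleans hides.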

Filter at the scan, not after

Always pass a service UUID to scanForPeripherals(withServices:options:). Scanning with nil returns every advertising device in radio range (dozens of them on a city street), and it keeps the radio hot, which drains battery and gets you flagged in energy reports. It also stops working once you're in the background, where iOS only delivers results for scans that name specific services. Scan for the one service you speak, and call stopScan() the instant you've found your target.

Background operation and state restoration

Here's the scenario the demo never covers: the user connects to your hardware, then locks the phone and puts it in a pocket for an hour. iOS will suspend and very likely terminate your app to reclaim memory. Without help, the connection dies with it, and the user pulls the phone out to find a spinner. State restoration is how Core Bluetooth lets the system relaunch your app in the background — directly into the connection it was holding on your behalf — when something happens, like the peripheral sending a notification.

Two pieces wire it up. First, you opt in by giving the manager a restoration identifier at init, and you declare the bluetooth-central background mode under the UIBackgroundModes key in Info.plist. Second, you implement willRestoreState, which iOS calls before didUpdateState on relaunch, handing you back the peripherals you were connected to. You re-grab those objects, re-assign delegates, and pick the state machine back up; you do not start scanning from scratch.

let options: [String: Any] = [
    // Keep this string stable across launches: iOS uses it to match restored state.
    CBCentralManagerOptionRestoreIdentifierKey: "com.app.central"
]
manager = CBCentralManager(delegate: self, queue: bleQueue, options: options)

func centralManager(_ central: CBCentralManager,
                    willRestoreState dict: [String: Any]) {
    // iOS hands back the peripherals it kept alive for us while suspended.
    let restored = dict[CBCentralManagerRestoredStatePeripheralsKey]
        as? [CBPeripheral] ?? []
    for peripheral in restored {
        peripheral.delegate = self          // delegates don't survive relaunch
        self.peripheral = peripheral
        // Already connected? Resume mid-chain — re-discover, don't re-scan.
        if peripheral.state == .connected {
            peripheral.discoverServices([serviceUUID])
        }
    }
}

The detail that bites people: the delegate references are not restored — only the peripheral objects are. If you forget to set peripheral.delegate = self inside willRestoreState, the connection is alive but every callback vanishes into the void, and you'll swear the device "stopped responding" when really you stopped listening. Reassign delegates first, before anything else.
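The per-peripheral decision inside willRestoreState is small enough to pull into a pure function. A minimal sketch, assuming a local RestoredPeripheralState enum that mirrors the four cases of CBPeripheralState so the logic compiles and tests without the framework; ResumeAction and resumeAction(for:hasCachedServices:) are hypothetical names. One cautious design choice: the central is not yet .poweredOn while willRestoreState runs, so a plan such as .reconnect is best stored and carried out from centralManagerDidUpdateState.

```swift
// Local stand-in for CBPeripheralState so this runs without CoreBluetooth.
enum RestoredPeripheralState {
    case disconnected, connecting, connected, disconnecting
}

// Hypothetical plan for picking the state machine back up.
enum ResumeAction: Equatable {
    case resumeWithCachedServices   // connected and services already known: skip discovery
    case discoverServices           // connected, but the discovery chain must run again
    case awaitConnection            // a connect is still pending; didConnect will fire
    case reconnect                  // link is gone: issue connect once .poweredOn
}

func resumeAction(for state: RestoredPeripheralState,
                  hasCachedServices: Bool) -> ResumeAction {
    switch state {
    case .connected:
        return hasCachedServices ? .resumeWithCachedServices : .discoverServices
    case .connecting:
        return .awaitConnection
    case .disconnected, .disconnecting:
        return .reconnect
    }
}
```

In the delegate you'd map peripheral.state onto the local enum and pass whether the restored object still carries a non-empty services array.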

Reconnection is not optional

The link will drop. The user walks out of range, the device's coin cell sags, someone microwaves lunch on the 2.4 GHz band — and you get didDisconnectPeripheral, often with an error, often without warning. The wrong response is to surface "Disconnected" and stop. The right response, for hardware the user clearly intends to stay paired with, is to quietly try to get it back.

Core Bluetooth gives you a lovely primitive for this: call connect(_:options:) on a peripheral that is out of range and it doesn't fail. It stays pending indefinitely and fires didConnect the moment the device reappears. So for an unexpected drop, the baseline is simply to re-issue the connect and let iOS watch for the device. But when the failure is something else (a connect that fails outright in didFailToConnect, or a device that is powering down, rejecting you, or mid-firmware-update and keeps dropping the link), re-issuing connect in a tight loop wastes the radio. Wrap it in exponential backoff: retry quickly at first, then back off, with a cap and a little jitter so a fleet of phones doesn't reconnect in lockstep.

func centralManager(_ central: CBCentralManager,
                    didDisconnectPeripheral peripheral: CBPeripheral,
                    error: Error?) {
    state = .disconnected
    scheduleReconnect(peripheral)
}

func centralManager(_ central: CBCentralManager,
                    didFailToConnect peripheral: CBPeripheral,
                    error: Error?) {
    state = .disconnected
    scheduleReconnect(peripheral)                // outright failures back off too
}

private func scheduleReconnect(_ peripheral: CBPeripheral) {
    guard shouldStayConnected else { return }   // user chose to disconnect → stop

    let delay = min(pow(2.0, Double(retryCount)), 30) // 1, 2, 4, 8 … capped 30s
        + Double.random(in: 0...0.5)                  // jitter to avoid a thundering herd
    retryCount += 1

    bleQueue.asyncAfter(deadline: .now() + delay) { [weak self] in
        guard let self, self.shouldStayConnected else { return }
        // A pending connect resolves as soon as the device is back in range.
        self.manager.connect(peripheral, options: nil)
    }
}

func centralManager(_ central: CBCentralManager, didConnect peripheral: CBPeripheral) {
    retryCount = 0                               // reset the backoff on success
    state = .discoveringServices
    peripheral.discoverServices([serviceUUID])
}

Two things make this robust in the field. Keep a shouldStayConnected intent flag so you never fight a deliberate disconnect — if the user tapped "disconnect," you stop retrying immediately. And reset retryCount to zero on every successful connect, so a device that flaps once doesn't inherit a 30-second penalty for the rest of the session.
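The backoff arithmetic is easy to get subtly wrong, so it's worth lifting into a value type you can unit-test. A minimal sketch: ReconnectBackoff and its members are hypothetical names, the defaults reproduce the 1, 2, 4, 8, 16, then 30-second schedule used in the disconnect handler, and the jitter source is injectable so tests stay deterministic.

```swift
import Foundation

/// Hypothetical helper holding the reconnect backoff policy.
struct ReconnectBackoff {
    let base: Double        // first delay, in seconds
    let cap: Double         // ceiling for the exponential part, in seconds
    let maxJitter: Double   // upper bound of the random spread added on top
    private(set) var retryCount = 0

    init(base: Double = 1, cap: Double = 30, maxJitter: Double = 0.5) {
        self.base = base
        self.cap = cap
        self.maxJitter = maxJitter
    }

    /// Delay before the next connect attempt; advances the retry counter.
    /// `jitter` is clamped into 0...maxJitter.
    mutating func nextDelay(jitter: () -> Double) -> Double {
        let exponential = min(base * pow(2.0, Double(retryCount)), cap)
        retryCount += 1
        return exponential + min(max(jitter(), 0), maxJitter)
    }

    /// Same as above, with uniformly random jitter.
    mutating func nextDelay() -> Double {
        let spread = maxJitter   // copied so the closure doesn't capture self
        return nextDelay(jitter: { Double.random(in: 0...spread) })
    }

    /// Call from didConnect so a single flap carries no penalty forward.
    mutating func reset() { retryCount = 0 }
}
```

With this in place the delegate schedules `bleQueue.asyncAfter(deadline: .now() + backoff.nextDelay())` and didConnect calls `backoff.reset()`, and the policy can change without touching the callbacks.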

Respect flow control on writes

If your app sends data to the device (streaming a config blob, pushing a firmware chunk), you'll reach for a .withoutResponse write because it's fast: no ACK per packet. But "without response" does not mean "without limits." There's a transmit queue underneath, and if you fire writes in a for loop faster than the radio can drain it, packets are silently dropped. No error. No callback. The device just receives a stream with holes in it, and you spend an afternoon blaming the firmware.

The fix is to honour the framework's own backpressure signal. Before each write, check peripheral.canSendWriteWithoutResponse. When it returns false, stop, and wait for the peripheralIsReady(toSendWriteWithoutResponse:) delegate callback before resuming. It's a simple producer/consumer handshake, and it's the difference between a transfer that completes and one that corrupts intermittently on exactly the devices you can't reproduce.

private var pending = ArraySlice<Data>()
private var writeCharacteristic: CBCharacteristic?

func sendChunks(_ chunks: [Data], to peripheral: CBPeripheral,
                for characteristic: CBCharacteristic) {
    pending = chunks[...]
    writeCharacteristic = characteristic         // remembered for the ready callback
    drain(peripheral, characteristic)
}

private func drain(_ peripheral: CBPeripheral, _ characteristic: CBCharacteristic) {
    while peripheral.canSendWriteWithoutResponse, let chunk = pending.first {
        peripheral.writeValue(chunk, for: characteristic, type: .withoutResponse)
        pending = pending.dropFirst()
    }
    // If chunks remain, the radio's queue is full → wait for the ready callback.
}

func peripheralIsReady(toSendWriteWithoutResponse peripheral: CBPeripheral) {
    guard let characteristic = writeCharacteristic else { return }
    drain(peripheral, characteristic)            // resume exactly where we paused
}
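To convince yourself the handshake loses nothing, you can run the same drain loop against a fake radio. A sketch under loose assumptions: FakeRadio and ChunkSender are hypothetical test doubles, and the fake models the transmit queue as a fixed capacity that silently drops writes when full. That approximates the failure mode described above; it does not reproduce Core Bluetooth's internals.

```swift
import Foundation

/// Hypothetical radio: holds at most `capacity` packets and silently
/// drops anything written while full.
final class FakeRadio {
    let capacity: Int
    private var queue: [Data] = []
    private(set) var delivered: [Data] = []
    var onReady: (() -> Void)?   // plays the role of peripheralIsReady(...)

    init(capacity: Int) { self.capacity = capacity }

    var canSendWriteWithoutResponse: Bool { queue.count < capacity }

    func write(_ data: Data) {
        guard canSendWriteWithoutResponse else { return }   // dropped, no error
        queue.append(data)
    }

    /// Sends everything queued over the "air", then signals readiness.
    func transmit() {
        delivered.append(contentsOf: queue)
        queue.removeAll()
        onReady?()
    }
}

/// The same drain/resume loop as above, written against FakeRadio.
final class ChunkSender {
    private var pending = ArraySlice<Data>()
    private let radio: FakeRadio

    init(radio: FakeRadio) {
        self.radio = radio
        radio.onReady = { [weak self] in self?.drain() }
    }

    var isFinished: Bool { pending.isEmpty }

    func send(_ chunks: [Data]) {
        pending = chunks[...]
        drain()
    }

    private func drain() {
        while radio.canSendWriteWithoutResponse, let chunk = pending.first {
            radio.write(chunk)
            pending = pending.dropFirst()
        }
    }
}
```

Whatever capacity you give the fake, the delivered bytes come out identical and in order; only the number of round trips changes. A plain for loop of writes against the same fake keeps just the first batch.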

One more knob worth knowing: maximumWriteValueLength(for: .withoutResponse) tells you the largest payload that fits in a single packet given the negotiated MTU. Chunk to that size. A write with response can fall back to a long write for bigger values; a write without response has no such fallback, so don't expect an oversized value to arrive intact.
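The chunking step itself is only a few lines. A minimal sketch; chunk(_:maxLength:) is a hypothetical helper, and in the app maxLength would come from peripheral.maximumWriteValueLength(for: .withoutResponse), read after connecting because the MTU is negotiated per link.

```swift
import Foundation

/// Splits `payload` into consecutive pieces of at most `maxLength` bytes.
func chunk(_ payload: Data, maxLength: Int) -> [Data] {
    precondition(maxLength > 0, "maxLength must be positive")
    var chunks: [Data] = []
    var start = payload.startIndex
    while start < payload.endIndex {
        // Stop at endIndex if fewer than maxLength bytes remain.
        let end = payload.index(start, offsetBy: maxLength,
                                limitedBy: payload.endIndex) ?? payload.endIndex
        chunks.append(payload.subdata(in: start..<end))
        start = end
    }
    return chunks
}
```

The result feeds straight into the drain loop, so the packet size and the flow control stay two separate concerns.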

Design the UI for a link that fails

All of the above is plumbing. The part users actually feel is how the interface behaves while the radio misbehaves — and this is where I see otherwise solid apps fall down. A few principles I hold to:

Be optimistic, then reconcile. When the user toggles a switch on the device, reflect it in the UI immediately and send the write in the background. If the write fails or times out, roll the UI back and tell them. A control that freezes for two seconds on every tap feels broken even when it's working.
Every command needs a timeout. Core Bluetooth never times out a connect on its own, and a read or write can stall for a long time before anything reports failure. Wrap each operation in a timeout you control (a few seconds) and surface a retry rather than an eternal spinner.
Make "disconnected" a real, legible state. Not a generic error alert. A persistent, calm banner — "Reconnecting…" — that disappears on its own when the backoff loop succeeds. Users tolerate a flaky link far better when the app is visibly, honestly handling it.
Make retries idempotent. Since you're reconnecting and replaying automatically, a command must be safe to send twice. Design the protocol so a duplicated "unlock" or "set brightness to 40" does no harm.
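The first two principles, optimistic updates and timeouts, fit in one small type. A minimal sketch with hypothetical names (OptimisticSwitch, toggle, acknowledge, fail, checkTimeout): the clock is passed in as a TimeInterval so rollback can be tested without real timers, and the 3-second default is an assumption standing in for "a few seconds".

```swift
import Foundation

/// Hypothetical model of one optimistic control, such as a light switch.
struct OptimisticSwitch {
    private(set) var confirmed: Bool     // last value the device acknowledged
    private(set) var displayed: Bool     // what the UI shows right now
    private var deadline: TimeInterval?  // when the in-flight write gives up
    let timeout: TimeInterval

    init(initial: Bool, timeout: TimeInterval = 3) {
        confirmed = initial
        displayed = initial
        self.timeout = timeout
    }

    var isPending: Bool { deadline != nil }

    /// User tapped: flip the UI at once, start the clock, and return the
    /// absolute value to write (not a "flip" command).
    mutating func toggle(now: TimeInterval) -> Bool {
        displayed.toggle()
        deadline = now + timeout
        return displayed
    }

    /// The device confirmed a value: it becomes the truth on screen.
    mutating func acknowledge(_ value: Bool) {
        confirmed = value
        displayed = value
        deadline = nil
    }

    /// The write failed: roll the UI back to the last confirmed value.
    mutating func fail() {
        displayed = confirmed
        deadline = nil
    }

    /// Call from a timer; returns true if the pending write timed out.
    mutating func checkTimeout(now: TimeInterval) -> Bool {
        guard let d = deadline, now >= d else { return false }
        fail()
        return true
    }
}
```

Note that toggle hands back an absolute target value. Writing "on" twice is harmless where writing "flip" twice is not, which is the idempotency the last principle asks for.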

The throughline is that the user should never have to think about Bluetooth. They think about the lock, the sensor, the light. The connection is your problem, and the best Bluetooth UX is one where the seams — the drops, the reconnects, the retries — are invisible because you handled them before they reached the screen.

Build for the radio to fail

If there's one mindset shift that separates BLE code that survives the App Store from code that floods your support inbox, it's this: stop treating disconnection as the exception and start treating it as the steady state you happen to recover from. Wait for .poweredOn. Walk the discovery chain one async hop at a time. Opt into state restoration so a pocketed phone keeps its connection. Reconnect with backoff and a clear intent flag. Honour flow control on writes. And wrap the whole thing in a UI that stays calm and honest when the link goes quiet.

None of it is exotic — it's a dozen small, defensive habits, each easy to skip on a desk where the device is a foot away and the battery is full. The radio doesn't care about your desk. It will drop the connection at the worst possible moment, on a device you can't reproduce, in a building you've never been in. Write the code that assumes it already has, and the failures stop being incidents and become just another Tuesday.