We crossed two billion processed sensor readings sometime in early 2025. We didn't make a big announcement about it — it's a number, not an achievement — but it did give us a reason to look back at what the data actually looks like across 340+ facilities and more than a dozen industries.
The headline finding is not surprising but bears repeating: data quality problems are universal, and most facilities have no systematic way of detecting them. They manifest as wrong readings, missed readings, drift, stuck values, and scaling errors. They erode confidence in monitoring systems. They generate false alerts and missed faults. And they're almost always solvable once you know what to look for.
The Stuck Value Problem
The most common data quality defect we see across all sensor types is stuck values: a sensor that transmits the same reading for an extended period, not because the measured parameter is actually constant, but because the sensor has partially failed. The transmitter electronics are still functioning — it's sending packets, the packets arrive — but the measurement element (thermocouple, pressure membrane, MEMS gyroscope) has failed or disconnected.
Across the 2 billion readings, stuck value events occur in approximately 1.3% of all sensor-months. That means in a 500-sensor deployment, you expect 6-7 sensors per month to have some period of stuck values. Most operations teams don't notice these unless the stuck value happens to be at a suspicious round number. A temperature sensor stuck at 72.4°F for two weeks looks like a stable environment. It might be — or the sensor failed on a cool Tuesday morning and has been reporting that value ever since.
Detecting stuck values requires a check that most alert systems don't implement: flag any sensor where the coefficient of variation over a rolling window drops below a minimum threshold. Real process variables fluctuate. Temperature in a machine room varies with occupancy, HVAC cycles, and external weather. Humidity varies with ventilation and infiltration. A standard deviation of exactly zero over 4+ hours is a signal that needs investigation.
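As a minimal sketch of that check, assuming pandas and a timestamp-indexed series of readings (the window length, CV floor, and function name here are illustrative, not how any particular platform implements it):

```python
import pandas as pd

def flag_stuck_values(readings: pd.Series, window: str = "4h",
                      cv_floor: float = 1e-4) -> pd.Series:
    """Flag windows where a sensor's variation collapses to (near) zero,
    suggesting a stuck measurement element rather than a stable process."""
    rolling = readings.rolling(window)  # time-based window needs a DatetimeIndex
    std = rolling.std()
    mean = rolling.mean()
    # Guard the division: a sensor stuck at 0 has an undefined CV,
    # but a zero standard deviation is suspicious regardless of the mean.
    cv = std / mean.abs().clip(lower=1e-9)
    return (std == 0) | (cv < cv_floor)

# Usage: flags = flag_stuck_values(df["temp_f"])  # True where the value looks stuck
```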
Sensor Drift
Electrochemical sensors — gas detectors for CO, CO2, H2S, O2 — have a known limitation: they drift. The electrochemical cell degrades over time, and the zero point and span of the measurement shift slowly. Calibration restores accuracy, but calibration intervals in practice are often set by policy (annually, or at sensor replacement) rather than by drift monitoring.
In our data, electrochemical gas sensors show measurable zero drift in approximately 23% of sensor-years. The drift is typically slow — 1-3% of full scale per month — and therefore not obvious from individual readings. It becomes visible only when you compare the sensor's baseline in normal conditions over time. A CO detector that reads 0.2 ppm in normal conditions in January and 4.1 ppm in normal conditions in September is drifting. Both readings might be below the alert threshold, so no alert fires. But the sensor is no longer reliable.
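One way to make that comparison systematic is to track a low quantile of each sensor's monthly readings as a stand-in for its normal-conditions baseline and compare it against the value recorded at the last calibration. A sketch under those assumptions, again using pandas (the quantile choice and tolerance are assumptions to tune per sensor type):

```python
import pandas as pd

def monthly_baseline_drift(readings: pd.Series, reference_ppm: float,
                           quantile: float = 0.10,
                           tolerance_ppm: float = 1.0) -> pd.DataFrame:
    """Track a gas sensor's monthly low-quantile baseline against the
    baseline recorded at its last calibration.

    The low quantile stands in for normal conditions, on the assumption
    that clean-air readings dominate each month."""
    baseline = readings.resample("MS").quantile(quantile)  # month-start bins
    drift = baseline - reference_ppm
    return pd.DataFrame({"baseline_ppm": baseline,
                         "drift_ppm": drift,
                         "flag": drift.abs() > tolerance_ppm})
```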
Pressure sensors, particularly capacitive MEMS types, show similar long-term drift characteristics, though less severe — typically under 0.5% of full scale per year. For critical process measurements, annual calibration verification is not excessive. For monitoring applications where relative changes matter more than absolute accuracy, the drift tolerance is higher.
Wiring and Termination Failures
In brownfield deployments with older instrumentation wiring, intermittent electrical connections are a significant source of data quality problems. The symptom is noise: a pressure reading that's normally stable at 87.3 PSI suddenly shows high-frequency variation of ±5 PSI for a few hours, then returns to normal. The cause is a corroded terminal block or a loose screw connection that intermittently adds resistance to the 4-20mA loop, disturbing the signal.
These events are easy to mistake for process variation if you're not looking for them specifically. The distinguishing feature is the frequency and amplitude of the noise: genuine process pressure variation of ±5 PSI over 30 seconds would have a mechanical cause, and that cause would be visible elsewhere in the process. Electrical noise from a bad connection tends to be higher frequency and uncorrelated with process events.
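That reasoning can be turned into a crude automated check: smooth the signal with a short median filter, treat whatever remains as high-frequency content, and flag windows where that residual swamps the sensor's long-run noise floor. A sketch under those assumptions (the window lengths and ratio threshold are illustrative and would need tuning per loop):

```python
import pandas as pd

def noise_burst_flags(readings: pd.Series, smooth_samples: int = 9,
                      window: str = "30min",
                      ratio: float = 3.0) -> pd.Series:
    """Flag windows where high-frequency residual noise dominates.

    A short rolling median removes real (slower) process movement;
    what's left is high-frequency content. A corroded terminal block
    shows up as residual variance well above the sensor's long-run
    electrical noise floor."""
    smoothed = readings.rolling(smooth_samples, center=True).median()
    residual_std = (readings - smoothed).rolling(window).std()  # needs DatetimeIndex
    noise_floor = residual_std.expanding().median()
    return residual_std > ratio * noise_floor
```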
Across our deployment base, wiring-related data quality events are most common in facilities with instrumentation installed before 2005 and show a strong seasonal pattern in climates with high humidity variation — the corrosion rate on exposed terminal blocks accelerates significantly in summer months in coastal locations.
Configuration Errors That Survive for Months
The data quality issue that's hardest to catch is systematic offset from a configuration error. A flow meter with the wrong scale factor in the integration configuration will report readings that are off by a fixed multiplier — consistently wrong, in a way that doesn't trigger anomaly detection because the values look plausible.
We've seen: a vibration sensor configured with units of mm/s when the device outputs in/s, producing readings 25.4× too low (and therefore never triggering vibration alerts). A temperature sensor with the Celsius-to-Fahrenheit conversion applied twice, showing an outdoor air temperature near 226°F during a 42°C heatwave. A flow meter with a zero offset from a misconfigured live zero (the transmitter's 4mA live zero wasn't set up, so the signal fell below 4mA at low flow rates and the scaled reading went negative).
Configuration errors like these are invisible to threshold alert systems because the thresholds are set based on the erroneous data. The only way to catch them systematically is cross-validation: comparing sensor readings against physics-based expectations (a pump in a sealed system can't have negative flow), against redundant sensors measuring the same parameter, or against historical baselines from before the configuration was changed.
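The simplest form of that cross-validation is a pair of checks per reading: a hard physical-range test and an agreement test against a redundant sensor. A minimal sketch (the function, sensor pairing, and limits are hypothetical):

```python
def cross_validate(primary: float, redundant: float,
                   physical_min: float, physical_max: float,
                   agreement_tol: float) -> list[str]:
    """Return cross-validation failures for one pair of readings."""
    failures = []
    # Physics check: e.g., flow in a sealed pumped loop can't be negative.
    if not physical_min <= primary <= physical_max:
        failures.append("primary outside physical range")
    # Redundancy check: two sensors on the same parameter should agree.
    if abs(primary - redundant) > agreement_tol:
        failures.append("disagrees with redundant sensor")
    return failures

# Hypothetical flow pair on a sealed loop (units: GPM):
print(cross_validate(primary=-2.3, redundant=41.7,
                     physical_min=0.0, physical_max=400.0,
                     agreement_tol=5.0))
# -> ['primary outside physical range', 'disagrees with redundant sensor']
```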
What Good Data Quality Practice Looks Like
Three practices that address the majority of data quality failures: automated stuck-value detection on all sensors, with a configurable rolling window; sensor health scoring based on transmission rate, variance statistics, and range validation; and systematic cross-validation for sensors that measure parameters constrained by physics (conservation of mass, heat balance, pressure drop models).
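As a sketch of how the second of those practices might combine its three signals into a single score (the weights and component checks here are illustrative assumptions, not SensorVault's actual scoring):

```python
import pandas as pd

def health_score(readings: pd.Series, expected_per_hour: float,
                 range_min: float, range_max: float) -> float:
    """Score one sensor from 0 (dead) to 1 (healthy) using transmission
    rate, variance statistics, and range validation."""
    hours = (readings.index[-1] - readings.index[0]).total_seconds() / 3600
    # Transmission: fraction of expected readings that actually arrived.
    transmission = min(1.0, len(readings) / max(1.0, hours * expected_per_hour))
    # Variance: penalize stuck (zero-variance) 4-hour windows.
    stuck_fraction = (readings.rolling("4h").std() == 0).mean()
    # Range: fraction of readings inside the physically plausible band.
    in_range = readings.between(range_min, range_max).mean()
    return 0.4 * transmission + 0.3 * (1.0 - stuck_fraction) + 0.3 * in_range
```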
These are not exotic techniques. They're standard statistical quality control applied to sensor data. The barrier to implementing them is usually that the monitoring platform doesn't support them natively, so they require custom analytics work that never gets prioritized. When data quality monitoring is part of the platform, it runs automatically and the alerts come to the same operations team on the same notification channels as process alerts.
The output isn't just cleaner data. It's a monitoring program that the operations team actually trusts — because when it says something is wrong, experience has shown that something actually is wrong. That trust is what determines whether a sensor network generates real operational value or becomes another system that nobody looks at.
Concerned about data quality in your sensor deployment?
SensorVault includes automated stuck-value detection, sensor health scoring, and range validation for every sensor on the platform. Data quality is monitored, not assumed.
See the Platform