Condition-Based & Risk-Based Maintenance with Thermal Data

A practitioner's guide to turning continuous temperature measurements on switchgear, busbars and lines into defensible CBM and RBM decisions — the physics, the data interpretation, the thresholds and the integration.

By VTI Corp engineering · ~14 min read · For grid operations & design engineers, protection specialists and asset owners

Most unplanned outages on distribution and transmission plant do not begin as insulation failures — they begin as a connection that runs a few degrees too hot, for months, until it doesn't. This guide explains why temperature is the single most actionable leading indicator of connection health, and how to convert it into condition-based maintenance (CBM) and risk-based maintenance (RBM) programs that stand up to scrutiny.

1. Why maintenance strategy is shifting

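Maintenance philosophies sit on a spectrum of increasing sophistication and, when applied well, decreasing total cost of ownership: from reactive (run-to-failure) and calendar-based work toward the condition-based (CBM) and risk-based (RBM) approaches this guide covers.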

For bolted and clamped current-carrying connections — busbar joints, disconnector contacts, cable lugs, line splices — the dominant failure mode is progressive increase in contact resistance, and its earliest external symptom is heat. That makes temperature the natural condition signal on which to build CBM and RBM.

2. The physics: why temperature leads failure

A current-carrying joint dissipates power as heat according to:

P = I² · R_joint → joint temperature rise above the conductor ∝ I² · R_joint

When a joint is sound, R_joint is a small fraction of the adjacent conductor's resistance and the connection runs at or below conductor temperature. As the joint degrades (bolt relaxation, fretting, oxidation of aluminium, intermetallic growth in Al–Cu joints), R_joint rises. At a given current, dissipation scales linearly with R_joint, but the extra heat is released in the small volume of the contact interface, so even a modest resistance increase produces a pronounced local temperature rise.
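As a rough illustration of this scaling, the sketch below evaluates P = I² · R_joint for a sound and a degraded joint. The current and both resistance values are hypothetical, chosen only to show the arithmetic; they are not taken from any standard or measurement.

```python
def joint_power_w(current_a: float, joint_resistance_ohm: float) -> float:
    """Heat dissipated in a joint, P = I^2 * R, in watts."""
    return current_a ** 2 * joint_resistance_ohm


# Hypothetical values for illustration only.
LOAD_A = 1000.0
SOUND_JOINT_OHM = 10e-6      # assumed resistance of a healthy joint
DEGRADED_JOINT_OHM = 40e-6   # assumed resistance after degradation

sound_w = joint_power_w(LOAD_A, SOUND_JOINT_OHM)
degraded_w = joint_power_w(LOAD_A, DEGRADED_JOINT_OHM)
print(f"sound: {sound_w:.1f} W, degraded: {degraded_w:.1f} W")
```

At the same current, the assumed fourfold resistance increase quadruples the heat released in the same small piece of metal, which is why the joint stands out thermally from the conductor on either side.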

Worse, the process is self-reinforcing. Higher temperature accelerates oxidation and metal creep, which raises resistance, which raises temperature — a positive-feedback loop that ends in thermal runaway, annealing, arcing and, in enclosed switchgear, potential arc-flash. The degradation rate of the underlying chemistry roughly follows an Arrhenius relationship (reaction rate increasing exponentially with absolute temperature), which is why a connection can sit "slightly warm" for a long time and then fail quickly once it crosses a threshold.
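The Arrhenius relationship mentioned above can be sketched as an acceleration factor between two joint temperatures. The activation energy used here (0.7 eV) is a placeholder assumption for illustration; real values depend on the degradation mechanism and materials and would need to come from test data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_ref_c: float, t_hot_c: float,
                           activation_energy_ev: float = 0.7) -> float:
    """Ratio of the reaction rate at t_hot_c to the rate at t_ref_c.

    activation_energy_ev is an assumed, illustrative value.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_hot_k)
    return math.exp(exponent)


# How much faster does the assumed degradation chemistry run at 90 °C than at 70 °C?
factor = arrhenius_acceleration(70.0, 90.0)
print(f"acceleration factor 70 -> 90 °C: {factor:.1f}x")
```

Under these assumptions a 20 K increase speeds the degradation several times over, which is consistent with a joint that stays "slightly warm" for a long time and then deteriorates quickly.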

Key point. Temperature is a leading indicator because it appears at the resistance-growth stage — long before insulation damage, gas evolution or mechanical failure. The earlier you see it, the longer your decision window.

3. Reading thermal data correctly

Raw temperature in °C is necessary but not sufficient. Four interpretation principles separate a useful program from a nuisance-alarm generator.

3.1 Temperature rise (ΔT), not just absolute temperature

Standards specify limits as a temperature rise above ambient, because a 75 °C joint means very different things at 10 °C versus 45 °C ambient. IEC 62271-1 and IEEE C37.20.x set rise limits for connections by material and coating; as a working reference (always confirm against the current edition for your equipment):

| Connection type | Typical max temperature | Typical rise limit (40 °C ambient) |
|---|---|---|
| Bare copper / aluminium, bolted | ~90 °C | ~50 K |
| Silver- or nickel-coated, bolted | ~105–115 °C | ~65–75 K |

Values are indicative of common standard tables and vary by edition, coating and assembly; use them to frame thresholds, not as acceptance criteria.
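To make the ambient dependence concrete, here is a minimal sketch that converts an absolute joint temperature into a rise and compares it with a rise limit. The 50 K figure is the indicative bare-bolted value from the table above, used only as an example; the function name is hypothetical.

```python
def rise_margin_k(joint_c: float, ambient_c: float, rise_limit_k: float) -> float:
    """Remaining margin to the rise limit in kelvin (negative means exceeded)."""
    rise_k = joint_c - ambient_c
    return rise_limit_k - rise_k


INDICATIVE_BARE_BOLTED_LIMIT_K = 50.0  # indicative only, confirm against the standard

# The same 75 °C reading at two different ambients.
for ambient_c in (10.0, 45.0):
    margin = rise_margin_k(75.0, ambient_c, INDICATIVE_BARE_BOLTED_LIMIT_K)
    status = "exceeds limit" if margin < 0 else "within limit"
    print(f"ambient {ambient_c:.0f} °C: margin {margin:+.0f} K ({status})")
```

The 75 °C joint is 65 K above a 10 °C ambient, which is beyond the indicative limit, but only 30 K above a 45 °C ambient.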

3.2 Normalise to load — ΔT scales with the square of current

Because rise ∝ I², a joint that looks benign at 40% load can be alarming at full load. Compare measurements at comparable load, or normalise to a reference current:

ΔT_ref = ΔT_measured · ( I_ref / I_measured )²

A health rule that ignores load will either miss developing faults at light load or cry wolf at peak. Continuous monitoring has a decisive advantage here: it captures ΔT across the full load cycle, so you see the joint at the load that matters.
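The normalisation formula above can be sketched as a small helper. The `min_load_fraction` guard is an added assumption, not part of the formula: at very light load the measured rise is small relative to sensor error, so extrapolating it by a large squared ratio is unreliable and the sketch declines to do so.

```python
from typing import Optional


def normalise_rise_k(delta_t_measured_k: float, i_measured_a: float, i_ref_a: float,
                     min_load_fraction: float = 0.2) -> Optional[float]:
    """Scale a measured rise to the reference current: dT_ref = dT * (I_ref / I)^2.

    Returns None when the load is below min_load_fraction of the reference
    current (an assumed cut-off; tune it to your sensor accuracy).
    """
    if i_measured_a <= 0 or i_ref_a <= 0:
        raise ValueError("currents must be positive")
    if i_measured_a < min_load_fraction * i_ref_a:
        return None
    return delta_t_measured_k * (i_ref_a / i_measured_a) ** 2


# An 8 K rise measured at 40% load, normalised to full load (1000 A).
print(normalise_rise_k(8.0, 400.0, 1000.0))
```

In this example the 8 K rise at 40% load corresponds to 50 K at the reference current, which is the "benign at light load, alarming at full load" case described above.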

3.3 Phase-to-phase comparison and rate-of-rise

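The two most robust practical signals, phase-to-phase comparison and rate-of-rise, do not depend on perfect absolute calibration. The first compares a joint with the equivalent joints on the other phases, which see the same ambient and similar load; the second tracks how quickly a joint's own ΔT is changing over time.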

Practitioner's tip. Build alarms on (a) load-normalised ΔT versus an absolute limit, (b) phase divergence, and (c) rate-of-rise. Any one alone generates either misses or false alarms; the three together are reliable.
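A minimal sketch of the three rules in the tip, evaluated together for one three-phase measurement point. The limit values, the definition of phase divergence (hottest phase minus the mean of the others) and the least-squares slope are all assumptions made for illustration; the inputs are expected to be load-normalised rises.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AlarmLimits:
    # Placeholder limits, not values from any standard.
    max_rise_k: float = 40.0
    max_phase_divergence_k: float = 10.0
    max_rate_k_per_day: float = 1.0


def phase_divergence_k(rises_k: dict) -> float:
    """Hottest phase minus the mean of the remaining phases."""
    hottest = max(rises_k, key=rises_k.get)
    others = [v for name, v in rises_k.items() if name != hottest]
    return rises_k[hottest] - mean(others)


def rate_of_rise_k_per_day(days: list, rises_k: list) -> float:
    """Least-squares slope of rise against time, in K per day."""
    x_bar, y_bar = mean(days), mean(rises_k)
    sxx = sum((x - x_bar) ** 2 for x in days)
    if sxx == 0:
        return 0.0
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, rises_k))
    return sxy / sxx


def triggered_rules(rises_k: dict, days: list, hottest_history_k: list,
                    limits: AlarmLimits = AlarmLimits()) -> list:
    """Names of the rules that fire for this measurement point."""
    rules = []
    if max(rises_k.values()) > limits.max_rise_k:
        rules.append("absolute_rise")
    if phase_divergence_k(rises_k) > limits.max_phase_divergence_k:
        rules.append("phase_divergence")
    if rate_of_rise_k_per_day(days, hottest_history_k) > limits.max_rate_k_per_day:
        rules.append("rate_of_rise")
    return rules


now = {"A": 20.0, "B": 21.0, "C": 35.0}
print(triggered_rules(now, [0, 1, 2, 3], [29.0, 31.0, 33.0, 35.0]))
```

In this example phase C is still below the assumed absolute limit, yet it is flagged by both the divergence and the trend rule, which is the case an absolute-only alarm would miss.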

3.4 The P–F interval: why continuous beats periodic

In reliability terms, the P–F interval is the time between the point a failure becomes detectable (P) and functional failure (F). Periodic IR thermography samples this curve a few times a year; if the P–F interval for a thermal defect is weeks, an annual scan can easily miss it entirely. Continuous, fixed-point monitoring effectively shortens detection latency to near-zero and lets you act anywhere along the P–F curve — the core economic argument for permanent sensors over walk-around surveys.
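The sampling argument can be put in rough numbers. The sketch below assumes a deliberately simple model: the defect becomes detectable at a random time, uniformly distributed relative to the inspection schedule, and any inspection that falls inside the P–F window detects it. Real detection rates also depend on load at the time of the scan, access and operator skill, none of which are modelled here.

```python
def detection_probability(pf_interval_days: float, inspection_interval_days: float) -> float:
    """Chance that at least one inspection falls inside the P-F window.

    Assumes defect onset is uniformly random relative to the inspection schedule.
    """
    if pf_interval_days <= 0 or inspection_interval_days <= 0:
        raise ValueError("intervals must be positive")
    return min(1.0, pf_interval_days / inspection_interval_days)


PF_DAYS = 21.0  # hypothetical three-week P-F interval
for label, interval_days in [("annual scan", 365.0),
                             ("quarterly scan", 91.0),
                             ("continuous (1 min)", 1.0 / 1440.0)]:
    p = detection_probability(PF_DAYS, interval_days)
    print(f"{label}: {p:.0%} chance of catching the defect before failure")
```

Under these assumptions an annual scan has only a small chance of landing inside a three-week window, while any sampling interval shorter than the P–F interval catches the defect.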

4. From data to a CBM program

A workable CBM loop for thermal data has five elements:

  1. Baseline: on commissioning, record load-normalised ΔT for every monitored point. This is the "healthy" fingerprint.
  2. Two-tier thresholds: an alert level (investigate / increase sampling) and an alarm level (plan intervention). Derive both from the standard rise limit minus a safety margin, then refine with the baseline.
  3. Trending: persist time-series so rate-of-rise and phase divergence can be computed, not just instantaneous values.
  4. Trigger: on alarm or adverse trend, generate a work order into the CMMS with the point ID, history and load context.
  5. Close the loop: after intervention, confirm ΔT returns to baseline — proof the fix worked, and a new reference.

What fails most CBM programs is not the sensing but steps 4 and 5: data that never becomes a work order, or interventions that are never verified. Specify the integration before you specify the sensor.
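The threshold, trigger and verification steps of the loop can be sketched as follows. Everything here is hypothetical: the safety margin, the way the alert level is placed between baseline and alarm, the verification tolerance and the work-order fields are assumptions, not a CMMS API. Trend-based triggers (step 3) are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    alert_k: float
    alarm_k: float


def derive_thresholds(rise_limit_k: float, baseline_k: float,
                      safety_margin_k: float = 10.0,
                      alert_fraction: float = 0.5) -> Thresholds:
    """Alarm = standard rise limit minus margin; alert sits between baseline and alarm."""
    alarm_k = rise_limit_k - safety_margin_k
    if alarm_k <= baseline_k:
        raise ValueError("baseline leaves no room below the alarm level")
    alert_k = baseline_k + alert_fraction * (alarm_k - baseline_k)
    return Thresholds(alert_k=alert_k, alarm_k=alarm_k)


def classify(rise_k: float, th: Thresholds) -> str:
    if rise_k >= th.alarm_k:
        return "alarm"
    if rise_k >= th.alert_k:
        return "alert"
    return "normal"


def work_order(point_id: str, rise_k: float, load_a: float,
               th: Thresholds) -> Optional[dict]:
    """Build a work-order payload on alarm; the field names are illustrative."""
    if classify(rise_k, th) != "alarm":
        return None
    return {"point_id": point_id, "normalised_rise_k": rise_k,
            "load_a": load_a, "alarm_level_k": th.alarm_k}


def fix_verified(post_fix_rise_k: float, baseline_k: float,
                 tolerance_k: float = 3.0) -> bool:
    """Close the loop: has the rise returned to baseline within a tolerance?"""
    return post_fix_rise_k <= baseline_k + tolerance_k


th = derive_thresholds(rise_limit_k=50.0, baseline_k=12.0)
print(th, classify(30.0, th), work_order("joint-01", 42.0, 800.0, th))
```

Keeping the verification check in the same code path as the trigger makes it harder for an intervention to be closed out without a post-fix measurement.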

5. From CBM to RBM: prioritising by risk

CBM tells you which assets are degrading. On a real network, dozens may be flagged at once and crews are finite. RBM ranks them by risk:

Risk = Probability of Failure (PoF) × Consequence of Failure (CoF)

Thermal data is a strong, quantified input to PoF: severity of ΔT exceedance, steepness of the rising trend and degree of phase imbalance map naturally onto a probability band. CoF comes from the asset's role — feeder criticality, customers affected, N-1 redundancy, safety exposure (e.g. enclosed switchgear arc-flash risk), and revenue or penalty at stake.

| PoF ↓ / CoF → | Low consequence | Medium | High / safety |
|---|---|---|---|
| High (steep rising trend, phase hot) | Medium | High | Critical (act now) |
| Medium (above alert, stable) | Low | Medium | High |
| Low (at baseline) | Monitor | Monitor | Medium (watch closely) |
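The matrix above translates directly into a lookup table that can rank a list of flagged assets. The band names and the asset tuples are illustrative; how a given ΔT exceedance or trend maps onto a PoF band is a separate calibration step that this sketch does not attempt.

```python
# (PoF band, CoF band) -> risk level, transcribed from the matrix above.
RISK_MATRIX = {
    ("high", "low"): "medium",    ("high", "medium"): "high",      ("high", "high"): "critical",
    ("medium", "low"): "low",     ("medium", "medium"): "medium",  ("medium", "high"): "high",
    ("low", "low"): "monitor",    ("low", "medium"): "monitor",    ("low", "high"): "medium",
}

# Most urgent first.
PRIORITY = ["critical", "high", "medium", "low", "monitor"]


def risk_level(pof: str, cof: str) -> str:
    return RISK_MATRIX[(pof, cof)]


def rank_assets(assets: list) -> list:
    """Sort (asset_id, pof, cof) tuples by risk level, most urgent first."""
    return sorted(assets, key=lambda a: PRIORITY.index(risk_level(a[1], a[2])))


# Hypothetical flagged assets.
flagged = [
    ("feeder-7 lug", "medium", "medium"),
    ("bus-section joint", "high", "high"),
    ("spare bay contact", "low", "low"),
]
for asset_id, pof, cof in rank_assets(flagged):
    print(f"{asset_id}: {risk_level(pof, cof)}")
```

Because Python's sort is stable, assets that share a risk level keep their input order, so a secondary ordering (for example by trend steepness) can be applied beforehand.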

This is the bridge to formal asset management under ISO 55000/55001: maintenance and capital decisions become traceable to measured condition and quantified risk, not to the calendar or to whoever shouts loudest — exactly the defensibility regulators and boards increasingly demand.

6. Where thermal monitoring sits in the data architecture

For the program to function at fleet scale, temperature has to travel from the energised joint to a work order without manual transcription:

Sensor → Gateway (concentrator) → Historian/Server → Analytics (ΔT, trend, risk) → SCADA/DMS & CMMS

A sensor that cannot deliver into IEC 61850 / SCADA and the maintenance workflow is a data island — useful for a one-off investigation, not for a CBM/RBM program.

7. Sensing options — an honest comparison

No single technology wins everywhere. Match the method to the asset:

| Method | Strengths | Limitations | Best fit |
|---|---|---|---|
| IR thermography (handheld/walk-around) | Flexible, no install, whole-scene view | Periodic only (misses short P–F intervals); needs line of sight & access; operator-dependent | Surveys, commissioning, spot checks |
| Fixed IR cameras | Continuous, area coverage | Line-of-sight only; cannot see inside enclosures/behind barriers; cost per view | Open substations, accessible busbars |
| Fiber-optic DTS | Continuous profile along the whole cable; intrinsically EMI-immune | Suited to cables/lines, not discrete bolted joints; controller cost | Power cables, long runs, dynamic rating |
| Wireless point sensors (incl. self-powered) | Direct contact measurement inside enclosures; continuous; per-joint resolution; live-line install | One device per point; wireless link must reject HV/UHV EMI; power source matters | Busbar joints, breaker contacts, terminations, line connectors |

For the discrete current-carrying connections where most of these failures start, and which often sit inside switchgear where cameras cannot see, direct-contact wireless point sensors are usually the right tool. The practical objections to them are batteries (replacement across thousands of points) and EMI. Energy-harvesting, battery-free designs remove the first; purpose-built RF design and shielding address the second.

8. The transmission-line case: thermal data and dynamic line rating

On overhead lines the same data serves a second purpose. Conductor ampacity is governed by a heat balance (per IEEE 738 / CIGRE thermal models):

I²R + q_solar = q_convection + q_radiation

Static ratings assume conservative worst-case weather, leaving real headroom unused most of the time. Measuring connector and conductor temperature directly lets operators apply dynamic line rating (DLR) — safely carrying more current when conditions allow — while the same sensors flag the splice and dead-end hot spots that static ratings never see. Reliability and capacity from one dataset.
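Rearranging the heat balance above for current gives I = √((q_convection + q_radiation − q_solar) / R). The sketch below applies only that rearrangement: the heat terms and the conductor resistance are supplied as inputs with hypothetical values, whereas IEEE 738 and the CIGRE models define how to calculate them from weather and conductor data.

```python
import math


def steady_state_ampacity_a(q_convection_w_per_m: float, q_radiation_w_per_m: float,
                            q_solar_w_per_m: float, resistance_ohm_per_m: float) -> float:
    """Solve I^2 * R + q_solar = q_convection + q_radiation for I.

    Heat terms are per metre of conductor at the maximum allowed conductor
    temperature; resistance is the AC resistance at that temperature.
    """
    if resistance_ohm_per_m <= 0:
        raise ValueError("resistance must be positive")
    net_cooling = q_convection_w_per_m + q_radiation_w_per_m - q_solar_w_per_m
    if net_cooling <= 0:
        return 0.0  # solar gain alone already reaches the temperature limit
    return math.sqrt(net_cooling / resistance_ohm_per_m)


R_OHM_PER_M = 8e-5  # hypothetical conductor resistance

# Hypothetical heat terms for a calm, hot day and for a breezy day.
static_like = steady_state_ampacity_a(30.0, 20.0, 15.0, R_OHM_PER_M)
breezy = steady_state_ampacity_a(80.0, 25.0, 15.0, R_OHM_PER_M)
print(f"calm/hot: {static_like:.0f} A, breezy: {breezy:.0f} A")
```

With these made-up inputs the breezy case supports markedly more current than the calm, hot case, which is the headroom that a static rating leaves unused.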

9. Implementation pitfalls (read before you buy)

Frequently asked questions

Is temperature a leading or lagging indicator of failure?

Leading. It appears at the resistance-growth stage of a connection — well before insulation damage or mechanical failure — which is why it gives the longest actionable decision window of the common condition signals.

What temperature should trigger action on a busbar joint?

Frame thresholds from the standard rise limit for the joint's material/coating (IEC 62271-1, IEEE C37.20.x), expressed as ΔT above ambient and normalised to load, then refine against the commissioning baseline. Use a two-tier alert/alarm scheme plus rate-of-rise and phase-imbalance rules rather than a single absolute number.

How is RBM different from CBM here?

CBM decides whether an asset needs attention from its measured condition. RBM ranks those needs by risk = probability of failure × consequence, so limited crews and capex are directed where they reduce the most business and safety risk — aligning with ISO 55000.

Why not just use periodic IR thermography?

Periodic scanning samples the P–F curve only a few times a year and cannot see inside enclosures. If a thermal defect's P–F interval is weeks, an annual survey can miss it. Continuous fixed sensors remove that detection latency.

This guide is provided for engineering education. Standard values (IEC 62271-1, IEEE C37.20.x, IEEE 738, ISO 55000) are summarised for orientation and vary by edition and equipment; always design thresholds and ratings against the current applicable standard and manufacturer data.