Most unplanned outages on distribution and transmission plant do not begin as insulation failures — they begin as a connection that runs a few degrees too hot, for months, until it doesn't. This guide explains why temperature is the single most actionable leading indicator of connection health, and how to convert it into condition-based maintenance (CBM) and risk-based maintenance (RBM) programs that stand up to scrutiny.
1. Why maintenance strategy is shifting
Maintenance philosophies sit on a spectrum of increasing sophistication and decreasing total cost of ownership:
- Reactive (run-to-failure): cheapest per intervention, ruinous per failure. Acceptable only for non-critical, redundant assets.
- Time-based (preventive): service every N months regardless of condition. It simultaneously over-maintains healthy assets (cost, intervention-induced faults) and under-protects assets degrading faster than the calendar assumes.
- Condition-based (CBM): act on the measured condition of the specific asset. Requires a reliable condition signal.
- Risk-based (RBM): prioritise CBM findings by business risk — probability of failure weighted by consequence — so finite crews and capex go where they buy the most reliability.
- Predictive / prescriptive: model remaining useful life and recommend the optimal action and timing.
For bolted and clamped current-carrying connections — busbar joints, disconnector contacts, cable lugs, line splices — the dominant failure mode is progressive increase in contact resistance, and its earliest external symptom is heat. That makes temperature the natural condition signal on which to build CBM and RBM.
2. The physics: why temperature leads failure
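A current-carrying joint dissipates power as heat according to:

$$P = I^2 \, R_{\text{joint}}$$

where P is the power dissipated in the joint (W), I is the current through it (A) and R_joint is its contact resistance (Ω).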
When a joint is sound, R_joint is a small fraction of the adjacent conductor's resistance and the connection runs at or below conductor temperature. As the joint degrades (bolt relaxation, fretting, oxidation of aluminium, intermetallic growth in Al–Cu joints), R_joint rises. Because dissipation at a given current scales directly with R_joint and is concentrated in a small contact volume, even a modest resistance increase produces a marked local temperature rise.
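To put numbers on this, the sketch below applies the usual Joule-heating relation P = I²·R_joint to a single joint. The current and resistance values are illustrative assumptions, not measurements from any particular asset.

```python
def joint_dissipation_w(current_a: float, r_joint_ohm: float) -> float:
    """Joule heating in a joint, P = I^2 * R, in watts."""
    return current_a ** 2 * r_joint_ohm


# Assumed example: a 1000 A circuit whose joint resistance drifts
# from 10 micro-ohm (sound) to 50 micro-ohm (degraded).
healthy_w = joint_dissipation_w(1000.0, 10e-6)
degraded_w = joint_dissipation_w(1000.0, 50e-6)

print(f"sound joint:    {healthy_w:.1f} W")
print(f"degraded joint: {degraded_w:.1f} W")
```

Under these assumed values the degraded joint dissipates about five times the heat of the sound one, in the same small contact volume.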
Worse, the process is self-reinforcing. Higher temperature accelerates oxidation and metal creep, which raises resistance, which raises temperature — a positive-feedback loop that ends in thermal runaway, annealing, arcing and, in enclosed switchgear, potential arc-flash. The degradation rate of the underlying chemistry roughly follows an Arrhenius relationship (reaction rate increasing exponentially with absolute temperature), which is why a connection can sit "slightly warm" for a long time and then fail quickly once it crosses a threshold.
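The Arrhenius argument can be made concrete with an acceleration factor: how much faster a thermally activated process runs at a hotter temperature than at a reference one. This is a minimal sketch; the activation energy of 0.7 eV is a placeholder assumption, since the real value depends on the degradation mechanism and materials involved.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_ref_c: float, t_hot_c: float,
                           activation_energy_ev: float) -> float:
    """Ratio of reaction rate at t_hot_c to the rate at t_ref_c."""
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_ref_k - 1.0 / t_hot_k
    )
    return math.exp(exponent)


# Assumed example: the same joint at 60 degC versus 90 degC,
# with a placeholder activation energy of 0.7 eV.
factor = arrhenius_acceleration(60.0, 90.0, 0.7)
print(f"degradation runs about {factor:.1f}x faster at 90 degC than at 60 degC")
```

Because the factor grows exponentially with temperature, each further increment of heating shortens the remaining life more than the previous one did, which matches the "slow, then sudden" failure pattern described above.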
3. Reading thermal data correctly
Raw temperature in °C is necessary but not sufficient. Four interpretation principles separate a useful program from a nuisance-alarm generator.
3.1 Temperature rise (ΔT), not just absolute temperature
Standards specify limits as a temperature rise above ambient, because a 75 °C joint means very different things at 10 °C versus 45 °C ambient. IEC 62271-1 and IEEE C37.20.x set rise limits for connections by material and coating; as a working reference (always confirm against the current edition for your equipment):
| Connection type | Typical max temperature | Typical rise limit (40 °C ambient) |
|---|---|---|
| Bare copper / aluminium, bolted | ~90 °C | ~50 K |
| Silver- or nickel-coated, bolted | ~105–115 °C | ~65–75 K |
Values are indicative of common standard tables and vary by edition, coating and assembly; use them to frame thresholds, not as acceptance criteria.
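A minimal sketch of the rise calculation, using the 75 °C joint from the text at the two ambients mentioned. The 50 K limit is taken from the indicative table row for bare bolted joints and is used here only to frame the comparison; the function names are illustrative.

```python
def temperature_rise_k(joint_c: float, ambient_c: float) -> float:
    """Temperature rise of the joint above ambient, in kelvin."""
    return joint_c - ambient_c


def rise_margin_k(joint_c: float, ambient_c: float, rise_limit_k: float) -> float:
    """Remaining margin to the rise limit; negative means the limit is exceeded."""
    return rise_limit_k - temperature_rise_k(joint_c, ambient_c)


INDICATIVE_LIMIT_K = 50.0  # indicative value only; confirm against the standard

for ambient_c in (10.0, 45.0):
    rise = temperature_rise_k(75.0, ambient_c)
    margin = rise_margin_k(75.0, ambient_c, INDICATIVE_LIMIT_K)
    print(f"ambient {ambient_c:.0f} degC: rise {rise:.0f} K, margin {margin:+.0f} K")
```

The same 75 °C reading is 15 K over the indicative limit at 10 °C ambient and 20 K inside it at 45 °C ambient, which is why the absolute value alone is not a usable criterion.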
3.2 Normalise to load — ΔT scales with the square of current
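Because rise ∝ I², a joint that looks benign at 40% load can be alarming at full load. Compare measurements at comparable load, or normalise to a reference current:

$$\Delta T_{\text{norm}} = \Delta T_{\text{meas}} \left( \frac{I_{\text{ref}}}{I_{\text{meas}}} \right)^{2}$$

where ΔT_meas is the rise measured at current I_meas and I_ref is the reference (for example, rated) current.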
A health rule that ignores load will either miss developing faults at light load or cry wolf at peak. Continuous monitoring has a decisive advantage here: it captures ΔT across the full load cycle, so you see the joint at the load that matters.
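Load normalisation can be sketched as scaling the measured rise by (I_ref / I_meas)². The minimum-load guard below is an added assumption: at very light load the scaling factor becomes large and amplifies sensor noise, so such samples are skipped rather than normalised. The sketch also assumes steady-state readings and ignores the temperature dependence of resistance.

```python
from typing import Optional


def normalise_rise_k(rise_k: float, current_a: float, ref_current_a: float,
                     min_load_fraction: float = 0.2) -> Optional[float]:
    """Scale a measured rise to the reference current using rise ~ I^2.

    Returns None when the load is below min_load_fraction of the
    reference current (assumed cut-off, tune per installation).
    """
    if ref_current_a <= 0:
        raise ValueError("ref_current_a must be positive")
    if current_a < min_load_fraction * ref_current_a:
        return None
    return rise_k * (ref_current_a / current_a) ** 2


# Assumed example: 8 K measured at 400 A on a circuit referenced to 1000 A.
full_load_equivalent = normalise_rise_k(8.0, 400.0, 1000.0)
print(f"8 K at 40% load is equivalent to {full_load_equivalent:.0f} K at full load")
```

In this assumed case a rise that looks harmless at 40% load corresponds to 50 K at the reference current.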
3.3 Phase-to-phase comparison and rate-of-rise
The two most robust practical signals require no perfect absolute calibration:
- Phase imbalance: on a balanced three-phase circuit the three homologous joints should track within a few kelvin. A single phase running consistently hotter is a near-certain connection problem, independent of ambient or absolute accuracy.
- Rate-of-rise / trend break: a joint whose load-normalised ΔT is climbing week-over-week is degrading, even while still under any absolute limit. Trend is often more actionable than threshold.
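Both signals are simple to compute from stored readings. The sketch below measures phase imbalance as the hottest phase's excess over the median of the three phases, and rate-of-rise as a least-squares slope over equally spaced weekly values. Both definitions are illustrative choices rather than a prescribed method, and the sample figures are invented.

```python
from statistics import median


def phase_imbalance_k(rises_k: dict) -> tuple:
    """Return (hottest phase label, its excess over the median rise in K)."""
    hottest = max(rises_k, key=rises_k.get)
    return hottest, rises_k[hottest] - median(rises_k.values())


def rate_of_rise_k_per_week(weekly_rises_k: list) -> float:
    """Least-squares slope of equally spaced weekly readings, in K/week."""
    n = len(weekly_rises_k)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(weekly_rises_k) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(weekly_rises_k))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Invented load-normalised readings for three homologous joints.
phase, excess = phase_imbalance_k({"A": 20.0, "B": 21.0, "C": 29.0})
slope = rate_of_rise_k_per_week([20.0, 21.0, 22.0, 23.0])
print(f"phase {phase} runs {excess:.1f} K above the median")
print(f"trend: {slope:.2f} K/week")
```

Using the median rather than the mean as the reference keeps one hot phase from pulling the comparison value up with it.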
3.4 The P–F interval: why continuous beats periodic
In reliability terms, the P–F interval is the time between the point a failure becomes detectable (P) and functional failure (F). Periodic IR thermography samples this curve a few times a year; if the P–F interval for a thermal defect is weeks, an annual scan can easily miss it entirely. Continuous, fixed-point monitoring effectively shortens detection latency to near-zero and lets you act anywhere along the P–F curve — the core economic argument for permanent sensors over walk-around surveys.
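As a rough illustration of that argument, assume the defect's onset is uniformly random relative to a fixed survey schedule, and that any survey falling inside the P–F window detects the defect. Both are simplifications. Under them, the chance of catching the defect before failure is the ratio of the P–F interval to the inspection interval, capped at 1.

```python
def detection_probability(pf_interval_days: float,
                          inspection_interval_days: float) -> float:
    """Chance that at least one periodic inspection lands inside the P-F window.

    Assumes uniformly random defect onset and perfect detection
    whenever an inspection falls inside the window.
    """
    if pf_interval_days <= 0 or inspection_interval_days <= 0:
        raise ValueError("intervals must be positive")
    return min(1.0, pf_interval_days / inspection_interval_days)


# Assumed example: a three-week P-F interval against annual and monthly surveys.
annual = detection_probability(21.0, 365.0)
monthly = detection_probability(21.0, 30.0)
print(f"annual survey:  {annual:.0%} chance of detection")
print(f"monthly survey: {monthly:.0%} chance of detection")
```

Once the inspection interval is shorter than the P–F interval the ratio saturates at 1, which is the regime continuous monitoring operates in.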
4. From data to a CBM program
A workable CBM loop for thermal data has five elements:
- Baseline: on commissioning, record load-normalised ΔT for every monitored point. This is the "healthy" fingerprint.
- Two-tier thresholds: an alert level (investigate / increase sampling) and an alarm level (plan intervention). Derive both from the standard rise limit minus a safety margin, then refine with the baseline.
- Trending: persist time-series so rate-of-rise and phase divergence can be computed, not just instantaneous values.
- Trigger: on alarm or adverse trend, generate a work order into the CMMS with the point ID, history and load context.
- Close the loop: after intervention, confirm ΔT returns to baseline — proof the fix worked, and a new reference.
The discipline that fails most CBM programs is not sensing; it is the last two elements, trigger and close the loop: data that never becomes a work order, or interventions that are never verified. Specify the integration before you specify the sensor.
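The two-tier threshold element of the loop above can be sketched as follows. The alarm level is the standard rise limit minus a safety margin, as described; setting the alert level at a fixed fraction of the alarm level is an assumption made for illustration, and in practice both would be refined against the commissioning baseline.

```python
from enum import Enum


class Status(Enum):
    NORMAL = "normal"
    ALERT = "alert"   # investigate / increase sampling
    ALARM = "alarm"   # plan intervention


def derive_thresholds(rise_limit_k: float, safety_margin_k: float,
                      alert_fraction: float = 0.8) -> tuple:
    """Return (alert_k, alarm_k); alert_fraction is an assumed tuning value."""
    alarm_k = rise_limit_k - safety_margin_k
    if alarm_k <= 0:
        raise ValueError("safety margin must be smaller than the rise limit")
    return alert_fraction * alarm_k, alarm_k


def classify(norm_rise_k: float, alert_k: float, alarm_k: float) -> Status:
    """Classify a load-normalised rise against the two tiers."""
    if norm_rise_k >= alarm_k:
        return Status.ALARM
    if norm_rise_k >= alert_k:
        return Status.ALERT
    return Status.NORMAL


# Assumed example: 50 K indicative limit with a 10 K safety margin.
alert_k, alarm_k = derive_thresholds(50.0, 10.0)
for reading_k in (30.0, 35.0, 45.0):
    print(f"{reading_k:.0f} K -> {classify(reading_k, alert_k, alarm_k).value}")
```

Classifying the load-normalised rise, not the raw temperature, keeps the tiers meaningful across the load cycle.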
5. From CBM to RBM: prioritising by risk
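CBM tells you which assets are degrading. On a real network, dozens may be flagged at once and crews are finite. RBM ranks them by risk:

$$\text{Risk} = \text{PoF} \times \text{CoF}$$

where PoF is the probability of failure and CoF is the consequence of failure.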
Thermal data is a strong, quantified input to PoF: severity of ΔT exceedance, steepness of the rising trend and degree of phase imbalance map naturally onto a probability band. CoF comes from the asset's role — feeder criticality, customers affected, N-1 redundancy, safety exposure (e.g. enclosed switchgear arc-flash risk), and revenue or penalty at stake.
| PoF ↓ / CoF → | Low consequence | Medium | High / safety |
|---|---|---|---|
| High (steep rising trend, phase hot) | Medium | High | Critical — act now |
| Medium (above alert, stable) | Low | Medium | High |
| Low (at baseline) | Monitor | Monitor | Medium (watch closely) |
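The matrix above translates directly into a lookup table that can rank flagged assets. In this sketch the band names and the joint identifiers are illustrative, "Critical — act now" is shortened to "Critical", and "Medium (watch closely)" to "Medium".

```python
# (PoF band, CoF band) -> priority, transcribed from the matrix above.
RISK_MATRIX = {
    ("high", "low"): "Medium",
    ("high", "medium"): "High",
    ("high", "high"): "Critical",
    ("medium", "low"): "Low",
    ("medium", "medium"): "Medium",
    ("medium", "high"): "High",
    ("low", "low"): "Monitor",
    ("low", "medium"): "Monitor",
    ("low", "high"): "Medium",
}

# Most urgent first.
PRIORITY_ORDER = ["Critical", "High", "Medium", "Low", "Monitor"]


def risk_level(pof: str, cof: str) -> str:
    """Look up the priority for a PoF/CoF band pair."""
    return RISK_MATRIX[(pof.lower(), cof.lower())]


def rank_findings(findings: list) -> list:
    """Sort (point_id, pof, cof) tuples so the most urgent come first."""
    return sorted(findings,
                  key=lambda f: PRIORITY_ORDER.index(risk_level(f[1], f[2])))


# Hypothetical flagged joints.
flagged = [("J1", "low", "high"), ("J2", "high", "high"), ("J3", "medium", "low")]
for point_id, pof, cof in rank_findings(flagged):
    print(point_id, risk_level(pof, cof))
```

Keeping the matrix as data rather than nested conditionals makes it easy to review against the asset-management policy and to change without touching the ranking logic.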
This is the bridge to formal asset management under ISO 55000/55001: maintenance and capital decisions become traceable to measured condition and quantified risk, not to the calendar or to whoever shouts loudest — exactly the defensibility regulators and boards increasingly demand.
6. Where thermal monitoring sits in the data architecture
For the program to function at fleet scale, temperature has to travel from the energised joint to a work order without manual transcription:
- Sensing layer: fixed sensors on the actual contact points, sampling continuously.
- Aggregation: a gateway concentrates many points (a VTI gateway handles up to 1,000) and forwards over the utility network.
- Integration: for digital substations, IEC 61850 is the lingua franca; alarms should reach SCADA/DMS, and findings should open tickets in the CMMS / asset-management system.
- Analytics: load-normalisation, trending and risk scoring belong here, close to the historian.
A sensor that cannot deliver into IEC 61850 / SCADA and the maintenance workflow is a data island — useful for a one-off investigation, not for a CBM/RBM program.
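To illustrate what "without manual transcription" implies at the integration boundary, the sketch below packages a finding with its point ID, recent history and load context into a JSON payload. Every field name and the `thermal_cbm` type tag are hypothetical; a real CMMS or IEC 61850 gateway defines its own schema, and this is not a representation of any vendor's interface.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ThermalFinding:
    """Hypothetical record handed from analytics to the maintenance system."""
    point_id: str
    status: str                 # e.g. "alert" or "alarm"
    norm_rise_k: float          # load-normalised rise at the time of trigger
    load_fraction: float        # load context as a fraction of reference current
    recent_rises_k: list = field(default_factory=list)  # short trend history


def to_work_order_payload(finding: ThermalFinding) -> str:
    """Serialise a finding as JSON for a (hypothetical) work-order endpoint."""
    payload = {"type": "thermal_cbm", **asdict(finding)}
    return json.dumps(payload, sort_keys=True)


finding = ThermalFinding(
    point_id="SS12-BB3-phaseC",   # invented identifier
    status="alarm",
    norm_rise_k=45.0,
    load_fraction=0.85,
    recent_rises_k=[38.0, 41.0, 45.0],
)
print(to_work_order_payload(finding))
```

Carrying the trend history and load context in the same record lets the crew judge the finding without going back to the historian.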
Deep dive: Substation temperature monitoring & IEC 61850 integration →
7. Sensing options — an honest comparison
No single technology wins everywhere. Match the method to the asset:
| Method | Strengths | Limitations | Best fit |
|---|---|---|---|
| IR thermography (handheld/walk-around) | Flexible, no install, whole-scene view | Periodic only (misses short P–F intervals); needs line of sight & access; operator-dependent | Surveys, commissioning, spot checks |
| Fixed IR cameras | Continuous, area coverage | Line-of-sight only; cannot see inside enclosures/behind barriers; cost per view | Open substations, accessible busbars |
| Fiber-optic DTS | Continuous profile along the whole cable; intrinsically EMI-immune | Suited to cables/lines, not discrete bolted joints; controller cost | Power cables, long runs, dynamic rating |
| Wireless point sensors (incl. self-powered) | Direct contact measurement inside enclosures; continuous; per-joint resolution; live-line install | One device per point; wireless link must reject HV/UHV EMI; power source matters | Busbar joints, breaker contacts, terminations, line connectors |
For the discrete current-carrying connections that dominate connection failures — and that are often inside switchgear where cameras cannot see — direct-contact wireless point sensors are usually the right tool. The practical objections to them are batteries (replacement across thousands of points) and EMI. Energy-harvesting, battery-free designs remove the first; purpose-built RF and shielding address the second.
Deep dive: Busbar hot-spot monitoring in switchgear →
8. The transmission-line case: thermal data and dynamic line rating
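On overhead lines the same data serves a second purpose. Conductor ampacity is governed by a steady-state heat balance (per IEEE 738 / CIGRE thermal models):

$$q_c + q_r = q_s + I^2 R(T_c)$$

where q_c and q_r are the convective and radiative heat losses, q_s is the solar heat gain (all per unit length of conductor), and R(T_c) is the conductor's AC resistance per unit length at conductor temperature T_c.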
Static ratings assume conservative worst-case weather, leaving real headroom unused most of the time. Measuring connector and conductor temperature directly lets operators apply dynamic line rating (DLR) — safely carrying more current when conditions allow — while the same sensors flag the splice and dead-end hot spots that static ratings never see. Reliability and capacity from one dataset.
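Rearranging the steady-state heat balance q_c + q_r = q_s + I²·R(T_c) for current gives the ampacity at a chosen maximum conductor temperature. The sketch below takes the heat terms as already-computed inputs; deriving them from wind, ambient temperature and solar conditions requires the full IEEE 738 or CIGRE correlations, which are not reproduced here. The numeric values are invented for illustration.

```python
import math


def ampacity_a(q_conv_w_per_m: float, q_rad_w_per_m: float,
               q_solar_w_per_m: float, r_ohm_per_m: float) -> float:
    """Steady-state ampacity from the heat balance, in amperes.

    All heat terms are per metre of conductor, evaluated at the maximum
    allowed conductor temperature; r_ohm_per_m is the AC resistance
    at that temperature.
    """
    if r_ohm_per_m <= 0:
        raise ValueError("resistance must be positive")
    net_cooling = q_conv_w_per_m + q_rad_w_per_m - q_solar_w_per_m
    if net_cooling <= 0:
        return 0.0  # solar gain alone already reaches the temperature limit
    return math.sqrt(net_cooling / r_ohm_per_m)


# Invented heat terms: a calm hot day versus a breezy cool one.
static_like = ampacity_a(40.0, 20.0, 15.0, 9e-5)
breezy = ampacity_a(80.0, 25.0, 15.0, 9e-5)
print(f"low-wind rating: {static_like:.0f} A")
print(f"breezy rating:   {breezy:.0f} A")
```

The gap between the two results is the headroom that a static, worst-case rating leaves unused when cooling conditions are better than assumed.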
Deep dive: Dynamic Line Rating (DLR) explained →
9. Implementation pitfalls (read before you buy)
- Sensor placement: measure the joint, not the bus 200 mm away. Thermal gradients are steep; placement determines whether you see the defect.
- Calibration & drift: demand a stated accuracy and a calibration/verification path. ±1 °C on paper is meaningless without traceability.
- Alarm hygiene: raw-°C thresholds without load-normalisation breed false alarms; operators then ignore the system. Tune to ΔT, phase and trend.
- Data without action: if there is no owner for the work order, monitoring is theatre. Assign accountability before deployment.
- EMI validation: for HV/UHV, ask for evidence the wireless link survives the field — ideally third-party or documented site results, not a datasheet adjective.
Put objective thermal data behind your CBM/RBM program
VTI's self-powered, EMI-immune wireless sensors and gateway feed switchgear, substation and line temperatures straight into your analytics and CMMS.
Request the technical datasheet
Frequently asked questions
Is temperature a leading or lagging indicator of failure?
Leading. It appears at the resistance-growth stage of a connection — well before insulation damage or mechanical failure — which is why it gives the longest actionable decision window of the common condition signals.
What temperature should trigger action on a busbar joint?
Frame thresholds from the standard rise limit for the joint's material/coating (IEC 62271-1, IEEE C37.20.x), expressed as ΔT above ambient and normalised to load, then refine against the commissioning baseline. Use a two-tier alert/alarm scheme plus rate-of-rise and phase-imbalance rules rather than a single absolute number.
How is RBM different from CBM here?
CBM decides whether an asset needs attention from its measured condition. RBM ranks those needs by risk = probability of failure × consequence, so limited crews and capex are directed where they reduce the most business and safety risk — aligning with ISO 55000.
Why not just use periodic IR thermography?
Periodic scanning samples the P–F curve only a few times a year and cannot see inside enclosures. If a thermal defect's P–F interval is weeks, an annual survey can miss it. Continuous fixed sensors remove that detection latency.
