Predictive Maintenance · Supervised Learning · Probability Calibration


Overview

A synthetic yet realistic dataset was generated to simulate operational telemetry from network routers. The goal is to estimate the probability that a router fails within the next 24 hours, using real‑time operational variables. The project mimics a predictive maintenance scenario for network infrastructure, where failures are rare and costly.


Business Context

Unplanned router outages lead to service disruptions, SLA penalties, and increased operational workload. By predicting failure probabilities rather than issuing hard binary alarms, network operations teams can prioritize preventive maintenance, allocate resources efficiently, and reduce downtime — all while maintaining control over alert thresholds.


Objective

Train a supervised classification model that outputs a well‑calibrated probability of failure (target = 1) based on five operational features. The model must:


Data Description

A custom Python script created 10,000 records with a fixed random seed for full reproducibility.

| Variable | Type | Description |
| --- | --- | --- |
| active_sessions | Integer | Number of currently active sessions on the router. |
| crc_errors_per_second | Float (0–0.1) | CRC errors per second, indicating physical layer degradation. |
| buffer_memory_utilization_percent | Float (0–100) | Percentage of buffer memory in use. |
| unplanned_restarts_last_24h | Integer (0–5) | Unexpected reboots in the last day. |
| dropped_packets_due_to_buffer_full_last_hour | Integer (0–1000) | Packets dropped due to buffer exhaustion. |
| target | Binary (0/1) | Failure event in the next 24 hours. |

The target was derived from a weighted combination of the features plus added noise, which was then binarized. The feature values of positive cases were subsequently inflated to improve separability, mimicking the way real failures tend to stress the system.
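The generation steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual script. The seed value, the weights, the noise level, the 5% positive rate, the 1.2 inflation factor, and the upper bound for active_sessions are all assumptions chosen for the example; the source states none of them.

```python
import random

# Feature ranges follow the data description; the active_sessions
# upper bound is an assumption (the source gives no range for it).
FEATURE_RANGES = {
    "active_sessions": (0, 5000),
    "crc_errors_per_second": (0.0, 0.1),
    "buffer_memory_utilization_percent": (0.0, 100.0),
    "unplanned_restarts_last_24h": (0, 5),
    "dropped_packets_due_to_buffer_full_last_hour": (0, 1000),
}

# Hypothetical values: none of these are given in the source.
WEIGHTS = {
    "active_sessions": 0.10,
    "crc_errors_per_second": 0.25,
    "buffer_memory_utilization_percent": 0.20,
    "unplanned_restarts_last_24h": 0.30,
    "dropped_packets_due_to_buffer_full_last_hour": 0.15,
}
NOISE_STD = 0.05       # standard deviation of the added Gaussian noise
POSITIVE_RATE = 0.05   # share of records labelled as failures
INFLATION = 1.2        # multiplier applied to features of positive cases


def generate_records(n=10_000, seed=42):
    """Return n synthetic router records as a list of dicts."""
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    records, scores = [], []
    for _ in range(n):
        row = {}
        for name, (low, high) in FEATURE_RANGES.items():
            # Integer bounds produce integer features, float bounds floats.
            if isinstance(high, int):
                row[name] = rng.randint(low, high)
            else:
                row[name] = rng.uniform(low, high)
        # Weighted combination of features scaled to [0, 1], plus noise.
        score = sum(WEIGHTS[k] * row[k] / FEATURE_RANGES[k][1] for k in WEIGHTS)
        score += rng.gauss(0.0, NOISE_STD)
        records.append(row)
        scores.append(score)

    # Binarize: the highest-scoring share of records becomes target = 1.
    n_positive = max(1, round(n * POSITIVE_RATE))
    cutoff = sorted(scores)[n - n_positive]
    for row, score in zip(records, scores):
        row["target"] = int(score >= cutoff)
        if row["target"] == 1:
            # Inflate features of positive cases, clipped to the valid range.
            for name, (low, high) in FEATURE_RANGES.items():
                inflated = min(row[name] * INFLATION, high)
                row[name] = round(inflated) if isinstance(high, int) else inflated
    return records


if __name__ == "__main__":
    data = generate_records()
    print(len(data), sum(r["target"] for r in data))
```

Thresholding at a quantile rather than a fixed score keeps the failure class rare regardless of the weights chosen. Note that inflating features after labelling makes the classes easier to separate than the noisy score alone would, which is worth keeping in mind when interpreting model metrics on this data.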