Software upgrades should be simple and require minimal disruption, but all too often upgrading enterprise software is extremely complicated and requires extensive downtime.
For tools like Log Insight that are used to monitor critical applications and services, downtime during an upgrade could mean missed alerts about the health of the monitored systems, or lost logs and metrics that should have been archived.
Log Insight is designed from the ground up to protect critical functionality during an upgrade. Certain advanced management operations (such as registering new users) may fail while the upgrade is in progress, but the critical features, specifically the ingestion of incoming events and analytic queries from the user interface, continue to operate seamlessly.
The following data demonstrate seamless event ingestion during an upgrade of a six-node Log Insight cluster.
The first chart shows the total number of events ingested into the cluster over time during the upgrade; you probably can't tell when the upgrade happened.
The second chart shows the number of events ingested into each node. During the rolling upgrade, ingestion on each node drops to zero in turn as that node is upgraded, but the load is automatically balanced across the other nodes, so, as the first chart shows, the total ingestion rate is unaffected.
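The balancing behavior behind these two charts can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Log Insight's actual load balancer: it assumes a constant incoming event rate, exactly one node offline at a time, and an even split of events across the nodes that remain online. The function `simulate_rolling_upgrade` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List


def simulate_rolling_upgrade(num_nodes: int, rate: int, ticks_per_node: int) -> List[List[int]]:
    """Return per-tick, per-node ingestion counts for a hypothetical rolling upgrade.

    Assumptions (illustrative only): the incoming event rate is constant, one node
    is offline at a time, and events are spread evenly across the online nodes.
    """
    history = []
    total_ticks = num_nodes * ticks_per_node
    for tick in range(total_ticks):
        upgrading = tick // ticks_per_node  # index of the node currently being upgraded
        online = [n for n in range(num_nodes) if n != upgrading]
        # Split the incoming events evenly; hand any remainder out one event at a time.
        share, remainder = divmod(rate, len(online))
        counts = [0] * num_nodes  # the upgrading node stays at zero
        for i, node in enumerate(online):
            counts[node] = share + (1 if i < remainder else 0)
        history.append(counts)
    return history


if __name__ == "__main__":
    history = simulate_rolling_upgrade(num_nodes=6, rate=1000, ticks_per_node=2)
    for tick, counts in enumerate(history):
        print(f"tick {tick:2d}: per-node={counts} total={sum(counts)}")
```

At every tick, the node being upgraded ingests nothing while the cluster-wide total stays equal to the incoming rate, which is the pattern the per-node and total charts describe.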
(Incidentally, these charts illustrate extreme dogfooding by the Log Insight engineering team: all logs from every Log Insight cluster running inside VMware are ingested into a Log Insight cluster which is actively used by the Log Insight team to validate functionality, monitor performance, diagnose bugs, etc.)