
vRealize Log Insight Index Partitions and Variable Retention Deep Dive

This blog was co-authored by Yogita Patil

vRealize Log Insight is VMware’s log analytics solution for private cloud environments, and we recently released version 8.1. One of the new features in 8.1 is index partitioning, which allows you to set granular retention periods for your various logs. In this blog, we’re going to explain why index partitions exist and how to implement them successfully.

Background

Log data comes in many types, shapes, and sizes, and different types of data may require different retention periods. For example, sensitive information that is at risk of being compromised is better stored separately and for a shorter period. Vital data, such as audit logs of customer logins and logouts, may need to be kept longer, from several days to months, both for convenience and to meet audit requirements from the security and legal teams. Still other kinds of data may need an even longer storage period to support historical metrics and trend analysis.

Variable data retention allows you to set different retention periods for different kinds of data by creating separate index partitions, each with its own retention period.

What Use Cases Does Log Partitioning Solve?

(Credit goes to Martin Gazharyan from engineering for outlining these.)

  • My log data is heterogeneous, and I’d like to specify different retention and archiving policies for different types of log data.
  • I want to achieve faster query speeds and lower storage requirements by eliminating unnecessary logs.
  • I have compliance policy requirements that insist logs of a certain type must be retained and searchable for a specific period.

Log Insight Storage Basics

To understand how the index partition and variable retention feature works, let’s first take a look at how vRealize Log Insight stores logs.

Log Insight ingests log data via Syslog or its own ingestion API known as CFAPI. When a new log message is received, it gets parsed and stored in a bucket.

Once a bucket reaches 500 MB in size, it gets marked as read-only (sealed) and a brand-new bucket is created for new messages. If archiving is enabled, then a copy of the full bucket gets written to the NFS share. This process repeats until Log Insight runs out of storage.

At that point, the oldest bucket is aged out (deleted) to make room for a new bucket. If archiving is enabled, then the archive copy will remain untouched on the external NFS share.
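
The rotation described above can be modeled in a few lines of Python. This is only a conceptual sketch, not Log Insight's implementation: the `BucketStore` class, its `capacity_mb` disk budget, and the in-memory `archived` list standing in for the NFS share are all assumptions made for illustration.

```python
from collections import deque

BUCKET_LIMIT_MB = 500  # seal threshold described in the text


class BucketStore:
    """Toy model of bucket rotation (hypothetical, not the product's code)."""

    def __init__(self, capacity_mb, archiving=False):
        self.capacity_mb = capacity_mb  # assumed total disk budget
        self.archiving = archiving
        self.sealed = deque()           # sealed bucket ids, oldest first
        self.archived = []              # stand-in for copies on the NFS share
        self.active_mb = 0              # fill level of the writable bucket
        self._next_id = 0

    def ingest(self, size_mb):
        """Add log data to the active bucket and seal it once it is full."""
        self.active_mb += size_mb
        if self.active_mb >= BUCKET_LIMIT_MB:
            self._seal()

    def _seal(self):
        bucket_id = self._next_id
        self._next_id += 1
        self.sealed.append(bucket_id)
        if self.archiving:
            self.archived.append(bucket_id)  # archive copy is never aged out
        self.active_mb = 0
        # Out of storage: age out the oldest sealed buckets until a new
        # full-size bucket fits again.
        while self.sealed and self._used_mb() + BUCKET_LIMIT_MB > self.capacity_mb:
            self.sealed.popleft()

    def _used_mb(self):
        # Simplification: every sealed bucket is counted as exactly 500 MB.
        return len(self.sealed) * BUCKET_LIMIT_MB + self.active_mb


store = BucketStore(capacity_mb=1500, archiving=True)
for _ in range(20):          # 20 x 100 MB = 2000 MB of logs
    store.ingest(100)

print(list(store.sealed))    # buckets still held locally
print(store.archived)        # buckets copied to the archive
```

With the assumed 1500 MB budget, the two oldest buckets are aged out locally while all four archive copies remain.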

Log Index Partitions

The variable log retention feature is based on partitions, and you can configure multiple partitions. Each partition collects and stores logs according to its own retention policy. You apply filters to each partition to define which logs it collects and which it rejects. As logs are continuously ingested into vRealize Log Insight, they are checked against the criteria of the defined partitions. If a log meets a partition’s filter criteria, it is stored in that partition and aged out based on that partition’s retention period. If a log message does not match the criteria of any user-created partition, it is placed in the default partition. All logs stored in any partition are searchable within vRealize Log Insight until they are aged out and deleted. This feature helps you reduce log noise, speed up log searches, and save storage for the logs that matter most to your deployment. Let’s take a closer look at how this works.

The default partition is created at the time of deployment and is where all logs are stored if no custom partitions are configured.

As you can see in the diagram above, the default partition (like any other partition that gets created) operates on the same bucket principle. New logs are stored in the active bucket; once that bucket reaches 500 MB, it is archived (if archiving is enabled) and then sealed. A sealed bucket can be deleted once the newest log in it becomes older than the retention period set for that partition.
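
The deletion rule can be expressed as a single predicate. The sketch below is illustrative only: the `Bucket` class, the `can_delete` function, and the 30-day retention period are hypothetical names and values, not part of the product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Bucket:
    newest_log: datetime  # timestamp of the most recent log in the bucket
    sealed: bool          # True once the bucket has reached 500 MB


def can_delete(bucket, retention, now):
    """Return True if the bucket is sealed and its newest log has expired."""
    return bucket.sealed and (now - bucket.newest_log) > retention


now = datetime(2020, 6, 1)
retention = timedelta(days=30)  # hypothetical retention period

expired = Bucket(newest_log=datetime(2020, 4, 15), sealed=True)
recent = Bucket(newest_log=datetime(2020, 5, 20), sealed=True)
active = Bucket(newest_log=datetime(2020, 4, 15), sealed=False)

print(can_delete(expired, retention, now))  # sealed and past retention
print(can_delete(recent, retention, now))   # sealed but still within retention
print(can_delete(active, retention, now))   # not sealed yet
```

One consequence of this rule is that older logs in a bucket stay searchable until the newest log in that same bucket expires.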

Now, let’s create a new partition. We will call this the orange partition, which is responsible for collecting all of our orange logs. Logs meeting the criteria of the orange partition will be collected by the orange partition, while all other logs will continue to the default partition and be stored there. The same storage retention mechanism applies to this new partition: buckets get filled, then archived and sealed when they reach 500 MB. Once the newest log in a sealed bucket is older than the retention period set on the orange partition, that bucket can be deleted.

Let’s create one more partition and call it the green partition. As logs get ingested by Log Insight, they will be tested against the green partition first. Partitions are sorted alphabetically by name, so even though green was created after orange, it is still the first partition to be checked. Any green logs that come in are stored in the green partition. An orange log is tested against the green partition’s filters first and then moves on to the orange partition to be stored. Any other logs flow through each partition’s filters before finally being stored in the default partition. Older logs already bucketed in the orange partition, however, are not re-checked or moved to the green partition, even if they match. Only logs ingested from the point in time when the green partition is created are checked against all three partitions.
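
The routing order can be sketched as a first-match-wins loop over the partition names in alphabetical order. This is a simplified model under stated assumptions: the `route` function, the `color` field, and the lambda filters are hypothetical, and plain string sorting is assumed for the alphabetical order.

```python
def route(log, partitions):
    """Return the name of the first partition, in alphabetical order,
    whose filter matches the log; fall back to the default partition."""
    for name in sorted(partitions):
        if partitions[name](log):
            return name
    return "default"


# Hypothetical filters keyed by partition name. Orange was created first,
# but green is still checked first because of the alphabetical ordering.
partitions = {
    "orange": lambda log: log["color"] == "orange",
    "green": lambda log: log["color"] == "green",
}

for color in ("green", "orange", "blue"):
    print(color, "->", route({"color": color}, partitions))
```

A blue log matches neither filter and therefore ends up in the default partition.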

So, what have we learned so far? We have learned that we can have multiple partitions and that each partition works based on buckets. Once a bucket reaches 500 MB in size, it can be archived (if archiving is enabled) and then sealed. Only sealed buckets can be deleted, and only once the newest log in that bucket is older than the retention period set for the partition. Partitions are handled alphabetically (except for the default partition), so new logs are first checked against the filtering criteria of “partition A” before being checked against “partition B”, and so on until they reach the default partition.

Best practices and other useful information

It is important to plan out how you want to partition your logs ahead of time. What is your purpose for creating partitions? Is it for compliance? Is it for improved storage utilization and retention of more important logs? What are the filtering criteria that you are going to build? How long are you going to retain those logs?

vRealize Log Insight 8.1 can handle up to five partitions (including the default partition), and creating a partition currently requires a reboot of the vRLI cluster. It is also possible to create intersecting criteria for two or more partitions. For example, let’s say I have built one partition for my critical logs and one for my host logs and named them “critical” and “host” respectively. Logs from esxi01.abc.com will land in the host partition. However, if I receive a critical log from esxi01.abc.com, it will be stored in the critical partition. Again, this is because partitions are handled alphabetically, and since “critical” comes before “host”, the log message matches the critical partition first. So, if you want every log from esxi01.abc.com to stay in the host partition, it is important to add filters to the critical partition that reject logs from esxi01.abc.com.
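
The critical/host example can be reproduced with a small model. Everything here is a sketch under assumptions: the `route` helper, the `severity` and `hostname` field names, and the lambda filters are hypothetical and do not reflect Log Insight's actual filter syntax.

```python
def route(log, partitions):
    """First partition (alphabetical order) whose filter matches wins."""
    for name in sorted(partitions):
        if partitions[name](log):
            return name
    return "default"


HOST = "esxi01.abc.com"

# Intersecting criteria: a critical log from HOST matches both filters.
overlapping = {
    "critical": lambda log: log["severity"] == "critical",
    "host": lambda log: log["hostname"] == HOST,
}

# Same partitions, but "critical" explicitly rejects logs from HOST.
disjoint = {
    "critical": lambda log: log["severity"] == "critical"
    and log["hostname"] != HOST,
    "host": lambda log: log["hostname"] == HOST,
}

log = {"hostname": HOST, "severity": "critical"}
print(route(log, overlapping))  # "critical" sorts before "host"
print(route(log, disjoint))     # the exclusion lets the log reach "host"
```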

When working with partitions for variable log retention, it is important to plan and test the partitions carefully before settling on the combination that works for you:

  • Pay attention to the ‘Name’ of the log partition and its alphabetical relation to other partitions.
  • Define filters based on static fields for the exact logs you need in the partition.
  • Define explicit filters to exclude logs you DO NOT want in the partition.
  • Pay attention to filters in other partitions when defining more than one partition so your logs do not end up in the wrong partition.
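
One way to test a planned set of partitions before creating them is to run a handful of sample logs through every filter and flag the logs that match more than one partition. The sketch below assumes filters can be modeled as Python predicates; `find_overlaps` and the field names are hypothetical and are not a Log Insight feature.

```python
def find_overlaps(sample_logs, partitions):
    """Return (log, matching partition names) for every sample log that
    matches the filters of more than one partition."""
    overlaps = []
    for log in sample_logs:
        matches = [name for name in sorted(partitions) if partitions[name](log)]
        if len(matches) > 1:
            overlaps.append((log, matches))
    return overlaps


# Hypothetical planned partitions and sample logs.
planned = {
    "critical": lambda log: log["severity"] == "critical",
    "host": lambda log: log["hostname"] == "esxi01.abc.com",
}
samples = [
    {"hostname": "esxi01.abc.com", "severity": "info"},
    {"hostname": "esxi01.abc.com", "severity": "critical"},
    {"hostname": "esxi02.abc.com", "severity": "critical"},
]

for log, names in find_overlaps(samples, planned):
    # The first name in the list is where the log would actually be stored.
    print(log, "matches", names)
```

Each flagged log is a candidate for an explicit exclusion filter or for a change in partition naming.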

If you delete a partition before any log data is collected in that partition, then the partition is deleted along with its associated bucket. However, if you delete a partition after it has ingested data, then the partition will be removed, but its associated buckets will be aged out and deleted based on the retention period for the default partition. This is a safety precaution to prevent the accidental (or malicious) deletion of log data.
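
The two deletion cases can be modeled as follows. This is a rough sketch, not the product's behavior in code form: the dictionary layout, the `delete_partition` and `effective_retention` helpers, and the retention values in days are all assumptions made for illustration.

```python
def delete_partition(name, retention_days, buckets):
    """Remove a partition and return the buckets that survive the deletion."""
    del retention_days[name]
    # Buckets that never received data go away with the partition;
    # buckets holding data are kept.
    return [b for b in buckets
            if not (b["partition"] == name and b["log_count"] == 0)]


def effective_retention(bucket, retention_days):
    """Buckets of a deleted partition fall back to the default retention."""
    return retention_days.get(bucket["partition"], retention_days["default"])


retention_days = {"default": 30, "audit": 365, "debug": 7}  # hypothetical
buckets = [
    {"partition": "audit", "log_count": 1200},
    {"partition": "debug", "log_count": 0},
]

buckets = delete_partition("debug", retention_days, buckets)  # empty: bucket removed
buckets = delete_partition("audit", retention_days, buckets)  # has data: bucket kept

print(buckets)
print(effective_retention(buckets[0], retention_days))  # default retention applies
```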

Now that we understand how partitions work in vRealize Log Insight, share how you plan to leverage them in the comments below. We would love to see your feedback! For information on other features in the vRealize Log Insight 8.1 release check out our What’s New in vRealize Log Insight 8.1 blog post.