

Using Visual Statistics Display to Analyze GemFire Runtime Configuration, Resources, and Performance

vFabric GemFire is a sophisticated product in a complex problem space: data management in distributed systems. In order to help our users get the most out of GemFire, we are starting a “cookbook” series, which will provide tried and tested recipes that we hope every GemFire user will find useful.

Our first topic is the Visual Statistics Display (VSD). VSD is a visual tool for analyzing GemFire statistics. It reads GemFire statistics from special statistics archive files created by GemFire and renders them as graphs for analysis. Unlike vFabric Hyperic, it is not a real-time online monitoring tool, so it has no real-time monitoring or alerting capabilities. On the other hand, it is the most powerful tool for examining the state of a vFabric GemFire system, as it provides access to all the statistics collected by GemFire. No real-time monitoring tool can do that, because the volume of statistics that GemFire collects is prohibitive for real-time collection in a distributed system.

Having a complete view into the state of a GemFire process is what makes VSD an indispensable forensic tool for performance analysis and for tracking down problems through offline analysis of the statistics gathered by the cluster. It is also helpful any time we need to verify the runtime state of a distributed system, for example upon startup or after data loading: to make sure that all the nodes are present and see one another, that all the entries are loaded and well balanced across the nodes, that JVM heaps have enough headroom, and so on.

At the same time, all this “power” comes at a price. The amount of statistics available for viewing in VSD can be overwhelming. In this article, I will point out some of the most important statistics that are useful in verifying the state of a distributed system, including its configuration, resource usage, and throughput for different operations.

With that in mind, let’s take a deeper look at VSD.

Getting Started with VSD

An important prerequisite for VSD is that the collection of GemFire statistics be enabled at runtime. That is accomplished by setting two configuration properties: statistic-sampling-enabled=true and statistic-archive-file=myStats.gfs. Because collecting statistics at the default sampling rate of 1s does not affect performance, it should always be turned on: during development, testing, and in production.

There is a special category of statistics called time-based statistics that can be very useful in troubleshooting and assessing performance of some GemFire operations, but they should be used with caution because their collection can affect performance. They can be enabled using the property enable-time-statistics=true.
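As a minimal sketch, the three properties mentioned above could be written to a properties file with a small helper script. The property names and the myStats.gfs value come from this article; the helper function names and the choice of Python are assumptions made purely for illustration, and the file location your GemFire processes actually read may differ.

```python
from pathlib import Path


def render_statistics_properties(archive_file="myStats.gfs", time_statistics=False):
    """Return properties-file text that turns on statistics sampling.

    Time-based statistics stay off by default because collecting them
    can affect performance.
    """
    settings = {
        "statistic-sampling-enabled": "true",
        "statistic-archive-file": archive_file,
        "enable-time-statistics": "true" if time_statistics else "false",
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


def write_statistics_properties(path, **kwargs):
    """Write the rendered settings to the given (hypothetical) file path."""
    Path(path).write_text(render_statistics_properties(**kwargs))
```

Calling render_statistics_properties(time_statistics=True) produces the same text with time-based statistics switched on, which may be handy for a short troubleshooting session.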

Once a distributed system is up and running, every GemFire instance will have its statistics file created. I usually copy all the stat files into one directory so that I can easily load them into VSD.
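The copy step can be scripted. The sketch below assumes each member writes its archive with a .gfs extension into its own working directory, and it prefixes every copy with the directory name so that identically named files (for example, several myStats.gfs) do not overwrite one another. The function name and directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def collect_stat_archives(member_dirs, target_dir, pattern="*.gfs"):
    """Copy every statistics archive found in member_dirs into target_dir.

    Returns the list of destination paths, in the order they were copied.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for member_dir in map(Path, member_dirs):
        for archive in sorted(member_dir.glob(pattern)):
            # Prefix with the member directory name to keep copies distinct.
            destination = target / f"{member_dir.resolve().name}-{archive.name}"
            shutil.copy2(archive, destination)
            copied.append(destination)
    return copied
```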

Important note: Starting with GemFire version 7.0, VSD is included in the product and is located in the tools directory of the product directory tree. In previous releases, VSD was a separate, free download. For step-by-step instructions on setting up and using VSD, check out the documentation.

Analyzing the Data

Once you have VSD running and statistics archives loaded, it will be populated with lots of interesting data, as shown in the screen shot below.

But what does all this data mean? How do I know what statistics to look at? A large number of statistics are intended only for GemFire Tech Support and Engineering, and finding the ones that are also interesting to the rest of us can be overwhelming. Here is a quick guide to some of the most important categories and statistics they contain:

Runtime Configuration

As the name implies, these statistics can help with verifying the runtime configuration of a GemFire system:

  • The number of peer nodes (i.e. servers or peer accessors) in the system: DistributionStats:nodes. This value should be the same for every node in the system.
  • The number of clients and client connections for each server: CacheServerStats:currentClients and currentClientConnections.
  • The number of data entries:
    • CachePerfStats:entries. Each region has its own CachePerfStats instance per JVM named RegionStats-<region name>, or RegionStats-partition-<region name> for partitioned regions, and its entries statistic is the number of entries for that region in the JVM.
    • DiskRegionStatistics (a per-region statistic category about the region’s disk use): entriesInVM and entriesOnlyOnDisk show the number of entries in the JVM (which may also be on disk) and the number of entries that are only on disk, respectively.
  • Partitioned Region Configuration: One of the main parameters of Partitioned Region (PR) configuration is the primary bucket distribution. To make sure that primary buckets for a PR are evenly distributed, check the PartitionedRegionStats.primaryBucketCount statistic for each partition. This statistic shows the number of primary buckets in a partition.
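The two checks that compare values across members (nodes and primaryBucketCount) come down to simple comparisons once the values have been read off the charts. The sketch below assumes you have noted one value per member by hand; the member names, the numbers in the usage note, and the function names are all hypothetical.

```python
def nodes_agree(nodes_by_member):
    """True when every member reports the same DistributionStats nodes value."""
    return len(set(nodes_by_member.values())) <= 1


def primary_bucket_spread(primaries_by_member):
    """Difference between the largest and smallest primaryBucketCount.

    A spread of 0 or 1 suggests the primaries are evenly distributed.
    """
    counts = list(primaries_by_member.values())
    if not counts:
        return 0
    return max(counts) - min(counts)
```

For example, primary_bucket_spread({"server1": 38, "server2": 37, "server3": 38}) gives a spread of 1, which is as even as 113 primaries can be across three members.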

Resources

The resources that are vital for normal operation and performance are memory, file descriptors (most importantly sockets, then files), CPU, network, and disk (when disk operations, such as overflow and persistence, are involved). The following stats cover all those:

  • Memory: There are several stats categories that show memory usage, for different types and granularity of memory.
    • Heap: VMMemoryUsageStats:vmHeapMemoryStats are all about heap usage, as are the memory stats under VMStats:vmStats: freeMemory, totalMemory, maxMemory.
    • Non-heap memory: VMMemoryUsageStats:vmNonHeapMemoryStats.
    • System-wide memory stats as reported by the OS: The OS statistic category (e.g. LinuxSystemStats on Linux) includes various system-level memory statistics, such as freeMemory (free memory on the host, as opposed to within the JVM process), physicalMemory (total physical memory on the host), and paging-related statistics (pagesSwappedIn, pagesSwappedOut, unallocatedSwap).
    • Client and gateway queue sizes: while not actual resources, these queues may be responsible for increased memory usage, so it’s good to keep them in mind when investigating memory issues. The client queue stats are in the ClientSubscriptionStats category: eventsQueued and eventsRemoved. The difference between the two is the current queue size. The gateway queue stats are in the GatewayStatistics category (GatewaySenderStatistics as of GemFire 7.0): eventQueueSize is the size of the queue.
  • File Descriptors: file descriptor statistics are captured in the VMStats category: fdsOpen and fdLimit show the number of open file descriptors and the limit on file descriptors for the host, respectively.
  • CPU: CPU usage is captured in the OS statistic category, e.g. LinuxSystemStats. The statistic cpuActive shows the percentage of the total available CPU time that has been used in a non-idle state.
  • System load: The OS statistic category (e.g. LinuxSystemStats) includes the loadAverage1, loadAverage5, and loadAverage15 statistics, which show the average system load over 1, 5, and 15 minutes.
  • Network: OS stats also include network-related stats for received (recv) and transmitted (xmit) traffic: recvBytes, xmitBytes, recvErrors, xmitErrors. Note that some of these statistics may be incorrect in GemFire versions prior to 6.6.2 due to a bug that was fixed in 6.6.2.
  • Disk: DiskDirStatistics:diskSpace shows the amount of disk space used for GemFire disk storage on a given disk. The entriesOnlyOnDisk and entriesInVM statistics from DiskRegionStatistics, mentioned above, are useful for determining the distribution of data between memory and disk for regions that use disk overflow or persistence.
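A few of the resource statistics above are most useful in combination. The sketch below shows the arithmetic, assuming the values have been read from VSD at the same point in time. The heap calculation further assumes that freeMemory, totalMemory, and maxMemory carry the same meaning as the corresponding java.lang.Runtime methods, which this article does not state; the function names are hypothetical.

```python
def heap_headroom_bytes(free_memory, total_memory, max_memory):
    """Bytes the heap can still grow into before reaching maxMemory.

    Assumes used heap = totalMemory - freeMemory (Runtime semantics).
    """
    used = total_memory - free_memory
    return max_memory - used


def client_queue_size(events_queued, events_removed):
    """Current client subscription queue size: eventsQueued - eventsRemoved."""
    return events_queued - events_removed


def fd_utilization(fds_open, fd_limit):
    """Fraction of the file descriptor limit currently in use."""
    return fds_open / fd_limit
```

A client queue size that keeps growing, or a file descriptor utilization approaching 1.0, would be a reason to look more closely at the corresponding charts.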

Throughput for Different Operations

There are several stat categories that capture the throughput of GemFire operations: CachePerfStats (non-PR and PR-specific), and CacheServerStats, which captures throughput statistics with respect to clients. Note that each PR-specific instance of CachePerfStats covers only its own partitioned region, while the cachePerfStats instance includes aggregate stats for all non-PR regions.

  • The CachePerfStats category includes the following stats (all measured in the number of operations per second):
    • gets: the number of successful gets
    • puts: the number of times an entry has been added or replaced as a result of a local operation (put, create, or get which results in a load, netsearch, or netload of a value)
    • updates: the number of updates originating remotely
    • putalls: the number of putAll operations
    • destroys: the number of destroys
    • Function execution: FunctionService
    • Queries: queryExecutions: the number of query executions
    • Transactions: txCommits, txFailures, txRollbacks: the number of successful, failed, and rolled back transactions, respectively
  • The CacheServerStats category includes the following throughput stats for client operations on the cache server:
    • getRequests, getResponses,
    • getAllRequests, getAllResponses,
    • putRequests, putResponses,
    • putAllRequests, putAllResponses,
    • queryRequests, queryResponses.
  • Disk operations: If any disk-related statistic categories are present in VSD, there is disk activity (some entries are on disk). The presence of disk operations may explain a drop in throughput, as disk use slows things down.
    • DiskRegionStatistics (statistics about a region’s disk use): writes, writeTime, writtenBytes, reads, readTime, readBytes.
    • DiskStoreStatistics (statistics about a specific disk store’s use of disk): in addition to write and read statistics like those in DiskRegionStatistics, this category includes the queueSize statistic, which shows the current number of entries in the asynchronous queue waiting to be flushed to disk.
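To make the "operations per second" reading of these statistics concrete, here is a minimal sketch of how a rate can be derived from successive samples. It assumes the underlying values are cumulative counters sampled at known times; the sample data and function name are hypothetical, and this is not a description of how VSD computes its graphs.

```python
def per_second_rates(samples):
    """Turn (timestamp_seconds, counter_value) samples into per-second rates.

    Each rate covers the interval between two consecutive samples;
    intervals with no elapsed time are skipped.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed > 0:
            rates.append((v1 - v0) / elapsed)
    return rates
```

With samples [(0, 0), (1, 100), (2, 250), (4, 350)], the function returns [100.0, 150.0, 50.0]: the last interval spans two seconds, so its 100 operations average out to 50 per second.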

>> To learn more about GemFire, you can read the overview, features, and resources as well as visit the community, read the documentation, or download a trial.

About the Author: Edin Zulich is an Enterprise Architect on the vFabric GemFire team. He has 15 years of experience in software engineering, including 5 years developing high performance data management solutions using GemFire. His special interests are performance analysis and tuning, and large scale systems.
