Mirage and MongoDB, Part 4: Troubleshooting

Jul 8, 2016
Jason Bassford

Author:

Technical Marketing Manager, End-User-Computing Technical Marketing, VMware

By Yan Aksenfeld, Member of Technical Staff, VMware

This concludes a 4-part series discussing the addition of MongoDB to Mirage.

Part 1 introduced the benefits of adding MongoDB to Mirage to enhance the performance of Mirage when working with a large number of small files.

Part 2 discussed new components, installations, and upgrades.

Part 3 took a closer look at the underlying technology behind MongoDB and Mirage.

This blog post provides information on troubleshooting issues.

Normal Operation of the Mirage Management Service and MongoDB Database

The MongoDB service is managed automatically by the Mirage Management service and must not be manually touched or manipulated. The status of the Mirage Management service indicates the status of the MongoDB database. If the Mirage Management service is disabled, then the MongoDB database is not active. If the MongoDB database is not active, then Mirage cannot perform any new operations, uploads will not progress, and any other operations in progress will fail because the system cannot read or write small files.

Essential Information for Troubleshooting Issues

There are several things to consider when troubleshooting issues with Mirage and MongoDB.

Note: This post refers to the Mirage Console. However, the Mirage Web Manager can also be used.

  • Always monitor the Management Servers tab of the Mirage Console to verify the MongoDB nodes are online and replicated. If you notice uploads failing to complete, or other operations failing, look at the Mirage Console to make sure all nodes are available. If both primary and secondary MongoDB nodes are displayed as Down, then Mirage will not function.
  • The MongoDB log file is

    C:\Program Files\Wanova\Mirage Management Server\MongoDB\logs\mongod1.log

    When troubleshooting, you can safely disregard any Unauthorized not authorized on admin to execute command { replSetGetStatus: 1.0, forShell: 1.0 } entries you see in the log file. These entries do not indicate a problem with Mirage or MongoDB credentials.

  • The MongoDB configuration file is

    C:\Program Files\Wanova\Mirage Management Server\MongoDB\config\mongod1.cfg

    The most important entry is dbPath, which indicates the location of the MongoDB database file.

    Caution: Do not edit the MongoDB configuration file without contacting VMware Support, because doing so can lead to data loss. For more information, see How to Submit a Support Request.

  • The table MongoServerNodes, in the Mirage SQL database, stores most of the configuration of the MongoDB nodes. The following information is noteworthy:
    Port Number   Service Name                 Function
    27017         VMwareMirageMongoDB          A MongoDB database replica node.
    27018         VMwareMirageMongoDBArbiter   A MongoDB Arbiter node.

    Figure 1: MongoDB Ports and Services

    Variable                                   Meaning
    VolumePath                                 The path to the MongoDB database.
    IsActive                                   Whether the replica node is enabled or disabled.
    MongoReplicaSetStatus                      The current role of the replica node (1=Primary, 2=Secondary, 7=Arbiter).
    ServerID                                   The ID of the host server.
    FreeAvailableBytes / TotalAvailableBytes   The space available to MongoDB.
    VolumeHaveEnoughSpace                      Whether the database can continue to expand or needs to shut down to prevent failure. This is deprecated in Mirage 5.6 and later.

    Figure 2: MongoDB Configuration Settings

    Caution: Do not change the configuration of the database without contacting VMware Support, because doing so can lead to data loss. For more information, see How to Submit a Support Request.
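
When reading rows from the MongoServerNodes table, it can help to translate the raw values into something easier to scan. The following is a minimal, read-only sketch and not part of Mirage. It assumes a row has already been fetched and is available as a plain dictionary keyed by the column names in Figure 2, that FreeAvailableBytes and TotalAvailableBytes are two separate numeric columns, and that IsActive is a 0/1 flag. The function name describe_node is hypothetical.

```python
# Role codes as listed in Figure 2 for MongoReplicaSetStatus.
ROLE_NAMES = {1: "Primary", 2: "Secondary", 7: "Arbiter"}


def describe_node(row):
    """Summarize one MongoServerNodes row given as a plain dict.

    The dict shape is an assumption for illustration; this function only
    reads values and never writes anything back to the database.
    """
    role = ROLE_NAMES.get(row["MongoReplicaSetStatus"], "Unknown")
    total = row["TotalAvailableBytes"]
    # Guard against a zero total so the sketch never divides by zero.
    free_pct = 100.0 * row["FreeAvailableBytes"] / total if total else 0.0
    return {
        "server_id": row["ServerID"],
        "role": role,
        "active": bool(row["IsActive"]),
        "volume_path": row["VolumePath"],
        "free_percent": round(free_pct, 1),
    }


if __name__ == "__main__":
    # Made-up example values, not taken from a real environment.
    example_row = {
        "ServerID": 1,
        "VolumePath": r"D:\MongoDB",
        "IsActive": 1,
        "MongoReplicaSetStatus": 1,
        "FreeAvailableBytes": 25 * 1024**3,
        "TotalAvailableBytes": 100 * 1024**3,
    }
    print(describe_node(example_row))
```

Any status value other than 1, 2, or 7 is reported as Unknown rather than guessed at.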

Troubleshooting Common Issues

There are several common issues you might encounter when using Mirage with MongoDB. All of these will require investigation. Look at status messages in the Mirage Console and entries in the MongoDB log file.

Note: In the following scenarios, only some MongoDB nodes might be experiencing a problem. The affected nodes are displayed as Down in the Mirage Console. To enable a disabled MongoDB node after correcting the underlying problem, open the Mirage Console’s Management Servers window, select the node that is Down, and click Start to re-enable the node.

MongoDB Nodes Are Down Because Required Ports Are Closed

MongoDB uses port 27017 for MongoDB service communication on the Mirage Management servers. The Arbiter uses port 27018 for its communication. It is important to keep both ports open bidirectionally. If there is a communication problem with any of the MongoDB nodes, the affected nodes will be displayed as Down in the Mirage Console, and the MongoDB log will have multiple entries indicating communication failure over specific ports.


Corrective action: Verify the affected Mirage Management servers can communicate on ports 27017 and 27018.
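
One way to check reachability is a simple TCP connection test. The sketch below is an illustration using only the Python standard library; the function names are hypothetical, and the host names are whatever you pass on the command line (it defaults to localhost). A successful connection only shows that the port is reachable in one direction, so run it from each Mirage Management server toward the others. Port 27018 is expected to be closed on servers that do not host the Arbiter.

```python
import socket
import sys

# Ports and roles as described in this post.
MONGO_PORTS = {27017: "MongoDB replica node", 27018: "MongoDB Arbiter"}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_mongo_ports(host, timeout=3.0):
    """Return {port: reachable} for the two ports used by Mirage and MongoDB."""
    return {port: port_is_open(host, port, timeout) for port in MONGO_PORTS}


if __name__ == "__main__":
    # Pass the Mirage Management server names as arguments.
    hosts = sys.argv[1:] or ["localhost"]
    for host in hosts:
        for port, reachable in check_mongo_ports(host).items():
            state = "open" if reachable else "closed or unreachable"
            print(f"{host}:{port} ({MONGO_PORTS[port]}): {state}")
```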


MongoDB Database Is Corrupted

If the MongoDB database is corrupted, you will see log entries corresponding to when the system tries to work with corrupted records. These entries will be similar to: WiredTiger checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic. Corruption is usually caused by either instability or high latency involving the storage array where the database is located.


Corrective action: Recreate the MongoDB database on a stable local drive. To recreate the MongoDB database, either delete all of the files in the target database folder, or delete the entire target database folder and then create another folder with the same name.

If a node has been marked as Down because of database corruption, recreating its database and setting the node to Up will result in healthy data being replicated to it if there are replica nodes with which it can resync. This will prevent any data loss.

Caution: Recreating the MongoDB database can cause loss of data, including CVDs, base layers, and app layers, if there are no replica nodes with a healthy database from which data can resync. Restoring CVDs will not work because small files they depend on will no longer exist. A large number of automatic integrity reports will run on all CVDs and the missing files will be re-uploaded when the respective endpoints connect, but restores will not work for some time until the MongoDB database is repopulated. Even more importantly, app layers and base layers lose all their small files (if they were captured in Mirage 5.4 and later). If small files in these layers do not get uploaded from CVDs, the assignment of these layers will not work and they might need to be recaptured.

Note: If in doubt about data loss, contact VMware Support before recreating the database. For more information, see How to Submit a Support Request.


MongoDB Database Is Too Fragmented

This issue occurs when the MongoDB database is placed with other files on the same disk, especially if placed on Mirage volumes with CVDs. MongoDB is sensitive to fragmentation, and a state might be reached where NTFS cannot write to the database file. You will see the entry The requested operation could not be completed due to a file system limitation in the MongoDB log file.


Corrective action: If you see this log entry, run the Windows Sysinternals tool Contig from a command prompt: contig -a <MAIN_COLLECTION>, where <MAIN_COLLECTION> is replaced by the path to the MongoDB main collection. The MongoDB main collection is the largest file in the folder where the database is stored. If the number of fragments reported is in the thousands, then the MongoDB main collection is fragmented and it is important to move it to a dedicated disk without any other files or CVDs. For more information, see the VMware knowledge base article Changing the location of MongoDB when in use by Mirage (2131044).

Note: Manually defragmenting the MongoDB main collection is ineffective.
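
Because the main collection is identified only as the largest file in the database folder, a small helper can locate it before you run Contig. This is a read-only sketch with a hypothetical function name; it looks only at files directly inside the folder you pass in and does not descend into subfolders, which is an assumption about how the database folder is laid out.

```python
import sys
from pathlib import Path


def largest_file(db_dir):
    """Return (path, size_in_bytes) for the largest file directly in db_dir.

    Returns None if the folder contains no regular files.
    """
    files = [p for p in Path(db_dir).iterdir() if p.is_file()]
    if not files:
        return None
    biggest = max(files, key=lambda p: p.stat().st_size)
    return biggest, biggest.stat().st_size


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python largest_file.py <path to MongoDB database folder>")
    else:
        result = largest_file(sys.argv[1])
        if result is None:
            print("No files found in that folder.")
        else:
            path, size = result
            print(f"Largest file: {path} ({size} bytes)")
```

The path it prints is the value to substitute into the contig command above.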


Replication Does Not Work Because a Replica Is Too Stale

When a MongoDB node has been offline for a long time, the data can become too stale to replicate. You will see one of the replica nodes displayed as Down in the Mirage Console even though the service is running. You will also see MongoDB log entries similar to replSet error RS102 too stale to catch up.


Corrective action: Recreate the MongoDB database on the replica node. Make sure the primary node is displayed as Up in the Mirage Console. Either back up the MongoDB database folder on the affected secondary server, or rename it. Recreate the folder for the database, and the Mirage Management service will recreate the node. Replication will start automatically, you will see the file size growing, and the node will now be displayed as Up in the Mirage Console.

Note: MongoDB will not automatically recreate the MongoDB database folder if it has been deleted or renamed. You must recreate it yourself.
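
The rename-and-recreate step can be sketched as follows. This is an illustration of the folder handling only, under several assumptions: the function name and the ".stale-" suffix are hypothetical, the script is run on the affected secondary server with sufficient permissions, and the primary node has already been confirmed as Up. It does not touch any services. On Windows the rename fails with an error if another process still has files in the folder open, in which case nothing is changed.

```python
import sys
from datetime import datetime
from pathlib import Path


def set_aside_and_recreate(db_dir, now=None):
    """Rename db_dir to a timestamped sibling, then recreate db_dir empty.

    Returns the path of the renamed (set-aside) folder. Nothing is deleted.
    """
    db_dir = Path(db_dir)
    if not db_dir.is_dir():
        raise FileNotFoundError(f"Database folder not found: {db_dir}")
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup = db_dir.with_name(f"{db_dir.name}.stale-{stamp}")
    db_dir.rename(backup)  # keep the old data instead of deleting it
    db_dir.mkdir()         # empty folder with the original name
    return backup


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python recreate_db_folder.py <path to MongoDB database folder>")
    else:
        kept = set_aside_and_recreate(sys.argv[1])
        print(f"Old database folder kept at: {kept}")
```

Keeping the renamed folder until the node is displayed as Up again leaves a way back if replication does not start.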


MongoDB Service Is Stopped (in Mirage 5.6 and Later)

Mirage 5.6 introduced a new feature that protects MongoDB from data loss. In certain scenarios, Mirage stops the MongoDB service on affected nodes, which disables the MongoDB database. Entries in the Mirage event log will indicate if the MongoDB database is disabled. For example, you might see the log entry MongoDB crashes reached critical threshold, shutting down node for maintenance. The Mirage Management service will make sure the MongoDB service remains stopped, and you will not be able to change this manually because the service will be stopped again as soon as you start it. A MongoDB service will remain stopped until the problem is corrected and the affected node is re-enabled from the Mirage Console.

Note: In Mirage 5.8, additional status messages were added to the Mirage Web Manager which help identify the underlying cause of the MongoDB service being stopped. These new status messages are not displayed in the Mirage Console.

There are three scenarios that result in Mirage stopping the MongoDB database.

  • 10 crashes of the MongoDB service in a period of 3 days because of database corruption or storage instability.
  • Latency to the MongoDB database disk is 1,000 ms or more, on average, over 20 minutes.
  • Less than 5 percent free space is available on the MongoDB database disk.

Corrective action: Identify the scenario that caused the MongoDB database to be stopped and address it appropriately.

If the problem persists after you have taken corrective action, the MongoDB service will be stopped again. For more information, see the VMware knowledge base article MongoDB service is disabled in VMware Mirage 5.6 (2141995).
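
To make the three thresholds concrete, the sketch below evaluates them against measurements you have collected yourself. It is not how Mirage implements the check, which this post does not describe. The function name, the input shapes (lists of timestamps and of timestamp/latency pairs), and the reason labels are all assumptions; only the numbers come from the list above.

```python
from datetime import datetime, timedelta

# Thresholds as stated in this post.
CRASH_LIMIT = 10
CRASH_WINDOW = timedelta(days=3)
LATENCY_LIMIT_MS = 1000
LATENCY_WINDOW = timedelta(minutes=20)
MIN_FREE_PERCENT = 5.0


def stop_reasons(crash_times, latency_samples, free_bytes, total_bytes, now):
    """Return the labels of every threshold that the given measurements cross.

    crash_times: datetimes of MongoDB service crashes.
    latency_samples: (datetime, latency_in_ms) pairs for the database disk.
    """
    reasons = []

    recent_crashes = [t for t in crash_times if now - t <= CRASH_WINDOW]
    if len(recent_crashes) >= CRASH_LIMIT:
        reasons.append("crash threshold")

    recent_ms = [ms for t, ms in latency_samples if now - t <= LATENCY_WINDOW]
    if recent_ms and sum(recent_ms) / len(recent_ms) >= LATENCY_LIMIT_MS:
        reasons.append("disk latency")

    if total_bytes and 100.0 * free_bytes / total_bytes < MIN_FREE_PERCENT:
        reasons.append("low free space")

    return reasons


if __name__ == "__main__":
    now = datetime.now()
    # Made-up sample data: two crashes, slow disk, plenty of free space.
    crashes = [now - timedelta(hours=5), now - timedelta(days=1)]
    latency = [(now - timedelta(minutes=m), 1500) for m in range(0, 20, 5)]
    print(stop_reasons(crashes, latency, free_bytes=40, total_bytes=100, now=now))
```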


Unable to Move the MongoDB Database from the Mirage Console (in Mirage 5.6 and Later)

In Mirage 5.6, VMware introduced the option to move the MongoDB database from the Mirage Console. If you click Configure and enter a new path but get an error indicating the move operation has failed, ensure that the Mirage Console displays two or more MongoDB nodes as Up.


Corrective action: You cannot move the MongoDB database without at least two nodes displayed as Up. Either correct the situation that has caused a node to be Down, or add another Mirage Management server to your environment.


Only One MongoDB Node Is Up, Even Though Two Mirage Management Servers Are Installed (in Mirage 5.4)

In Mirage 5.4, each MongoDB database is stored on a specific Mirage storage volume. This means that to have two Mirage Management servers you need to have at least two volumes. If you install a second Mirage Management server but only one storage volume is available, the new node will be displayed as Down in the Mirage Console.


Corrective action: In Mirage 5.4, add another Mirage storage volume. However, VMware recommends upgrading Mirage to the latest version (or at least version 5.5) so that the MongoDB database no longer has to be stored on a volume with CVDs. After the upgrade, move the MongoDB database to local, dedicated storage.


You Cannot Disable a Volume (in Mirage 5.4)

In Mirage 5.4, if the MongoDB database is stored on a storage volume, the volume cannot be disabled. (You would disable a volume if, for example, you need to perform maintenance on it.) Attempting to disable the volume will generate the log entry Failed to disable volume used by SIS2.


Corrective action: Upgrade to Mirage 5.5 or later, where the database can be moved to a location that is not a volume. After moving the database, the volume can be disabled. For more information on moving the database, see the VMware knowledge base article Changing the location of MongoDB when in use by Mirage (2131044).


Log Entries That Can Be Safely Ignored

During normal operation, there are a couple of log entries that need not concern you.

  • When a small file is not located in MongoDB, an entry will be added to either the Mirage server or Mirage Management server log file, depending on the type of operation:

    System.IO.FileNotFoundException: File was not found in SIS2.

    This is normal behavior and will happen often when the MongoDB database is populated after a fresh Mirage installation or an upgrade from a version earlier than 5.4. The file will be added to MongoDB.

  • When a small file is not located in MongoDB, a fallback to Mirage storage will occur and an entry will be added to the Mirage server log file:

    Wanova.Server.Common.FileSystem.VaneFileSystem File: [Path: ****PATH****, Signature: 65EA0AB36C76218B4F7FE81D84420906, Size: 1297, Type: PlainData] was not found in SIS2, fallback to disk.

    This is normal behavior. The file will be added to MongoDB. The file will also be read from Mirage storage.

    Note: If the file is also not found on a Mirage volume, then the operation might not complete successfully.
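
The log entries quoted throughout this post can be collected into a small triage script that separates entries worth investigating from those that can be ignored. This is a sketch under assumptions: the substrings are taken from the examples in this post, real entries may be worded differently in your version, and the function names and verdict labels are hypothetical. Note that the entries come from different logs (the MongoDB log, the Mirage event log, and the Mirage server logs), so run it against each file separately.

```python
import sys
from collections import Counter

# (substring quoted in this post, verdict). Checked in order; first match wins.
SIGNATURES = [
    ("Unauthorized not authorized on admin to execute command", "ignore"),
    ("was not found in SIS2, fallback to disk", "ignore"),
    ("File was not found in SIS2", "ignore"),
    ("WT_PANIC", "investigate: possible database corruption"),
    ("could not be completed due to a file system limitation",
     "investigate: possible fragmentation"),
    ("too stale to catch up", "investigate: stale replica"),
    ("MongoDB crashes reached critical threshold",
     "investigate: MongoDB service stopped by Mirage"),
    ("Failed to disable volume used by SIS2",
     "investigate: volume holds the MongoDB database"),
]


def classify(line):
    """Return the verdict for a log line, or None if no known entry matches."""
    for needle, verdict in SIGNATURES:
        if needle in line:
            return verdict
    return None


def triage(lines):
    """Count how many lines fall under each verdict."""
    counts = Counter()
    for line in lines:
        verdict = classify(line)
        if verdict is not None:
            counts[verdict] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python triage_log.py <path to log file>")
    else:
        # errors="replace" keeps the scan going if the log has odd bytes.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for verdict, count in triage(log).most_common():
                print(f"{count:6d}  {verdict}")
```

Lines that match none of the known entries are left uncounted, so an empty result does not mean the log is clean.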

Summary of Part 4

The following topics were discussed in Part 4 of the blog post series on Mirage and MongoDB:

  • System monitoring
  • Locations of the MongoDB log and configuration files
  • Troubleshooting common issues
  • Normal log file entries

Part 1 introduced the benefits of adding MongoDB to Mirage.
Part 2 discussed new components, installations, and upgrades.
Part 3 took a closer look at the underlying technology behind MongoDB and Mirage.