As a VMware Technical Account Manager, I’m an extension of my customers’ team. With my years of experience and deep VMware technical expertise, I’m focused on helping them achieve their objectives with VMware technology.
On a recent Saturday evening, my customer contacted me because the UI of one of their VMware vCenter Server® instances was inaccessible. Unsurprisingly, I got the generic error message “HTTP Status 500 – Internal Server Error”. Okay, no big deal! I figured it would be an easy fix.
Checking the VPXD Service Status
The first thing I did was check the status of the VPXD service. Surprisingly, it wasn’t down. I admit that I overlooked the status of the rest of the services, which would have reduced the troubleshooting efforts had I checked.
However, checking the vpxd logs uncovered a lead for the investigation. I noticed this error message:
error vpxd[43060] [Originator@6876 sub=Main opID=CheckCertificateExpiry-77188b05] Unable to get certificates from the store APPLMGMT_PASSWORD
Upon checking the certificate status in the UI, I noticed that the VMCA-issued certificate of the vCenter Server had expired. Easy fix! But what seemed to be a straightforward task turned out to be a challenging one.
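With the UI down, the same expiry check can be done from the appliance shell. The sketch below is only an illustration: it assumes you have saved the text output of “/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text” and that each entry shows an “Alias :” line followed by an OpenSSL-style “Not After :” line. The function name and the sample listing are hypothetical.

```python
import re
import ssl
from typing import List

ALIAS = re.compile(r"Alias\s*:\s*(\S+)")
NOT_AFTER = re.compile(r"Not After\s*:\s*(.+?)\s*$")


def expired_aliases(listing: str, now: float) -> List[str]:
    """Return the aliases whose 'Not After' date is earlier than `now` (epoch seconds)."""
    expired = []
    alias = None
    for line in listing.splitlines():
        match = ALIAS.search(line)
        if match:
            alias = match.group(1)
            continue
        match = NOT_AFTER.search(line)
        if match and alias is not None:
            # cert_time_to_seconds parses dates such as "Mar  1 10:00:00 2022 GMT"
            if ssl.cert_time_to_seconds(match.group(1)) < now:
                expired.append(alias)
            alias = None
    return expired


# Hypothetical sample in the assumed layout, not output from a real appliance.
SAMPLE = """Alias : __MACHINE_CERT
        Not Before: Mar  1 10:00:00 2020 GMT
        Not After : Mar  1 10:00:00 2022 GMT
Alias : machine
        Not After : Jun 15 08:30:00 2030 GMT
"""

# A fixed reference time keeps the example repeatable; use time.time() in practice.
NOW = ssl.cert_time_to_seconds("Jan  1 00:00:00 2024 GMT")
print(expired_aliases(SAMPLE, NOW))  # prints ['__MACHINE_CERT']
```

Running the same check against each store in turn would show every expired certificate, not only the machine SSL one.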
Regenerating a New VMCA Root Certificate
I asked the customer to take a snapshot of the vCenter VM. Then I referred them to our how-to guide on regenerating vSphere certificates using a self-signed VMCA and asked them to choose the option “Regenerate a new VMCA Root Certificate and replace all certificates”. However, regenerating a new VMCA root certificate failed. The customer tried the option “Reset all certificates” and that also failed with an error.
This forced a status check of the vCenter Server services before any further troubleshooting, which revealed that the Secure Token Service (vmware-stsd) was in a stopped state. Checking the STS log (/var/log/vmware/sso/vmware-identity-sts.log) gave a hint that the STS certificate had also expired:
INFO sts[78:tomcat-http--40] [CorId=d00e7424-e0a1-4a30-b19f-042bafd63c79] [com.vmware.identity.sts.InvalidCredentialsException] Censored exception com.vmware.identity.sts.InvalidCredentialsException: Solution user cert is not valid.
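The service status check that exposed the stopped STS service can also be scripted, so that no stopped service gets overlooked. This is a minimal sketch, assuming “service-control --status --all” prints state headers such as “Running:” and “Stopped:”, each followed by a line of space-separated service names. The helper names and the sample text are hypothetical.

```python
import subprocess
from typing import Dict, List


def parse_service_status(output: str) -> Dict[str, List[str]]:
    """Group service names under the state header that precedes them."""
    states: Dict[str, List[str]] = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.endswith(":"):
            # A header line such as "Running:" starts a new group.
            current = stripped[:-1]
            states.setdefault(current, [])
        elif current is not None:
            states[current].extend(stripped.split())
    return states


def current_status() -> Dict[str, List[str]]:
    """Run service-control on the appliance itself and parse its output."""
    result = subprocess.run(
        ["service-control", "--status", "--all"],
        capture_output=True, text=True, check=True,
    )
    return parse_service_status(result.stdout)


# Hypothetical sample in the assumed format; on an appliance, call current_status().
SAMPLE = """Running:
 applmgmt lookupsvc vmware-vpxd
Stopped:
 vmcam vmware-stsd
"""

status = parse_service_status(SAMPLE)
print("Stopped:", ", ".join(status.get("Stopped", [])))  # prints Stopped: vmcam, vmware-stsd
```

Some services are stopped by default, so a stopped entry is a prompt to look closer rather than proof of a fault.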
Regenerating the STS Certificate
Next, the customer needed to regenerate the STS certificate. They downloaded our script for this, and after running fixsts.sh they were able to recover the STS service. They then attempted to renew the vCenter certificates using the option “Regenerate a new VMCA Root Certificate and replace all certificates” and, to our surprise, this failed. I recommended they reset all the certificates by choosing the option “Reset all Certificates”, and this failed as well.
The situation started to heat up. Backups of the VMs started to fail due to the inaccessibility of vCenter Server.
Finding the SSL Trust Mismatch
I paused for a few minutes and went back to review the chronology of events. What I realized is that during the troubleshooting process, an important step was missed and that broke the trust relationship between the solution users.
ERROR generateReport: <<censored>> (VC 7.0 or CGW) found SSL Trust Mismatch: Please run python ls_doctor.py --trustfix option on this node.
We used the Lookup Service Doctor (lsdoctor) tool to check if there were any trust mismatches or errors and the output suggested that we had to fix the trust mismatch. Once the customer was able to fix the trust mismatch, regenerating the certificates went through without any issues.
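To make the idea of a trust mismatch concrete: the certificate recorded in a Lookup Service registration should be the same one the node serves on port 443. The sketch below illustrates that comparison only. It is not how lsdoctor works internally, and it assumes the registered trust value is a base64-encoded DER certificate. All function names are hypothetical, and the demo uses placeholder bytes rather than real certificates.

```python
import base64
import hashlib
import ssl


def fingerprint(der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def trust_matches(ssl_trust_b64: str, served_pem: str) -> bool:
    """Compare a registered trust value with the certificate a node serves."""
    registered = base64.b64decode(ssl_trust_b64)
    served = ssl.PEM_cert_to_DER_cert(served_pem)
    return fingerprint(registered) == fingerprint(served)


def served_certificate(host: str, port: int = 443) -> str:
    """Fetch the PEM certificate presented by host:port (needs network access)."""
    return ssl.get_server_certificate((host, port))


# Placeholder bytes stand in for the old and the replaced certificate.
old_cert = b"placeholder-old-certificate"
new_cert = b"placeholder-new-certificate"
served_pem = ssl.DER_cert_to_PEM_cert(new_cert)

# A registration that still carries the old certificate no longer matches.
print(trust_matches(base64.b64encode(old_cert).decode(), served_pem))  # prints False
print(trust_matches(base64.b64encode(new_cert).decode(), served_pem))  # prints True
```

A certificate replacement that fails partway can leave registrations pointing at the old certificate, which is the state the first comparison models.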
Achieve Faster Resolution with Step-by-Step Guidance
In the end, the customer’s issue was resolved quickly, and their services were recovered. In the event you should experience something similar, below is the chronology of the steps that should have been followed to resolve the issue even more quickly.
1. Take a snapshot of the vCenter Server VM. (It should be an offline snapshot if the vCenter Server VMs are in Enhanced Linked Mode, or ELM.)
2. Check whether the STS certificate is valid before regenerating certificates with Certificate Manager. (Do not skip this step.)
   - Download the script titled “checksts.py”.
   - Copy the downloaded “checksts.py” script to /tmp on the VCSA VM.
   - Execute the command “python checksts.py”.
   - If the certificate has expired, continue with the rest of these steps. Otherwise, move to step 3.
   - Download the script titled “fixsts.sh”.
   - Copy the downloaded “fixsts.sh” script to /tmp on the VCSA VM.
   - Run “chmod +x fixsts.sh” to make the script executable.
   - Run “./fixsts.sh”.
   - Once the above task is complete, restart all the services on the vCenter Server by executing the following command: service-control --stop --all && service-control --start --all
   - Ensure that the required vCenter Server services are started.
3. Optional (but always good to run): Use Lookup Service Doctor to correct SSL trust mismatches.
   - Download the Lookup Service Doctor (lsdoctor) tool.
   - Copy the downloaded file to the /tmp folder on the VCSA VM.
   - Unzip the archive by executing the command “unzip lsdoctor.zip”.
   - Navigate to the master folder of the tool: “cd lsdoctor-master”.
   - Execute the command “python lsdoctor.py -l”. This option checks for common issues in the Lookup Service and WILL NOT make any changes to the environment. If the output reports any errors, they have to be fixed first, so proceed to the next step. Otherwise, move on to step 4.
   - Execute the command “python lsdoctor.py -t”. We chose this option because the trust relationship was broken in our setup. It corrects SSL trust mismatch issues in the Lookup Service. The Lookup Service registrations may have an SSL trust value that doesn’t match the MACHINE_SSL_CERT on port 443 of the node. This can be caused by a failure during certificate replacement, among other failures.
   - Once the above task is complete, restart all the services on the vCenter Server by executing the following command: service-control --stop --all && service-control --start --all
   - Ensure that the required vCenter Server services are started.
   - Re-register any external solutions that were previously pointed to the affected node (SRM, vSphere Replication, NSX-V, etc.).
4. Regenerate the VMCA Root certificate.
   - Launch the Certificate Manager utility “/usr/lib/vmware-vmca/bin/certificate-manager”.
   - Choose option #4 “Regenerate a new VMCA Root Certificate and replace all certificates”. A few parameters need to be keyed in. You can leave most of them at their defaults unless you want to change them. Below are the options we chose. The blank values for the parameters are the defaults.
     - Do you wish to generate all certificates using configuration file: Option [Y/N] ? : Y
     - Please provide valid SSO and VC priviledged user credential to perform certification operations.
     - certool.cfg file exists, Do you wish to reconfigure: Option [Y/N] ? : Y
     - Press Enter key to skip optional parameters or use Previous value. Key in the required values.
   - The certificate renewal should now be successful.
5. Check the vCenter Server services status and attempt to log in to the vCenter Server.
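The ordering is what matters most in the steps above, so here is a small sketch that encodes it. It only builds the list of commands named in this post and does not run anything. The function name and the two flags are hypothetical, the snapshot entry is a reminder rather than a command, and the final “service-control --status --all” is my shorthand for the closing status check.

```python
from typing import List

RESTART = "service-control --stop --all && service-control --start --all"
CERT_MANAGER = "/usr/lib/vmware-vmca/bin/certificate-manager"


def recovery_plan(sts_expired: bool, lookup_errors: bool) -> List[str]:
    """Return the commands to run on the appliance, in the order described above."""
    steps = ["(take a snapshot of the vCenter Server VM first)", "python checksts.py"]
    if sts_expired:
        # Fix the STS certificate before touching Certificate Manager.
        steps += ["chmod +x fixsts.sh", "./fixsts.sh", RESTART]
    # The -l check makes no changes, so the plan always includes it.
    steps.append("python lsdoctor.py -l")
    if lookup_errors:
        steps += ["python lsdoctor.py -t", RESTART]
    steps.append(CERT_MANAGER)
    steps.append("service-control --status --all")
    return steps


for number, command in enumerate(recovery_plan(sts_expired=True, lookup_errors=True), 1):
    print(f"{number}. {command}")
```

In our case both conditions were true, which is why running Certificate Manager first kept failing.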
Would your team benefit from a VMware technical expert?
Besides helping customers when they are in triage mode, Technical Account Managers collaborate with customers on all kinds of activities such as:
- Conducting environmental assessments
- Sharing best practices
- Identifying technical gaps that may be impeding progress
- Providing solution guidance for advancing initiatives
- Making recommendations for optimizing day-2 operations
- Helping plan and prepare for product upgrades
- Sharing peer and industry insights
If you don’t have a Technical Account Manager (tip: you really should), reach out to your VMware Account Manager to learn how Technical Account Management Services could help your organization.