Hi everyone.
My RHEL 6.1 virtual machines running on ESXi hosts 5.1.0 get frozen when the backup procedure tries to get a quiesced snapshot. I have to reset the virtual machines from VMware console. The tool used to quiesce the virtual machine is SYMCquiesce 1.0.0 by Symantec. The most weird is it does not fail always -- the backup is launched every week... but it fails only every two weeks.
After reading the VMware document http://kb.vmware.com/kb/2038606 and the RedHat document https://access.redhat.com/solutions/484303 and as I was not allowed to upgrade to RHEL 6.4, I decided to downgrade the VMware Tools to ESX 5.0. According to this table, I removed VMware tools 9.0.15 and installed 8.6.11.26309 (build-1310128). But this old version did not solve the problem... the virtual machines get on frozen... every two weeks.
The /etc/vmware-tools/tools.conf contains
[vmbackup]
enableSyncDriver = false
[logging]
log = true
vmtoolsd.level = warning
vmtoolsd.handler = vmx
vmss.level = warning
vmss.handler = vmx
The logging section is included according to this VMware document to get some log messages in the ESXi host.
According to Symantec README file, the script/usr/sbin/pre-freeze-script contains
/opt/SYMCquiesce/bin/freeze
The vmware.log file, located in the ESXi host, contains the following lines when the snapshot is succesfully created:
2016-01-11T23:39:27.849Z| vmx| I120: Vix: [12153447 vigorCommands.c:481]: VigorSnapshotManager_Take: takeOptions=136, clientFlags=0, displayName=NBU_SNAPSHOT backupcentral 1452555565
2016-01-11T23:39:27.879Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'NBU_SNAPSHOT backupcentral 1452555565', deviceState=0, logging=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=18CA48E0, cbData=19CADF40
2016-01-11T23:39:27.893Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : open successful (65557) size = 188978561024, hd = 0. Type 3
2016-01-11T23:39:27.893Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : closed.
2016-01-11T23:39:27.927Z| vcpu-2| I120: ToolsBackup: changing quiesce state: IDLE -> STARTED
2016-01-11T23:39:29.692Z| vcpu-2| I120: ToolsBackup: changing quiesce state: STARTED -> COMMITTED
2016-01-11T23:39:29.693Z| vcpu-0| I120: Destroying virtual dev for scsi0:0 vscsi=10159
2016-01-11T23:39:29.693Z| vcpu-0| I120: VMMon_VSCSIStopVports: No such target on adapter
2016-01-11T23:39:29.742Z| vcpu-0| I120: SnapshotVMXTakeSnapshotCB: Enter checkpoint status 1, mode 0.
But when it fails, it appears the VMware tools does not respond:
2016-01-18T23:03:40.412Z| vmx| I120: Vix: [12153447 vigorCommands.c:481]: VigorSnapshotManager_Take: takeOptions=136, clientFlags=0, displayName=NBU_SNAPSHOT backupcentral 1453158218
2016-01-18T23:03:40.432Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'NBU_SNAPSHOT backupcentral 1453158218', deviceState=0, logging=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=18CA48E0, cbData=19B04E90
2016-01-18T23:03:40.519Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : open successful (65557) size = 188978561024, hd = 0. Type 3
2016-01-18T23:03:40.519Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : closed.
2016-01-18T23:03:40.522Z| vcpu-0| I120: ToolsBackup: changing quiesce state: IDLE -> STARTED
2016-01-18T23:03:58.715Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-01-18T23:04:03.814Z| vmx| I120: Tools: Tools heartbeat timeout.
2016-01-18T23:04:13.716Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-01-18T23:04:13.716Z| vmx| I120: GuestRpc: app toolbox's second ping timeout; assuming app is down
2016-01-18T23:04:13.717Z| vmx| I120: ToolsBackup: changing quiesce state: STARTED -> DONE
2016-01-18T23:04:13.717Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: done with snapshot 'NBU_SNAPSHOT backupcentral 1453158218': 0
2016-01-18T23:04:13.717Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (40).
2016-01-18T23:04:13.718Z| vmx| I120: GuestRpc: Reinitializing Channel 0(toolbox)
2016-01-18T23:04:13.718Z| vmx| I120: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed
2016-01-18T23:04:13.718Z| vmx| I120: GuestRpc: Channel 0 reinitialized.
2016-01-18T23:04:13.718Z| vmx| I120: GuestRpc: Channel 0 reinitialized.
Does anyone have any suggestion to solve this problem?
Thanks in advance