Showing posts with the label split-brain.

15 November 2011

Resolve Heartbeat Split-brain syndrome

Most split-brain cases are caused by one of the following:

- loss of connectivity between the sites over the WAN link
- the active server being too busy to respond to heartbeat requests
- incorrect Heartbeat configuration on one or both servers

Here is a guide to resolving it if it occurs.

First, identify which vCenter server has the most up-to-date data:

1 Check the date and time of files on both servers. Make the most up-to-date server the active server.
2 From a client PC on a LAN, run nbtstat -A 192.168.1.1 where the IP address is the Principal (Public) IP address of the server. This can help identify the MAC address of the server currently visible to client machines.
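The date check in step 1 can be scripted. The following is only a minimal sketch, assuming the protected data folders of both servers are reachable as directories from the machine running it; the function names `newest_modification` and `pick_active` are made up for illustration and are not part of vCenter Server Heartbeat.

```python
import os


def newest_modification(root):
    """Return (path, mtime) of the most recently modified file under root.

    Returns (None, 0.0) if no readable file is found.
    """
    newest_path, newest_mtime = None, 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > newest_mtime:
                newest_path, newest_mtime = path, mtime
    return newest_path, newest_mtime


def pick_active(candidates):
    """Given {server_name: data_directory}, return the name with the newest file."""
    stamps = {name: newest_modification(root)[1] for name, root in candidates.items()}
    return max(stamps, key=stamps.get)

# Hypothetical usage (the directory paths are placeholders):
#   pick_active({"primary": r"D:\mirror\primary", "secondary": r"D:\mirror\secondary"})
```

File timestamps are only a hint: confirm the result by looking at the actual data before deciding which server becomes active.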

Then follow these steps to re-establish the active/passive pair:

1 Identify the server with the most up-to-date data or the server to make active.
2 Shut down vCenter Server Heartbeat on both servers (if running).
3 On the server to make passive, right-click the Task bar icon, and select the Server Configuration wizard.
4 Click the Machine tab and set the server role to passive. Do not change the identity of the server (Primary or Secondary).
5 Click Finish.
6 Restart this server.
7 Start vCenter Server Heartbeat, if required, and check that the Task bar icon now reflects the changes by showing P / - (Primary and Passive) or S / - (Secondary and Passive).
8 On the active server, right-click the Task bar icon and select the Server Configuration wizard.
9 Click the Machine tab and verify that the server role is set to active. Do not change the identity of the server (Primary or Secondary).
10 Click Finish.
11 Restart this server. As the server restarts, it connects to the passive server and starts replication. The active server overwrites data on the passive server.
12 Start vCenter Server Heartbeat, if required, and check that the Task bar icon now reflects the changes by showing P / A (Primary and active) or S / A (Secondary and active).
13 Start vCenter Server Heartbeat Console.
14 Check that the servers have connected and replication has started.
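The Task bar labels mentioned in steps 7 and 12 (P / A, P / -, S / A, S / -) can be read as an identity plus a role. The sketch below decodes two such labels and flags the pair as healthy only when exactly one server is active. It is an illustration only: `parse_status` and `pair_state` are hypothetical helpers, and the labels would have to be read off each server by hand.

```python
IDENTITIES = {"P": "Primary", "S": "Secondary"}
ROLES = {"A": "active", "-": "passive"}


def parse_status(label):
    """Split a label such as 'P / A' into (identity, role)."""
    identity_code, role_code = (part.strip() for part in label.split("/"))
    return IDENTITIES[identity_code], ROLES[role_code]


def pair_state(first_label, second_label):
    """Classify a server pair from the two Task bar labels."""
    roles = sorted(parse_status(label)[1] for label in (first_label, second_label))
    if roles == ["active", "passive"]:
        return "ok"
    if roles == ["active", "active"]:
        return "split-brain"
    return "no active server"
```

For example, `pair_state("P / A", "S / A")` reports a split-brain, which is the situation the procedure above repairs.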

14 November 2011

vCenter Server Heartbeat 6.4 Split-Brain Avoidance

   When two sites run vCenter protected by Heartbeat, connectivity between them can be lost, producing a split-brain condition in which both vCenter servers believe they control the infrastructure.



To avoid this, VMware documents a set of configuration steps for preventing failover:

 "To enable Split-brain Avoidance, open the Server: Monitoring page in the vCenter Server Heartbeat Console, click Configure Failover, and select Prevent failover if channel heartbeat is lost but Active server is still visible to other servers (recommended)."
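The rule described by that option can be written as a small decision function. This is a simplified reading of the option text, not VMware's implementation; the function and parameter names are assumptions chosen for illustration.

```python
def should_fail_over(channel_heartbeat_ok, active_visible_to_others,
                     avoidance_enabled=True):
    """Decide whether the passive server should take over.

    channel_heartbeat_ok: the passive server still receives channel heartbeats.
    active_visible_to_others: other servers can still reach the active server.
    avoidance_enabled: the "Prevent failover if channel heartbeat is lost but
        Active server is still visible to other servers" option is selected.
    """
    if channel_heartbeat_ok:
        return False  # the active server is healthy as far as we can tell
    if avoidance_enabled and active_visible_to_others:
        return False  # only the channel is down; promoting would cause split-brain
    return True
```

With avoidance disabled, a lost channel alone triggers failover, which is how a WAN outage between sites ends with two active servers.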

We must also configure management IP addresses so that the servers at both sites can ping each other's network interfaces and check the state between the passive and the active site:


1 Open the network properties for the Principal (Public) network connection.
2 Double-click TCP/IP to display the properties.
3 Click Advanced.
4 Enter an additional (currently unused) IP address in the table.
5 Reposition the IP addresses in the list so that the additional (Management) IP address appears first, and the Principal (Public) network address (by which clients connect to the server) appears second.
6 Click OK on all three dialogs to accept the configuration changes to the network connection.
7 After completing all of the steps click Next or Finish.


To prevent a failover, the active server must respond within the "Failover timeout" value, which is set to 60 seconds by default. To change it:


1 Click Configure Failover to open the Server Monitoring: Failover Configuration dialog.
2 Type a new numeric value (seconds) in the Failover timeout text box or use the arrow buttons to set a new value.
3 Mark or clear the check boxes to select the actions to take if the specified Failover timeout is exceeded. Click OK to finish.
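How a failover timeout behaves can be pictured with a small timer: every response from the active server resets it, and the configured actions become due only once the silence lasts longer than the timeout. The class below is a hypothetical sketch of that idea with the 60-second default from the text; it does not reflect how Heartbeat is implemented internally.

```python
import time


class FailoverTimer:
    """Tracks how long the active server has been silent."""

    def __init__(self, timeout_seconds=60.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the timer can be tested without waiting
        self._last_response = clock()

    def record_response(self):
        """Call whenever the active server answers; resets the silence period."""
        self._last_response = self._clock()

    def timeout_exceeded(self):
        """True once the silence has lasted longer than the timeout."""
        return self._clock() - self._last_response > self.timeout_seconds
```

A shorter timeout reacts faster to a real outage but makes a busy active server more likely to be declared dead, which is one of the split-brain causes.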