Database management system

In a deployment that provides high availability, the Information Store database is deployed in an active/passive pattern. In this pattern, there is one primary database instance and one or more standby instances.

Detecting server failure

You can detect a database failure in several ways.
  • The i2_Component_Availability.log contains messages that report the connection status for the Information Store database. For more information about the messages that are specific to the database, see Database management system messages.
Your database management system also provides its own tools for detecting failures:
  • Db2:
    • Db2 fault monitor on Linux
    • Heartbeat monitoring in clustered environments
    • Monitoring Db2 High Availability Disaster Recovery (HADR) databases
    For more information about detecting failures for Db2, see Detecting an unplanned outage.
  • SQL Server
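As a rough supplement to these tools, the following Python sketch probes each database server with a plain TCP connection, similar in spirit to heartbeat monitoring. It is a minimal sketch under stated assumptions: the DatabaseEndpoint class, the is_reachable and check_all functions, and any host names and ports that you pass in are hypothetical. A successful connection only shows that something is listening on the port, not that the Information Store database is healthy, so this check does not replace the Db2 or SQL Server monitoring tools or the messages in i2_Component_Availability.log.

```python
import socket
from dataclasses import dataclass


@dataclass
class DatabaseEndpoint:
    """A database server to probe (hypothetical helper, not part of i2 Analyze)."""
    role: str  # for example "primary" or "standby"
    host: str
    port: int


def is_reachable(endpoint: DatabaseEndpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout.

    This only proves that something accepts connections on the port; it does
    not confirm that the database itself is healthy or accepting work.
    """
    try:
        with socket.create_connection((endpoint.host, endpoint.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_all(endpoints: list[DatabaseEndpoint], timeout: float = 2.0) -> dict[str, bool]:
    """Probe each endpoint and map 'role@host:port' to its reachability."""
    return {
        f"{e.role}@{e.host}:{e.port}": is_reachable(e, timeout)
        for e in endpoints
    }
```

For example, check_all([DatabaseEndpoint("primary", "db-primary.example.com", 50000)]) returns a dictionary that maps "primary@db-primary.example.com:50000" to True or False. Port 50000 is a common Db2 default and 1433 is the default for a SQL Server default instance; use the ports that your servers actually listen on.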

Manual and automatic failover

When the primary server fails, the system must fail over to a standby server so that operations can continue.
  • Db2
    • If you are using an automated-cluster controller, failover to a standby instance is automatic. For more information, see .
    • If you are using client-side automatic client rerouting, then you must manually force a standby instance to become the new primary. For more information about initiating a takeover, see Performing an HADR failover operation.
  • SQL Server
    • When SQL Server is configured for high availability in an availability group with three servers, failover is automatic. For more information about failover, see Automatic Failover.
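When failover is not automatic, the takeover is issued on the standby that is to become the new primary. The following Python sketch only builds the statement text for each case; it does not connect to a database. The takeover_statement function and its parameters are hypothetical. The Db2 statements are TAKEOVER HADR commands for the Db2 command line processor, and the SQL Server statements are ALTER AVAILABILITY GROUP Transact-SQL statements; check them against the product documentation referenced above before you run either against a real system.

```python
def takeover_statement(dbms: str, name: str, primary_failed: bool) -> str:
    """Return the statement to run on the standby that should become primary.

    Hypothetical helper for illustration only.
    dbms: "db2" or "sqlserver".
    name: the Db2 database alias, or the SQL Server availability group name.
    primary_failed: True for an unplanned outage (primary unavailable),
        False for a planned role switch while both servers are healthy.
    """
    if dbms == "db2":
        # Db2 command line processor command, issued on the standby instance.
        stmt = f"TAKEOVER HADR ON DATABASE {name}"
        # BY FORCE lets the standby take over without the primary's cooperation.
        return stmt + " BY FORCE" if primary_failed else stmt
    if dbms == "sqlserver":
        # Transact-SQL, issued on the secondary replica that becomes primary.
        if primary_failed:
            # Forced failover: transactions not yet on the secondary can be lost.
            return f"ALTER AVAILABILITY GROUP [{name}] FORCE_FAILOVER_ALLOW_DATA_LOSS;"
        # Planned manual failover without data loss.
        return f"ALTER AVAILABILITY GROUP [{name}] FAILOVER;"
    raise ValueError(f"unsupported DBMS: {dbms!r}")
```

For example, takeover_statement("db2", "ISTORE", primary_failed=True) returns "TAKEOVER HADR ON DATABASE ISTORE BY FORCE", where ISTORE is a placeholder database alias. The forced forms are for unplanned outages and can lose transactions that had not reached the standby; the unforced forms are for planned role switches, such as returning to the original primary server.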

Recovering failed servers

A server can fail for many reasons. Use the logs from the database server to diagnose and resolve the issue. For example, you might need to restart the server, increase the hardware specification, or replace the hardware components that caused the failure.

When the server is back online and working, you can recover it to become the primary server again. Alternatively, you can recover it as a standby for the new primary server that you failed over to.

Recovery might include restoring a backup of the Information Store database that was taken before the server failed.

Reinstating high availability

  • Db2
    • When you are not using an automated-cluster controller such as TSAMP, some toolkit tasks work only on the original primary database server. To use those toolkit tasks after a failure, you must either revert to the original primary database server or redeploy your system to use the new primary database server.
      • For more information about the process to make the recovered database the primary again, see Reintegrating a database after a takeover operation.
      • To redeploy with the new primary, update the host-name and port-number values in the topology.xml file on each Liberty server so that they reference the new primary database server, and then redeploy each Liberty server.
  • SQL Server
    • On SQL Server, you are not required to return to the previous primary server, but you might choose to do so. To return to the original primary server, initiate a planned manual failover. For more information, see Planned Manual Failover (Without Data Loss).
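For Db2, the topology.xml change that is needed to redeploy with the new primary can be scripted. The following Python sketch repoints database connection details from the failed primary server to the new primary server. It assumes, as a hypothetical layout, that the connection details are stored in host-name and port-number attributes on database elements that are not in an XML namespace; check this against your own topology.xml before you use it. The repoint_database function and the SAMPLE_TOPOLOGY text are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified topology.xml content for illustration.
SAMPLE_TOPOLOGY = """<topology>
  <databases>
    <database database-type="InfoStore" host-name="db-primary.example.com" port-number="50000"/>
  </databases>
</topology>"""


def repoint_database(xml_text: str, old_host: str, new_host: str, new_port: int) -> tuple[str, int]:
    """Point every <database> element that references old_host at new_host:new_port.

    Returns the updated XML text and the number of elements that were changed.
    Assumes un-namespaced <database> elements with host-name and port-number
    attributes (an assumption, not a confirmed schema).
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for db in root.iter("database"):
        if db.get("host-name") == old_host:
            db.set("host-name", new_host)
            db.set("port-number", str(new_port))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

For example, repoint_database(SAMPLE_TOPOLOGY, "db-primary.example.com", "db-standby1.example.com", 50000) returns the updated XML and a count of 1. Keep a copy of the original file before you write the result back: xml.etree.ElementTree discards XML comments when it parses, and this sketch does not write an XML declaration. After you update topology.xml on each Liberty server, redeploy each Liberty server.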