What Enterprises Need: Monitoring and Alerting
By Jean Ouellette | May 10, 2013
This is the sixth installment in our weekly technical blog series about Enterprise Content Transformation: What Enterprises Need. The series kicked off with an overview of the reasons your organization needs linear scalability, high availability, easy integration with existing systems, and a highly configurable platform.
Today’s systems are vast and complex, with many moving parts: hardware, software, networking, databases, and more. Any of these components can fail at some point, and a failure can cause loss of service unless a high availability (HA) architecture is in place. Even with HA, a component failure may prevent the system from meeting its required SLA. IT administrators want to be proactive and resolve component failures before they affect users and processes.
The new Adlib Platform provides built-in monitoring and highly configurable alerting to help IT keep the system healthy and available. Each platform component sends a “heartbeat” to the system database at regular intervals, letting the system know that it is alive and well. A monitoring daemon continuously checks whether any component has stopped sending heartbeats. If a component’s last heartbeat is older than expected, the component is placed in Alarm state and an alert is sent via email. Alerts for the same component can be repeated at a set interval while it remains in Alarm state; optionally, an email can also be sent when the component comes back online.
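To make the heartbeat logic concrete, here is a minimal Python sketch of one monitoring pass. It is an illustration only, not Adlib's implementation: the `Component` class, the `check_components` function, and the `max_age` and `repeat_every` parameters are all hypothetical names, and the heartbeat timestamps are held in memory rather than read from the system database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Component:
    """Hypothetical record of one platform component's heartbeat state."""
    name: str
    last_heartbeat: datetime
    in_alarm: bool = False
    last_alert: Optional[datetime] = None


def check_components(
    components: List[Component],
    now: datetime,
    max_age: timedelta,
    repeat_every: timedelta,
    notify_recovery: bool = True,
) -> List[Tuple[str, str]]:
    """Run one monitoring pass and return (event, component name) pairs.

    A component whose last heartbeat is older than max_age goes into alarm.
    While it stays in alarm, a new alert is due every repeat_every.
    When heartbeats resume, an optional "recovered" event is produced.
    """
    events = []
    for c in components:
        stale = now - c.last_heartbeat > max_age
        if stale:
            c.in_alarm = True
            alert_due = c.last_alert is None or now - c.last_alert >= repeat_every
            if alert_due:
                c.last_alert = now
                events.append(("alarm", c.name))
        elif c.in_alarm:
            # Heartbeats have resumed: clear the alarm and reset the alert timer.
            c.in_alarm = False
            c.last_alert = None
            if notify_recovery:
                events.append(("recovered", c.name))
    return events
```

In a real deployment, a daemon would call something like `check_components` on a timer and hand each returned event to the email alerting step. The repeat interval keeps a long outage from sending an email on every pass.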
The subject, content, and recipient addresses of the email alert are configurable through the Adlib Management Console. By default, the email contains all the information about the component that has failed, including component type, component name, computer name, number of heartbeats, and the date and time of the last heartbeat. You can add custom content, such as a link to the Adlib Management Console where the IT administrator can get more information.
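As a rough illustration of how such a configurable alert might be assembled, the sketch below builds a message with Python's standard `email` and `string` modules. The template text, the placeholder names (such as `$component_name`), and the `build_alert` function are assumptions made for this example; they do not reflect the actual templates or settings in the Adlib Management Console.

```python
from email.message import EmailMessage
from string import Template
from typing import List, Mapping

# Hypothetical default body listing the fields described above.
DEFAULT_BODY = Template(
    "Component type: $component_type\n"
    "Component name: $component_name\n"
    "Computer name: $computer_name\n"
    "Heartbeats received: $heartbeat_count\n"
    "Last heartbeat: $last_heartbeat\n"
)


def build_alert(
    details: Mapping[str, object],
    sender: str,
    recipients: List[str],
    subject_template: str,
    extra_text: str = "",
) -> EmailMessage:
    """Build an alert email from component details and configurable text."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # The subject is itself a template, so it can include component details.
    msg["Subject"] = Template(subject_template).substitute(details)
    body = DEFAULT_BODY.substitute(details)
    if extra_text:
        # Custom content, for example a link to the management console.
        body += "\n" + extra_text + "\n"
    msg.set_content(body)
    return msg
```

The resulting message could then be passed to `smtplib.SMTP.send_message` to deliver it through the organization's mail server.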
The Adlib Management Console provides a dashboard-like graphical overview of the system (right) and indicates which component groups are in Alarm state. A list view is also available, which can be filtered and sorted by component type, status, etc. Component alarms can be acknowledged individually once the IT administrator has investigated the issue and has a plan of action, which makes it easy to distinguish new alarms from older ones.
It is important for IT administrators to be aware of any issues within a system as soon as they arise, so that they can act before end users are affected. Although an HA architecture is an important part of a successful enterprise platform, issues such as hardware or network failures can still affect the service, especially when several failures occur at once. The Adlib Platform provides robust monitoring and alerting to help keep your system running and meet users’ expectations.
About the Author