We believe that proper management of network resources is not just a matter of quick resolution of various urgent problems. A responsible system administrator should apply proactive practices to predict any faults and avoid them.

In many cases problems have roots that can be seen long before the actual fault. For example, if the number of customers visiting your web site grows constantly, one day you will probably notice 100% CPU usage on your web server or database system and will need an urgent hardware upgrade. If you had looked at the CPU load parameter from time to time and noticed that it was growing constantly, you could have prepared for that upgrade in advance.
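
To make the CPU example concrete, here is a minimal sketch of how a growing load could be extrapolated. It is only an illustration and not part of Server Supervisor: the function names, the sample data and the choice of a straight-line (least-squares) fit are all assumptions.

```python
from statistics import mean


def linear_trend(samples):
    """Least-squares slope and intercept for (day, cpu_percent) samples."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_mean, y_mean = mean(xs), mean(ys)
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in samples) / denom
    return slope, y_mean - slope * x_mean


def day_limit_reached(samples, limit=100.0):
    """Day on which the fitted line reaches `limit`, or None if load is not growing."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical weekly averages of CPU load: (day number, percent).
history = [(0, 40.0), (7, 45.0), (14, 50.0), (21, 55.0)]
print(day_limit_reached(history))
```

With this made-up history the load grows by 5 points a week, so the fitted line reaches 100% at about day 84. Real load rarely grows in a straight line, so such an estimate is a hint to plan ahead, not a forecast.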

So, in general the problem is that when you design your monitoring system by specifying various monitors, warning and error levels, notifications, etc., you can take into account only some simple fault situations. There are many situations and tendencies that cannot be treated as faults and that are hard to specify in terms of warning levels. Yet examining them properly can let you predict how the overall system will develop and which problems may arise.

The users of Server Supervisor can perform such periodic examination using reports. The product provides the following types of reports:

  • Simple monitor statistics. This report shows the values of a selected monitor during any specified period of time. You can also choose the scale to get more or less detail.
  • Resource uptime percentage. For a specified time period it shows the percentage of time the selected monitor was in each state.
  • Summary by week days and hours. With this type of report you can get average statistics for particular days of the week or hours of the day. For example, suppose you know that your web system experiences its highest load on weekdays around 4:00 PM. You can create a report for the last 5 days for the period between 3:00 PM and 5:00 PM to see the average values of the CPU load monitor.
  • Group reports. Resource uptime and summary reports can be created for groups of monitors.
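
The calculations behind the uptime and summary reports can be sketched roughly as follows. This is an illustration only: the data layout (a sorted list of state changes, a list of timestamped samples) and the function names are assumptions, not the product's actual storage format or API.

```python
from datetime import datetime


def uptime_percentages(changes, period_end):
    """Percentage of time spent in each state.

    `changes` is a list of (time, state) pairs sorted by time; the first
    entry marks the start of the period. Times are plain numbers (hours).
    """
    totals = {"Ok": 0.0, "Warning": 0.0, "Error": 0.0}
    following = changes[1:] + [(period_end, None)]
    for (start, state), (end, _) in zip(changes, following):
        totals[state] += end - start
    period = period_end - changes[0][0]
    return {state: 100.0 * spent / period for state, spent in totals.items()}


def window_average(samples, start_hour, end_hour):
    """Average of weekday samples taken between start_hour and end_hour.

    `samples` is a list of (datetime, value) pairs.
    """
    values = [
        value
        for stamp, value in samples
        if stamp.weekday() < 5 and start_hour <= stamp.hour < end_hour
    ]
    return sum(values) / len(values) if values else None


# Hypothetical 24-hour history: Ok, then Warning at 18:00, Error at 21:00, Ok at 22:00.
day = [(0, "Ok"), (18, "Warning"), (21, "Error"), (22, "Ok")]
print(uptime_percentages(day, 24))
```

For the 3:00 PM to 5:00 PM example above, `window_average(samples, 15, 17)` would average only the weekday samples that fall in that window.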

We thought about many other types of reports that could be useful in various specific conditions. Finally we decided to start with the few types mentioned above and wait for feedback from our users. Of course, we tried to design the whole system in a way that would allow us to add more types of reports easily. So, your suggestions are very welcome.

You can specify the parameters of reports that you want to receive periodically, and Server Supervisor will send these reports to you by email at the specified date and time. For example, you can configure it to send daily reports for a group of monitors for your web site every day at 9:00 AM. This way you will receive information about the last 24 hours every morning.
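
As a small illustration of the daily schedule described here, the next delivery time and the 24-hour window it covers could be computed like this. The function names are hypothetical and the sketch ignores time zones; it does not show how Server Supervisor schedules reports internally.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, at=time(9, 0)):
    """Next moment strictly after `now` at which the daily report is due."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def report_window(run_time):
    """The 24 hours that a report sent at `run_time` would cover."""
    return run_time - timedelta(hours=24), run_time
```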

When you create a new product, one of the most important problems is how to design a good user interface. In many cases the solution of this problem becomes a key factor in the overall product success or failure.

Experience shows that it is hardly possible to design an “ideal” user interface that would be most convenient for all possible tasks the users will want to perform with the help of the product. Therefore you should probably concentrate on the most common use cases. If we imagine an average user, measure all the time he or she spends using a product and then classify that time by use case, we will find that about 90% of it is covered by several major ones. So, the goal is to create an interface that is optimal for these major use cases. That is what we tried to do.

We think that in our case these use cases are:

  • Quickly get an overview of all the resources to identify possible problems.
  • Analyze various parameters of the monitored resources over any selected time period in order to investigate the source of the problem.
  • Check that everything is working fine once the problem is resolved.

We also had some general requirements for the user interface. It had to be accessible remotely via the web to let users check and configure anything from any location. We also wanted it to be fast, easy to use, and good-looking (after all, we were going to sell the product, so we wanted to make a really good impression on people). That is why we chose Flash technology.

The screen shot below shows how it is organized in general.


You can switch between several tabs using the buttons on the left of the window. The most important tab is where you can see all the monitors. They are gathered into groups to make them more convenient to manage. For example, you can group monitors related to the same server. Visually, each group can be expanded or collapsed.

Each entry in the list of monitors shows the basic parameters of that monitor and the statistics for the last 24 hours in the form of a bar painted in green, yellow and red (like a small colored graph).

You can select a monitor in the list and see more detailed statistics for it in the right view. This will let you perform deeper analysis. You can select different time periods and get any information that is required to analyze the work of that monitor.

At the top of the window there is a consolidated view that shows very basic parameters of the whole monitoring system, including the current number of monitors in each of the three states and the 24-hour history for all monitors.

We believe that this approach provides a comprehensive view of all the monitored resources and at the same time lets users quickly get to any level of detail.

Our server monitoring solution can be installed on any computer that has a network connection to all the resources that should be monitored. The product includes a database that is used to store all statistical data. A convenient Flash-based web interface is used to configure the monitoring system and work with the data collected during the monitoring process. This means that the system can be accessed remotely from any location with the help of a web browser. To provide proper security we use an SSL-protected connection and password-protected user logins.

Now let’s see how to configure our system. To monitor the availability and correct operation of each network resource you should create a monitor object responsible for that resource. The following types of monitors will be available:

  • Network (Ping, TCP, Network bandwidth);
  • DNS;
  • HTTP, HTTP Content, HTTP Transaction;
  • Specific monitors for Apache and IIS web servers;
  • Process execution monitor;
  • Database monitors for MS SQL and MySQL;
  • Mail servers (SMTP, POP3, IMAP4);
  • FTP server;
  • OS resources (CPU, Memory, Disk space, file).

We selected these types of monitors in order to cover all areas essential for web sites and web applications. At the same time, many of these monitors can be used for general monitoring purposes. For example, you can ping any computer inside your LAN to confirm its availability, or use a process execution monitor to check that some process is permanently active on a system.

Q: Would you like to add another monitor type?

In general you can create several monitors for a single resource. This is useful for checking it for different types of faults. For example, you can configure a ping monitor along with HTTP and HTTP content monitors for your web site. Ping will simply check the availability of the computer on the network. The HTTP monitor will check the ability to connect to the HTTP port of your web site, whereas the HTTP content monitor will check that the response of the web site is correct. The latter would mean not only that the web server is working, but also that your web application and the database it uses are up and running.
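
The difference between the HTTP and HTTP content checks can be sketched with the Python standard library. This is a simplified illustration under assumptions of our own (the function names, timeouts and the "expected text" test are hypothetical), not the way the product's monitors are implemented. A real ping needs ICMP, which usually requires raw sockets and elevated privileges, so it is left out here.

```python
import http.client
import socket
import urllib.request


def tcp_port_open(host, port, timeout=5.0):
    """HTTP-monitor style check: can a TCP connection to the port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_content_ok(url, expected_text, timeout=5.0):
    """HTTP-content style check: does the page load and contain the expected text?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
    return expected_text in body
```

A site can pass `tcp_port_open("www.example.com", 80)` and still fail `http_content_ok("http://www.example.com/", "some text")` when the web server accepts connections but the application behind it returns an error page.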

Some monitors measure certain parameters rather than check something for presence or correct operation. For example, a ping monitor measures ping time, and a CPU usage monitor measures the CPU load on a system. It is up to you to decide what values of the corresponding parameters are acceptable and what values should be treated as faults. In our system you can specify a warning level and an error level for these values. So, depending on the value produced by the latest execution of a monitor, that monitor can be in one of three states:

  • Ok;
  • Warning;
  • Error.
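
The mapping from a measured value to one of these states can be sketched in a few lines. The sketch assumes that higher values are worse (as with ping time or CPU load) and that a value equal to a level already triggers it; both are assumptions made for illustration, and the function name is hypothetical.

```python
def classify(value, warning_level, error_level):
    """Map a measured value to a monitor state, assuming higher is worse."""
    if value >= error_level:
        return "Error"
    if value >= warning_level:
        return "Warning"
    return "Ok"


# Hypothetical CPU load monitor with a warning level of 70% and an error level of 90%.
for load in (50, 75, 95):
    print(load, classify(load, warning_level=70, error_level=90))
```

A parameter where lower values are worse, such as free disk space, would need the comparisons reversed.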

Q: What do you think about the warning levels?

The state can change the next time the monitor is executed. Of course, you can specify the time between executions for each monitor. For example, some resources can be checked once a minute, some every 10 minutes, etc.
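
One common way to handle monitors with different intervals is a priority queue ordered by the next execution time. The sketch below only simulates the order of executions; the monitor names and the function are hypothetical, and nothing here is taken from the product's actual scheduler.

```python
import heapq


def execution_order(intervals, until):
    """Return (time, monitor) pairs for every execution up to `until` seconds.

    `intervals` maps a monitor name to its interval in seconds. Every
    monitor is first executed at time 0.
    """
    queue = [(0, name) for name in sorted(intervals)]
    heapq.heapify(queue)
    runs = []
    while queue and queue[0][0] <= until:
        when, name = heapq.heappop(queue)
        runs.append((when, name))
        heapq.heappush(queue, (when + intervals[name], name))
    return runs


# A ping check once a minute and a disk check every 10 minutes.
print(execution_order({"ping": 60, "disk": 600}, until=120))
```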

When the state of a monitor changes, a specified action can be performed. Moreover, you can specify a list of actions for each type of change for each monitor. For example, you can configure the system to send an email message to your network administrator every time the state of a monitor that pings your web site changes from “Ok” to “Warning”. For now the following notification methods are available:

  • E-mail messages;
  • Instant messengers (ICQ, Yahoo, MSN);
  • SMS.

Different people can be notified of different problems using different methods. For example, the most urgent notifications can be sent as SMS messages to the system administrator and duplicated by email to other people who manage the corresponding server resource.
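
The idea of a list of actions per monitor and per type of state change can be modeled as a lookup table keyed by (monitor, old state, new state). The class, monitor names and recipients below are made up for illustration; the sketch only decides who should be notified and how, and does not send anything.

```python
from collections import defaultdict


class NotificationRules:
    """Maps a state change of a monitor to a list of (method, recipient) actions."""

    def __init__(self):
        self._rules = defaultdict(list)

    def add(self, monitor, old_state, new_state, method, recipient):
        self._rules[(monitor, old_state, new_state)].append((method, recipient))

    def actions_for(self, monitor, old_state, new_state):
        # No change of state means nothing to report.
        if old_state == new_state:
            return []
        return list(self._rules.get((monitor, old_state, new_state), []))


rules = NotificationRules()
rules.add("web-ping", "Ok", "Warning", "email", "netadmin@example.com")
rules.add("web-ping", "Warning", "Error", "sms", "+10000000000")
rules.add("web-ping", "Warning", "Error", "email", "webteam@example.com")
print(rules.actions_for("web-ping", "Warning", "Error"))
```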

Q: Are you Ok with the proposed notification methods?
