More Console tools

You are probably quite familiar with top and know what a useful command-line tool it is. Well, we are going to look at a couple of top-like tools that provide a different view of what is running on your server.

htop and atop – taking top to the top

By default, neither of these will be installed on your Linux server, so you will have to install them before using them. I will leave that as an exercise for you.
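If you want a head start on that exercise, here is a small sketch that works out which package manager your distro uses and prints the matching install command (the package names `htop` and `atop` are correct for Debian/Ubuntu and RHEL/CentOS style repositories; other distros may differ):

```shell
#!/bin/sh
# Work out which package manager this distro uses and print the install
# command for htop and atop (run the printed command as root to install).
if command -v apt-get >/dev/null 2>&1; then
    cmd="apt-get install -y htop atop"
elif command -v yum >/dev/null 2>&1; then
    cmd="yum install -y htop atop"
else
    cmd="install htop and atop with your distro's package manager"
fi
echo "$cmd"
```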

htop is very similar to top, but it draws ASCII graphs for processor, memory and swap usage, and puts a summary of tasks, load and uptime in the header area as well. For those who prefer to see a chart rather than raw numbers, it is a bit easier to read. At first glance it does not appear to be much different from top, but it is.

One of the big differences is that you can scroll down the process list and actually see every single running process on the server.  Yes, the display still updates frequently as well.

If you look at the bottom row, there is a series of menus you can access with the function keys (F1 through F10). You can search, change the sort order and renice a selected process, all with a single keypress, and you can navigate around the screen with the arrow keys.

If I were to give a quick summary comparing htop with top, it would be that htop is a user-friendly top replacement. Check out the screenshot below.

atop differs from the standard top in a very different way. When using atop, the header information is no longer 4 simple lines; instead it can be as few as 10, or many more!

The first difference you will notice with atop is that the header has a lot more information: a line showing process information, a line for each CPU plus a CPU average line, a line with memory info, a line for swap info, a line for each disk, and some network lines (transport, network and devices). The number of header lines will change with system events, and lines change colour as well (e.g. a disk line turns red when there is heavy disk activity).

After you get past the header, the bottom section shows the process list in much the same way as top (and htop), but by default it only shows processes that were active during the last interval; all idle processes are hidden. To change the display, just like top, you can hit the "h" key for help on the hotkeys and what you can change. Unlike top, which only has a handful of parameters, atop has a lot more options. See an example screenshot below.

The quick summary for atop compared to top: it is an advanced top, with a lot of additional information all available in the one place.

That is 2 new tools to explore to help you manage your system: one simpler than the standard top and one that is much more advanced. My personal preference is to use the standard top and also install atop.

These 2 examples are not the only "top" or top-like applications available to the system admin. 2 others I will mention in passing are ntop, a network top clone, and mytop, a MySQL top clone.

Other Sysadmin Tools

Every Sysadmin has their set of favourite tools to monitor and manage their Servers, so today, I thought I would share another couple of mine.

Years ago, when running a VPS, I came across vpsinfo – a nice web interface to your Linux VPS and, as it happens, any Linux server.

I installed it and found it to be a convenient tool that gave me access to my server via a web browser. I now install it on all my servers. There is nothing really "special" about the info you get; it just puts together quite a lot and makes it available in a web browser.

You get a lot of info on your server all in one place – just take a look at their website to see some of what is on offer. It is simple to install and configure, with just a few variables to set.

Typically, I use this to get a quick glance at how a server is running, without logging in via SSH and doing it all manually.

I have found that it integrates into a CPanel server quite well and provides more detailed info than you get from logging into WHM.

The requirements are pretty simple – a Linux Server (or vps) with a web server and PHP.

There are a few extras that vpsinfo can use, and it is well worth taking a few minutes to install them. vnstat is used to report on network usage, and you can choose between mytop and mysqlreport to get a decent view of MySQL performance. Personally, I prefer to install and use mysqlreport.
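As a quick sketch of what vnstat gives you on the command line (it keeps its own database, so it needs to have been installed and collecting for a while before it has anything to report):

```shell
#!/bin/sh
# A couple of common vnstat views: the per-interface summary and the
# daily totals.  Falls back to a note if vnstat is not installed.
if command -v vnstat >/dev/null 2>&1; then
    out=$(vnstat 2>&1; vnstat -d 2>&1)
else
    out="vnstat is not installed"
fi
echo "$out"
```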

Now, when it comes to MySQL, there are a couple of other really useful scripts that I use – the tuning primer script and mysqltuner.

The tuning primer script can be easily found – just do a Google search for it. If you have a cPanel server then you already have mysqltuner:

root@server [~]# /usr/local/cpanel/3rdparty/mysqltuner/

If you don't have cPanel, then again, a quick Google search will locate it for you.

Hopefully, you can add some or all of these tools to your toolkit to make your life as a Sysadmin easier.

Help! my server is Slow

Wow, what a catch-cry!

If you hang out on any webmaster forum, it is one that is heard nearly every day. It is almost always from someone who has little or no experience as a systems administrator and little to no clue how to go about finding and resolving the problem.

For the most part, it is usually a pretty simple process to determine the cause and take the appropriate action. That usually means reducing resource usage by tuning various configuration files, replacing applications with more resource-friendly ones, or upgrading hardware.

What to do?

The smart ones would go to their monitoring, check the system against their known baselines, determine exactly which resources are being consumed, and then address the issue.

Here is the quick and dirty procedure that I follow:

  1. Check for any Alarms and action as required.
  2. Check my Monitoring for the historical view
  3. Dive into the Server and use real-time tools to see for myself what is going on.
    1. top – to get a quick overall snapshot
    2. sar – to get some further detailed info
    3. ps – to check out running processes.
  4. Determine how to address the specific problem/s.
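The real-time pass in step 3 can be sketched as plain commands (exact flags are one reasonable choice, not the only one; sar needs the sysstat package installed):

```shell
#!/bin/sh
# Quick real-time triage of a slow server.

# top in batch mode: a one-shot snapshot of load and the busiest processes
snapshot=$(top -b -n 1 | head -15)
echo "$snapshot"

# sar: CPU utilisation, 3 samples a second apart (skipped if not installed)
command -v sar >/dev/null 2>&1 && sar -u 1 3

# ps: the ten biggest memory consumers
ps aux --sort=-%mem | head -11
```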

In my experience, in a typical web hosting environment there are really only a handful of configs that are the cause of the problem.  It usually comes down to a change in the popularity of  a website on the server and adjusting some of the config accordingly.

So many times, people who don't have sufficient knowledge and experience are "running" a server and have never tweaked the config files. The most common config files that need attention are the MySQL (/etc/my.cnf) and Apache (httpd.conf) files.

In our experience, the biggest problem is that MySQL is running an almost default/generic config. Additionally, we have found that our MySQL configs end up different on almost every single server – we tune the config to minimise resource usage and maximise server efficiency. This is never a set-and-forget task!

For this little task, we use a few different tools to help us.

  1. mysqlreport
  2. phpmyadmin

All of these tools can be used to show us specific info on how MySQL is configured and performing. It typically takes about 2-3 weeks to get a decent config, and then a few minutes each month to verify all is working as it should. Server loads change, and when required, we tweak the configs.

The next biggest resource hog that we can easily do something about is Apache. Tuning Apache is a massive subject in itself, but there are several simple steps that can be taken. When resources are at a premium, replacing Apache with something smaller and faster is often worthwhile, and there are quite a few choices out there. The trade-off is that learning something new can be quite challenging, and it means a big change in how the web server is configured.

What we prefer to do is keep Apache, but use it in a different way: we install Nginx as a reverse proxy in front of Apache. This maintains the known configurations whilst reducing resource usage (by orders of magnitude).

Apache can consume anywhere from 1 to 200 MB of RAM per running thread, and on a busy server this quickly eats up all available RAM. Nginx uses only about 30 KB of RAM per worker process, which is far less than Apache. By installing Nginx as a reverse proxy, it serves up all the static content such as images, CSS and HTML files, and the dynamic requests are passed back to Apache to process. In the real world, we find that rather than Apache having 80-100 threads running, in this sort of configuration it is usually around 8-10, which frees up a lot of resources.
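As a sketch of what this looks like, here is a minimal Nginx reverse-proxy server block. The domain, document root and the choice of port 8080 for Apache are illustrative values, not our production config:

```nginx
# Nginx takes port 80; Apache is moved to 127.0.0.1:8080.
server {
    listen 80;
    server_name example.com;

    # Serve static content directly from disk
    location ~* \.(jpg|jpeg|png|gif|ico|css|js|html)$ {
        root /var/www/html;
        expires 30d;
    }

    # Hand everything dynamic back to Apache
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The other half of the change is editing Apache's Listen directive to bind it to 127.0.0.1:8080 so the two are not fighting over port 80.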

We actually ran our servers for several weeks and collected baseline data before installing Nginx as we have described.  We saw the effect almost immediately in our monitoring.  A very nice drop in RAM and Processor utilisation that was well worth the effort involved in making the changes.

If you don’t monitor and don’t have a baseline, then it is really hard to see the overall effect of any changes you make to a System.

Cpanel Backup Strategy

Backups are one aspect of a server that is often somewhat forgotten about – that is, of course, until they are needed!

With a cPanel server, there is a backup utility built right in. Some would argue that it is not the best, and they might be right, but for user accounts and the basic system config, it works just fine.

Setting up the backups is quite trivial – just select the schedule and “forget about it” for the most part.  As long as you have the space, then you are fine.

What we do for our hosting is to build on this basic backup regime and actually make sure we have the data not only backed up, but available.

The first thing we do, at the basic level, is write all backups to a different physical disk in the server from the one where the data lives. This protects us against a single disk failing – it doesn't protect us from a wider hardware failure, it just means we have the data on another drive in the server to easily restore from.

The next thing we do is get down and dirty and make use of a part of the cPanel backup that is outside the GUI environment: a nice little "postcpbackup" script to do some extra work for us. Before the normal cPanel backup script finishes, it looks in the cPanel scripts directory (previously /scripts/, now /usr/local/cpanel/scripts/) for a postcpbackup script, and if it exists, it runs it. Please take a look at the cPanel documentation for more details on this behaviour.

In our hosting environment we make use of this script to make extra copies of our backups.  As we have multiple Servers, what we do is to make a copy of the backups in our /backup directory on another physical server.

Our chosen method is to use rsync to do the grunt work, and we do this over SSH. For this to work, you need a user configured on your server that has certificate-based SSH access to the target server and sufficient permissions to read the /backup directory.
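A hypothetical postcpbackup along these lines might look like the following. The hostname, user and log path are example values, not our real ones:

```shell
#!/bin/sh
# Sketch of a postcpbackup script: copy /backup to a second server over
# SSH using rsync, and log what happened.
offsite_copy() {
    src="$1"    # e.g. /backup/
    dest="$2"   # e.g. backupuser@backup2.example.com:/backup/
    log="$3"    # e.g. /var/log/postcpbackup.log

    echo "$(date): starting copy of $src to $dest" >> "$log"
    rsync -a --delete -e ssh "$src" "$dest" >> "$log" 2>&1
    echo "$(date): rsync exited with status $?" >> "$log"
}

# In a real /usr/local/cpanel/scripts/postcpbackup you would call e.g.:
# offsite_copy /backup/ backupuser@backup2.example.com:/backup/ /var/log/postcpbackup.log
```

The `--delete` flag keeps the remote copy an exact mirror of /backup; drop it if you want old backups to accumulate on the target server instead.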

This is step 2 of our backup strategy, and at this point we have 3 copies of our data across 3 physical disks and 2 physical servers. Yes, servers can and do fail, so we like to keep multiple copies of our valuable data!

We do not stop there!  Our Servers are in the same data center, so what happens should the unthinkable happen and the Data Center is destroyed? As we are the eternal pessimists and think that we could even lose a whole server and access to our backup server at the same time (touch wood, has not happened yet), we also grab a copy of the weekly backups and put them on a 3rd server at yet another physical location.  We once again use rsync to do the grunt work of copying the files.

To us, backup is not just about being able to restore a Client’s website should they accidentally delete it or it is hacked (it does happen), but it allows us to be able to re-build our business in case of multiple failures or disaster.

It might not be obvious how we monitor our backups, but we do! Of course, we have the cPanel backup report that is sent via email. We additionally write log files from our postcpbackup script, and our offsite weekly backup script also sends an e-mail telling us what it has done.

While it might not be a sexy graph we can look at, we know what our systems are doing, and we know where to find the data to put them back together when we need to.

What is your backup strategy?

Could your business survive if your hard drive or even a complete server failed?

Capacity Management

You often hear all sorts of questions regarding web hosting, but one that crops up a lot is

“How many customers can I put on a server with x specs”

The most common answer to this question is “it depends”

The biggest benefit to our Hosting business after we implemented our Monitoring solution was that rather than guessing, we could very easily manage the Capacity on each of our Servers.

Let's face it, in a competitive industry like hosting, you want a good return on your investment in server hardware and still provide a good service to your clients. Because we monitor our servers, we know when they are getting close to capacity and can start planning to add more servers or resources as required.

There is always a weakest link in every server, and by monitoring we know when we are getting close to hitting any of those limits and can plan around it.

In our typical web hosting environment we need to monitor and plan for quite a few things. First off, over time, clients who grow their websites will use more and more server resources. The most typical things we use for our capacity management are:

  • Load Averages
  • CPU usage
  • Disk (space) Usage
  • Disk Performance (IOPS)
  • Memory Usage
  • Network Traffic

RAM, hard disk capacity, CPU and network speed are no longer the problem in most cases. We have seen faster and cheaper hardware year in, year out for a long time, and it is pretty hard to hit most of these limits most of the time.

What we have found to be one of the biggest limiting factors is disk performance. With a physical disk drive, there has been no real improvement in IO speed for a long time. The only ways to improve performance are to increase the number of disks, or move to SSD storage.

When it comes down to it, we most often make upgrade decisions based on IO performance alone. It is always a case of making a business decision to either spend (considerably more) on SSDs, or add more servers with faster or more disks.

As we have discovered that disk IO is the biggest limiting factor in server performance, it is an area we really concentrate our monitoring efforts on.
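A quick way to watch disk IO in real time is iostat (which, like sar, ships in the sysstat package on most distros). A minimal sketch:

```shell
#!/bin/sh
# Watch per-device disk IO.  In iostat's extended output, a %util
# figure close to 100 means the disk is saturated and is the bottleneck.
if command -v iostat >/dev/null 2>&1; then
    out=$(iostat -dx 1 3)   # extended device stats, 3 samples 1s apart
else
    out="install the sysstat package to get iostat (and sar)"
fi
echo "$out"
```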

Often, I read on Forums questions like

“My site is currently on x server and it is slow, what server do I need to put it on to make it fast again”

You see all sorts of responses, but my first question is always “What is the bottleneck you have identified from your monitoring?”

It is surprising to me how many people have no idea how their server is running, or where they need to increase resources to make an improvement.

Alarming And Alerting

Alarming and Alerting is all about Automating the System Administration tasks.

We have been happily monitoring our servers for a few weeks and have gathered a decent amount of data around what is “normal”.  Rather than having to actually go and look and check what is going on, it is now time to set up the appropriate Alarms and Alerts for our Server.  This is the next logical step in reducing the amount of time and effort it takes to monitor our Servers.

If we look back to what we monitor then the alarming should be pretty obvious for some things.  We will start with them.

Server availability – if a server is down, we want to know about it! This is probably the most important of our basic alarms. We want to know it is down before any of our clients lodge a support request, and we nearly always do.

Disk usage – it is pretty obvious that we want to know when a disk or partition is filling up. So, once we know how much disk is normally used, we add alerts so we are notified when usage climbs. We usually set an alert level that represents a 10% increase over normal disk usage, and an alarm level around the 90% mark.
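Most monitoring systems have a disk check built in, but the idea behind the alarm level can be sketched with nothing more than df and awk. The 90% threshold matches the alarm level described above; the script name and message wording are our own invention:

```shell
#!/bin/sh
# Hypothetical disk-usage alarm check: print a line for every mounted
# filesystem at or above the threshold.
THRESHOLD=90
df -P | awk -v t="$THRESHOLD" 'NR > 1 {
    use = $5
    sub(/%/, "", use)   # strip the % so we can compare numerically
    if (use + 0 >= t)
        printf "ALARM: %s is %s%% full (%s)\n", $6, use, $1
}'
```

Run from cron, with the output piped to your alerting method of choice, this gives a crude but effective safety net.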

Process count – we want to know if there is a sudden increase or decrease in the number of processes running.

Alerting Methods

There are almost as many methods of sending an alert as there are of creating it.

The most common Alerting methods are

  • Visual
  • Audible
  • Email
  • SMS

The visual (and audible) alert is often best used where you have several staff in a network operations centre: a status screen available for everyone to see, often accompanied by a sound. You could also have a window on a single computer screen doing the same thing.

E-mail is another way, and often the preferred method, as it can be sent to more than one person. The way most people do this is to set up an e-mail address specifically for alarming. This is in fact our main method: we set up the alarms to go to all staff, with the e-mail addresses on all the devices each staff member uses, including desktop PC, notebook, iPad and smartphone.

Sending an SMS message is also useful.  The main downside is that it can be complex to set up and if you have a lot of alarms, quite expensive.

You are not limited to a single method; you will usually use more than one, and alert more than one person. After all, if an event is important enough that you need to know about it, you want to make sure you actually get notified.