I will let you into a secret: the Reliability and Performance Monitor is a
techie's dream. This GUI is pure fun to explore, and you are certain to find
something new and interesting.
Review of Windows Reliability and Performance Monitor
The Resource Monitor will give you instant appreciation of which processes
are hogging the CPU, Memory, Disk and Network. When you need a longer
history of how processes and applications consume
server resources, then investigate the Data Collector Sets. Begin by using
the preset templates, then experiment by modifying the templates and saving your own
settings.
If you need to justify the time spent 'playing' with this tool, I suggest that
the biggest advantage of learning about the Reliability and Performance Monitor is that
you will have a sound grasp of the basics when it comes to solving a Windows
Server 2008 problem.
My favourite way to launch the GUI is to click the Start button and type 'perfmon' in the
'Start Search' box. When the application launches, I look at the top left of the screen and
click on Reliability and Performance Monitor. I notice that my
friend 'Barking Eddie' uses a different route: he clicks Start, then selects
Administrative Tools.
Another management tactic that I like is creating an MMC. The
benefits are that the console remembers your settings, and you can
create a whole family of related snap-ins. If you like this
technique, then inside the Reliability and Performance Monitor go to the File menu and choose 'Save As'.
Once you start exploring the GUI, you soon appreciate that the
Reliability and Performance Monitor is
really 3 utilities in one; the only minor trap is that when you want to
work with the Resource Overview you must select 'Reliability and Performance' in the
left-hand tree.
In a nutshell, this monitor is intuitive 'Windows' at its best; just keep
clicking the window pane to see more detail. At last, an
application that lives up to the hype of 'Easy to use'. One
tip: look out for the up and down arrows circled in the screenshot
below.
When you want to trace the root cause of a computer running slowly, it helps if you ask
yourself, and the server, a series of branching questions. Start with: 'Is
this a hardware or software problem?' Then follow up with
questions such as: 'Can we rule out a disk
failure, a loose SIMM chip or a broken network connection?'
OK, so it's not a hardware problem. What you need now is a pincer movement.
To locate the problem, ask yourself these two related questions: 'Which resource is the bottleneck?' and
'Which program is consuming most of that resource?' To help
identify the program consuming most of the resource, I click the up
arrow on the appropriate column, for example Average CPU in the CPU pane, or Commit (KB)
in the Memory pane.
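The 'sort the column, read off the top row' habit can be sketched in a few lines of Python. This is a minimal illustration only: `ProcessSample` and `top_consumers` are hypothetical names standing in for one row of the Resource Overview, and the process names and figures are invented. The monitor itself does not expose its data in this form.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """Hypothetical stand-in for one row in the Resource Overview panes."""
    name: str
    average_cpu: float  # percent, as in the Average CPU column
    commit_kb: int      # kilobytes, as in the Commit (KB) column


def top_consumers(rows, column, n=3):
    """Sort rows largest-first on `column` and return the top n,
    much like clicking a column header until the biggest value is at the top."""
    return sorted(rows, key=lambda row: getattr(row, column), reverse=True)[:n]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    snapshot = [
        ProcessSample("sqlservr.exe", 12.5, 1_850_000),
        ProcessSample("svchost.exe", 3.1, 95_000),
        ProcessSample("avscan.exe", 41.0, 210_000),
        ProcessSample("explorer.exe", 0.8, 60_000),
    ]
    for column in ("average_cpu", "commit_kb"):
        leader = top_consumers(snapshot, column, n=1)[0]
        print(f"{column}: {leader.name}")
```

Notice that a different process can top each column, which is why the pincer movement asks about the resource first and the program second.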
Here are detailed instructions on how to get started with
Performance Monitor:
Begin by clicking high, high, high on precisely
'Reliability and Performance' at the top of the tree (see the blue band above).
Develop a theory about which PROCESS is responsible. For a
quick test, sort each column of numeric data in turn. If your first guess is
wrong, then what's your next guess?
Memory is the most likely bottleneck. Check that % Used
Physical Memory is less than 90%.
Disk problems could indicate that a disk is about to fail.
Sort on both 'Read' and 'Write'.
CPU bottlenecks (CPU usage continuously over 80%) may indicate that this is an
old server due for replacement.
Network bottlenecks may need a third-party utility to confirm.
Performance Monitor underplays network problems because it only
shows you data from the one machine and not the whole network.
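The two numeric rules of thumb in the checklist above can be expressed as a small Python check. This is a sketch under stated assumptions: `check_bottlenecks` is a hypothetical helper, the readings would have to come from wherever you collect them, and 'continuously' is interpreted here as every CPU reading in the window exceeding 80%. Disk and network are left out because the checklist gives no numeric threshold for them.

```python
# Rules of thumb taken from the checklist; tune them for your own servers.
MEMORY_LIMIT_PCT = 90.0  # % Used Physical Memory should stay below this
CPU_LIMIT_PCT = 80.0     # CPU continuously above this suggests an overloaded server


def check_bottlenecks(memory_used_pct, cpu_samples_pct):
    """Return a list naming each resource that breaks its rule of thumb.

    `cpu_samples_pct` is a series of CPU readings taken over time; "continuously"
    is interpreted here (an assumption) as every reading exceeding the limit.
    """
    findings = []
    if memory_used_pct >= MEMORY_LIMIT_PCT:
        findings.append("memory")
    if cpu_samples_pct and all(s > CPU_LIMIT_PCT for s in cpu_samples_pct):
        findings.append("cpu")
    return findings


if __name__ == "__main__":
    print(check_bottlenecks(93.0, [85.0, 91.0, 88.0]))  # ['memory', 'cpu']
    print(check_bottlenecks(60.0, [95.0, 40.0, 88.0]))  # []
```

A single CPU spike does not trip the rule; only a sustained run of high readings does, which matches the 'continuously over 80%' advice.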
Brutal Advice for Newbies
If a quick check of the Resource Overview does not reveal the source
of your problem, AND YOU ARE A NEWBIE, then I am sorry but I must be
brutal: there is no easy
solution. My best advice is to call in an expert; it will be cheaper
and less frustrating in the long run. OK, so you have no
money, but you are willing to spend time trying to isolate the problem.
Internet research is likely to reveal lots of people sitting on the fence.
The problem is that every setup is different, and only experience
tells you whether latency is due to an anti-virus program
trying to update its image files, rather than a SQL database that is suddenly
having to service more user queries.
Guy Recommends: The Orion Network Performance Monitor (NPM) 9.5
Orion's performance monitor is designed for detecting network outages.
This NPM will guide you
through troubleshooting by indicating whether the root cause is a broken link,
faulty equipment or resource overload. Because it produces
network-centric views, it is intuitive to navigate, and as a result you can
easily see what's working and what's not.
Perhaps Orion's best feature is the way it suggests solutions. Moreover, if
problems arise out of the blue, then you can configure Orion NPM 9.5 to notify
members of your team what's changed and how to fix it.
Your old perfmon is still there, but it's surrounded by useful
gadgets for general or level-one troubleshooting, for example Data
Collector Sets. There are 3
tools which make up the Windows Reliability and Performance Monitor:
Resource View, Reliability Monitor, and Performance Monitor (perfmon). The result is a
broader utility which really is friendlier for solving common
problems. Also, the System Stability Chart helps you to investigate
minor faults before they escalate into a full-scale crisis.
Performance Monitor Logs
The key to understanding the logs is to realize that when you launch
the Reliability and Performance Monitor it is set to display live data.
To CREATE logs, head for the Data Collector Sets. To replay logs
that you made earlier, click on the 'View Log Data' icon. See the
two screenshots
below.
Data Collector Sets and Templates
If the overview in the Reliability and Performance Monitor does not
reveal the root cause of the problem, then maybe logging would help.
The Data Collector Sets make stage two of performance monitoring easier.
Look out for the useful templates with predefined settings, which you can
use to generate performance logs.
The difficulty is that you could be drowned in data. With
guidance from the Data Collector Sets folder you may be able to record
some meaningful data. However, my guess is that it will take 3
attempts to collect data that confirms or denies your theory as to the root cause
of your problem. By contrast, the Monitor really does live up to the hype of 'easy
to use'. All I will say about the Data Collector templates is that
they make it EASIER to capture meaningful data than the old perfmon did.
Case Studies for Performance Monitoring
Many people believe that you are not supposed to have fun while you
learn, but to me, having fun is the ONLY way I learn. Thus, making
a game of finding the bottleneck is one of the most effective ways of
improving a server's performance. Here are possible sources of
bottlenecks which cause your server to respond more slowly than you would
expect from its specification.
A program monopolizes a particular resource, e.g. malware
hogs the CPU.
Insufficient resources are available, e.g. a database
consumes all the memory.
A program, device, or service fails, e.g. Terminal Services
times out.
Software is incorrectly installed or configured, e.g. a
missing DLL, or the wrong version of a DLL.
The system is incorrectly configured for the workload,
e.g. not enough memory.
Solutions to Performance Problems
It helps problem solving if you review the range of solutions at
regular stages. At one extreme you could buy new, faster hardware,
or even a complete system. At the other extreme, a cost-nothing solution
such as load balancing could
fix the problem, at least temporarily.
Suppose you say to yourself, 'OK, I'll splash out and buy a new system.'
Your brain may reply, 'But there is nothing wrong with the disk
subsystem', followed by, 'Hang on, those quad processors are already
overkill'. Then a better solution emerges: 'How about if I just
add more memory?'
However, an alternative thought process could be: 'Let's just buy new memory,
hmmm... that means the single processor will be the bottleneck.
And 80GB disks looked big 5 years ago, but they are full, and these days
we can get much bigger disks for the price we paid. Also, isn't
that fan a bit noisy? Conclusion: why don't we buy a balanced
system with new components?'
Specific Advice
Add more resources, e.g. an extra memory SIMM / DIMM.
Load balancing: move some of the processes, or users' files, to another disk
or even to another server.
Upgrade: buy a faster disk system.
Baselines: creating baselines separates the professionals from
the amateurs. It's so much easier to solve a problem that
arose this week if you know what 'Normal' looked like. When
you have baselines from last week, last month and last weekend,
the patterns provide the clues for a speedy resolution.
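Comparing today's reading with a baseline can be sketched with Python's standard `statistics` module. This is a minimal sketch, assuming you already have a list of readings for one counter from a 'Normal' period; `deviates_from_baseline` is a hypothetical helper, and the 2-standard-deviation cut-off is a common statistical rule of thumb chosen for illustration, not something the Performance Monitor applies.

```python
import statistics


def deviates_from_baseline(baseline, current, k=2.0):
    """Flag `current` if it lies more than k standard deviations from the baseline mean.

    `baseline` holds readings of one counter captured during a 'Normal' period;
    at least two readings are required to compute a standard deviation.
    """
    mean = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mean
    return abs(current - mean) > k * spread


if __name__ == "__main__":
    last_week_cpu = [40.0, 42.0, 38.0, 41.0, 39.0]  # invented readings
    print(deviates_from_baseline(last_week_cpu, 41.0))  # False
    print(deviates_from_baseline(last_week_cpu, 75.0))  # True
```

Keeping separate baselines for weekdays and weekends avoids flagging the normal Monday-morning rush as a fault.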
Summary of Using Performance Monitor
The fundamentals of performance monitoring have not changed; the secret of success remains the same. Concentrate on
which processes are using the big 4 resources:
memory, CPU, disk and network. Look for 'cost nothing' solutions, e.g.
move the paging file to another disk, or perform regular defrags. Employ the Windows System
Resource Manager to meter out resources, and put limits on any users or
applications that are bullying the CPU. If all else fails, ask: 'Will
installing more RAM speed up a lethargic server?'
Vista was the first place that I saw this Reliability Monitor; at the time I
thought, 'This tool would be even more useful on my servers.' Well, here it is
as a snap-in for Windows Server 2008.
The Reliability Monitor is an intelligent filter that trawls
the Event Logs and pulls out significant events, which it then displays on a
chart. Microsoft describes the Reliability Monitor as an intelligent agent
for Performance Monitor's Alerts.
The Reliability snap-in records system stability as a mark out of 10, the
System Stability Index (SSI). Look behind the bare SSI number, and research events on
the chart to see when software changed or services froze. Observe
that the main chart has data lines which record Application, Hardware and Windows
(Operating System) failures. In particular, the Reliability Monitor places red crosses on dates when failures
occurred. For any given event, note the detailed description underneath
the chart.
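To make the 'filter, then chart' idea concrete, here is a toy Python sketch that groups error records by day and category, so each day with a non-empty entry is one that would carry a red cross. It is an illustration under loud assumptions: the `(day, category, level)` tuples and the `failures_by_day` helper are hypothetical, real Event Log records carry far more fields, and this is not how the Reliability Monitor is implemented.

```python
from collections import defaultdict
from datetime import date

CATEGORIES = ("Application", "Hardware", "Windows")


def failures_by_day(events):
    """Turn hypothetical (day, category, level) records into {day: {category: count}}.

    Only 'Error' records in the three failure categories are kept, mimicking
    a filter that discards routine informational events.
    """
    chart = defaultdict(lambda: defaultdict(int))
    for day, category, level in events:
        if level == "Error" and category in CATEGORIES:
            chart[day][category] += 1
    # Convert to plain dicts, ordered by date, so the result prints cleanly.
    return {day: dict(counts) for day, counts in sorted(chart.items())}


if __name__ == "__main__":
    log = [
        (date(2008, 3, 1), "Application", "Error"),
        (date(2008, 3, 1), "Application", "Information"),
        (date(2008, 3, 2), "Hardware", "Error"),
        (date(2008, 3, 2), "Application", "Error"),
        (date(2008, 3, 3), "Windows", "Warning"),
    ]
    for day, counts in failures_by_day(log).items():
        # Each printed day is one that would carry a red cross on the chart.
        print(day.isoformat(), counts)
```

Note how 3 March drops out entirely: a warning alone earns no red cross in this toy model.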
Servers can often continue working, albeit more slowly, even though there are
errors. What this monitor does is show you significant events so that you can
decide what corrective action to take: replace hardware that's on the
blink, research better drivers, or even consider managing CPU usage with a
separate snap-in called the Windows System Resource Manager.
You also get troubleshooters that identify, here and now,
what is preventing the server from operating as designed, for example an
unavailable network, dodgy drivers or a loose memory chip.
When calculating the SSI, recent failures are weighted more heavily than past
failures; thus, once you resolve a problem, you should soon see the index rise.
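Why does the index recover once a problem is fixed? A toy Python model shows the effect of recency weighting. To be clear, this is NOT Microsoft's SSI formula, which is not given here: `toy_stability_index`, the one-point penalty and the 7-day half-life are all invented purely to illustrate how an older failure counts for less than a recent one.

```python
def toy_stability_index(failure_ages_days, max_score=10.0, penalty=1.0, half_life_days=7.0):
    """Toy index: start at max_score and subtract a penalty for each failure,
    where a failure's penalty halves for every `half_life_days` of age.
    NOT Microsoft's SSI formula; only an illustration of recency weighting.
    """
    total = sum(penalty * 0.5 ** (age / half_life_days) for age in failure_ages_days)
    return max(0.0, max_score - total)


if __name__ == "__main__":
    print(toy_stability_index([0]))   # failure today: 9.0
    print(toy_stability_index([28]))  # same failure four weeks ago: 9.9375
```

In this model a failure that stops recurring fades away on its own, so the score climbs back towards 10 without any manual reset.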
Tip: To review all your available data, click on the drop-down date menu and
select: 'Select all'.
Remember the other servers in your organization. As with previous
versions of Perfmon, you can collect data from other servers; for example, right-click
on 'Reliability and Performance' and select 'Connect to another
machine'. You really need to select a Windows Server 2008 or a Vista
machine, because XP and Windows Server 2003 don't have the correct agents.
Summary
If you are prepared to put in the time, then using the Reliability and
Performance Monitor will reward you with a detailed understanding of your Windows
Server 2008. Your explorations will be a labour of love, and the
justification for investing the time is that you will have the experience and a
baseline to make future troubleshooting more successful.
Train Signal has
now released their
Windows Server 2008 Training Course. As an MCT
trainer, I am a huge advocate of Train Signal's products. What particularly
impresses me is the demonstrations. If
you are looking for complete, DETAILED coverage of Windows Server 2008, then I highly recommend that you give this course a try. I have reviewed their
6-plus hours of videos myself, and I guarantee that you will
not be disappointed!