Day four here in this class, learning about what the Microsoft ASP.NET 4.0 toolset can do. Today we learned what Microsoft Windows Communication Foundation (WCF) services can do with web services. We are learning about SOAP and about debugging your code. Many awesome things.
Yesterday I talked about the proactive, or preventative, SNMP alerts you can use to monitor thresholds and states. Today I want to discuss the 'oh no! it's down' type of event. In the modern world, servers, desktops, printers, routers and so on will fail suddenly without any warning. A memory corruption error, a failed piece of hardware: there are many different reasons for sudden outages. A communication link outside your network may suddenly drop. How do you manage these events and, more importantly, prevent them in the future? Those above you do not like to hear about 'repeated issues'. The first time an incident occurs, it can be understood and explained. But if the same event happens again, you will be seen as responsible for not preventing it.

Let's look at a server failure. Your Exchange server has, for some 'unknown' reason, stopped sending or receiving email. The first question to ask is who knew about it first: you, or the users who started calling you? Hopefully you had a threshold set to monitor the number of messages in the server's send queue. If the send queue exceeded that limit, were you notified?
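The send-queue threshold described above boils down to a simple comparison between a polled value and a limit. Below is a minimal Python sketch, assuming the queue length has already been collected by your SNMP manager; the rule name, the limit of 200 messages and the sample readings are hypothetical, and the actual counter or OID you poll depends on your mail server and monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueAlertRule:
    """Hypothetical rule: alert when the send queue holds more than `threshold` messages."""
    name: str
    threshold: int

    def evaluate(self, queue_length: int) -> Optional[str]:
        # Only a reading strictly above the threshold produces a notification.
        if queue_length > self.threshold:
            return f"ALERT {self.name}: {queue_length} messages queued (limit {self.threshold})"
        return None


rule = QueueAlertRule(name="mail-send-queue", threshold=200)

# Sample readings stand in for values an SNMP manager would poll from the server.
readings = [12, 45, 180, 350]
alerts = [msg for msg in (rule.evaluate(r) for r in readings) if msg]
for msg in alerts:
    print(msg)
```

In a real deployment the readings would arrive on a polling interval, and you would likely also require the limit to be exceeded for several consecutive polls before alerting, so that a brief burst of mail does not page you.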
In this scenario, you as the network administrator have had an 'event'. If your SNMP manager is watching your messaging queues, you get notified and start troubleshooting. A reboot of the server may be required. After the event is over, you review the log files for errors and problems. If no cause is found, a support call to Microsoft can help with diagnosis and resolution. I can safely say that the Microsoft engineers who support the different technologies are very knowledgeable and will work to solve your issue.
Now say that, in this scenario, you did not have an alert set to monitor the problem. Users call to tell you that the message they sent to Mr. Smith two hours ago never arrived; they called him and checked. You investigate, restore service, and then diagnose the underlying cause.
The difference between these two events is significant. If you discover the problem yourself and tell your users that there is an issue and a server reboot is necessary, you look very smart. If the users have to call and tell you there is a problem before you restart the server, it can have a negative impact on how you are viewed.
So once you have had a problem with your mail server and fixed it, your main goal is to set up an alert to monitor for that behavior. If the problem arises again, you will be notified WHEN it happens and can work to fix it. Issues and problems will occur in your environment. How you keep them from happening again is really up to you.
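Threshold alerts catch slow degradation, but for the sudden 'it's down' case a simple reachability probe is often the first check worth having. Here is a minimal Python sketch, assuming you only want to know whether the mail server accepts TCP connections on its SMTP port (25); the function names are hypothetical, and a successful connection does not prove that mail is actually flowing.

```python
import socket


def service_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_and_notify(host: str, port: int = 25) -> str:
    # Port 25 (SMTP) is an assumption; point this at whatever service you need to watch.
    if service_is_up(host, port):
        return f"OK: {host}:{port} is accepting connections"
    return f"DOWN: {host}:{port} is not accepting connections, notify the administrator"
```

Run something like `check_and_notify("your-mail-server")` on a schedule (for example every minute) and forward any DOWN result to whatever notification channel your monitoring tool already uses.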
SOAP, darn, I coulda used that knowledge 7 weeks ago. The app I was updating used a technology they stopped supporting in 2005, and I could not find any good description of the replacement other than being directed to .NET.
Hope classes are still good.