Just out of curiosity, how do admins load Microsoft critical patches on mission-critical servers that can't afford any downtime? Is there a way to load the patches without a reboot? Your input is appreciated.
Well, mostly no. Most critical patches I have seen require a reboot, because the fix is tied to a protocol or to some system file that needs to be reloaded.
With some patches you can just restart the services, but you REALLY have to know how all those services relate to each other and their startup order. It's probably easier to reboot.
I ALWAYS have a verified backup before applying the patch, and I usually do it on the weekend because of the downtime. Applying patches while the server is in use can slow it to a crawl, fail outright, or cause other issues. Using the weekend also gives me extra time to fix it before Monday, when the big guys come in and start whining about how they can't get their email... :rolleyes:
I guess you would have to weigh the risk of applying the patch compared to the downtime..
I usually send out an email to warn users that network services will be unavailable due to critical patches. Always give a larger window for the downtime than you expect to need.
That way you are a hero when it comes back up in less time than you stated ;)
MLF
Basically you do a planned downtime on the server. That is, you tell everyone that the server will be down from about midnight till 1, for example. In a lot of cases a mission-critical server will be redundant, so that if one is down the other can take the load.
A word of warning: you can apply patches without rebooting the server, but as Morgan said, you have to restart all the services and processes to be sure the patch is applied. I would also insist that you reboot the server before applying the patch, especially on a mission-critical machine that is almost never rebooted. That way, if it doesn't boot, at least you know it is not because of your patch. I've had cases where people patched servers without doing a boot either before or after applying the patch, then wasted hours a month later when they tried to reboot and the server wouldn't come up. That would have been fine, except that I used to get called at all hours of the night for problems like that.
Wow, interesting info. I was just curious whether I was doing the patches correctly, like any other Microsoft admin. I just hate coming in on weekends to load patches on 25 different servers; so much (boring) time invested. How do you guys load patches on servers? Manual downloads, Windows Update, or some other push-down software? What is your preference? I currently use Windows Update. Thanks for the quick responses.
Most of my servers, just like all the workstations, get their updates from SUS, which will become WUS next week. This way I can wait and read all the security boards to be sure none will cause me problems before I publish them for download.
Running gpedit.msc from the Run command permits you to configure update installation pretty much as you'd like (see attached jpg). Heck, SUS/WUS is free and has already saved my ass.
I'm no longer a Windows admin, but we used a SUS solution also. We had about 700 servers to update, so doing it manually wasn't possible.
Our highly available applications are clustered. So when we have to reboot a server for maintenance, we are only looking at a minute or two of downtime for the failover. That is for Exchange. Other applications that we have running on clustering can fail over in less than 30 seconds, which is generally not enough downtime to be noticed by users.
We install in our lab first and let it sit there for two or three days while we test functionality. Once the lab test is OK, we deploy to lower-priority servers the first night, and then over the course of two or three nights deploy to the rest of our servers. We never slam a patch on all servers in one night.
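The staged approach described above (lab soak first, then low-priority servers, then the rest over a few nights) could be sketched as a small scheduling helper. This is only an illustration: the server names, criticality scores, dates, and the `plan_rollout` function are hypothetical and not anything the poster described using.

```python
from datetime import date, timedelta

def plan_rollout(servers, lab_start, lab_days=3, nights=3):
    """Split servers into nightly waves, least critical first.

    servers: list of (name, criticality) tuples; higher number = more critical.
    lab_start: date the patch goes into the lab.
    Returns a list of (deployment_date, [server names]).
    """
    # Least critical servers are patched first so problems surface early.
    ordered = [name for name, _ in sorted(servers, key=lambda s: s[1])]
    first_night = lab_start + timedelta(days=lab_days)
    wave_size = -(-len(ordered) // nights)  # ceiling division
    waves = []
    for night in range(nights):
        batch = ordered[night * wave_size:(night + 1) * wave_size]
        if batch:
            waves.append((first_night + timedelta(days=night), batch))
    return waves

# Hypothetical inventory, purely for illustration.
inventory = [("file01", 1), ("print01", 1), ("sql01", 3), ("exch01", 3), ("web01", 2)]
for when, batch in plan_rollout(inventory, date(2005, 6, 6)):
    print(when.isoformat(), ", ".join(batch))
```

With this split no single night touches every server, which matches the "never slam a patch on all servers in one night" rule.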
It should be noted that we often choose workarounds or other means of defeating the vulnerability rather than always installing the patches. I prefer to wait for roll-up patches or service packs. For instance, we never install IE hotfixes, as we never use IE on the servers and only administrators can log in.
I use a WUS server and deploy to "test" groups first. If there are no problems, then I deploy to the rest of the machines. My groups are broken up by department and operating system.
Tedob1: Upgrading from SUS to WUS is pretty painless. Just make sure you have all the minimum requirements installed and you'll be good to go. The benefits are plenty and you'll like WUS way more than SUS.
To be honest, only a small minority of patches need to be applied to most servers.
For example - there's no real need to patch IE on a file server because you're not going to be surfing for p0rn on it. Similarly, issues with Word, PowerPoint, Windows Media Player etc etc aren't critical.
For most other patches, have a look at the workarounds and mitigating factors. Often these are BETTER solutions than patching and won't require a reboot. For example, if the flaw is in a service that you don't actually use then you could just disable that service altogether.
Yes, you probably should apply these patches to your servers sometime, but in most cases you don't actually need them or can work around the problem. You can patch them at a later date when there's something that you really DO have to patch the servers for.
If you run a serious business and want 99.99% availability and no downtime {caused by patches, unless it is a core patch} run Linux .... :)
Quote:
Originally posted here by Black Cluster
If you run a serious business and want 99.99% availability and no downtime {caused by patches, unless it is a core patch} run Linux .... :)
That depends on whether you want 99.999% uptime of the server or of the service. If you want 99.999% uptime of the service, a cluster will allow the service to continue while you're rebooting or patching a server.
How do you get 99.99% uptime and no downtime on Linux if you need to install a new kernel?
Quote:
Originally posted here by SirDice
How do you get 99.99% uptime and no downtime on Linux if you need to install a new kernel?
Easy. If you know what you're doing you can go without a reboot even for a Kernel. You just have to reset a few things. But other than this... why would you NEED a new Kernel? Just because a new one comes out doesn't mean you need it. On a server in a company, the server would be configured in a proper way anyway.
And of course, heh, you could strip Linux down to almost nothing, hack the Web Server right into the Kernel, and tell it to discard anything else. I know Porn sites that do this and have never once been broken into. And never updated.
So that's how ;)
Quote:
And of course, heh, you could strip Linux down to almost nothing, hack the Web Server right into the Kernel,
I think the Linux From Scratch people have something on their site about doing things like this.
The only reason you have to upgrade a kernel is security updates, isn't it? Or can you apply a security patch to a running kernel without rebooting?
In theory, Linux has a way to do that yes. I personally don't. I usually apply updates in Kernels and reboot. But if it was a server that had to be up non stop, I'd just not install it. Really most Kernel updates for security aren't going to sting you if you've properly set up things.
Here I think the thread is about a company and servers, meaning they probably have not only a good firewall, but also the company was most likely set up in a proper manner. A firewall sitting in front of the server, and IPTables being told "Only allow this" and the system not having a bunch of crap you don't need, greatly reduces the chances you're going to have a problem.
I've tested this before. I took a SUSE installation and customized it to where the only things really installed were what was needed on an FTP server.
When I was done and checked for updates, I think I had 3 total. SUSEFirewall works very well. Someone has to be able to get in through an open port, and depending what the server is doing, you don't need to allow that.
I had FTP open. That doesn't mean someone could just FTP in, I looked over my logs and people were trying all the time to root it, none did.
Some Kernel updates I've seen are local priv escalation. So someone has to be local to do it.
Also, SUSE does code audits like OpenBSD does which is why you don't see a new update for SUSE every time RedHat or Fedora release one.
If you apply a kernel patch and don't reboot, IMHO you run a higher risk of system instability.
And wouldn't we call an unstable system vulnerable?
How do you risk unstable when the new Kernel doesn't get used unless you reboot? When you install it and don't reboot, you simply aren't using it.
Lol, missed that part. Na, I wouldn't. I've never seen a hacker break into a box that has crashed and is no longer responding. ;)
Quote:
Originally posted here by Black Cluster
If you run a serious business and want 99.99% availability and no downtime {caused by patches, unless it is a core patch} run Linux .... :)
I don't know of a single instance where someone is running a 99.999% (actually it is 99.9999%) available system, what we call 6 sigma, on a Linux system. And I work with a whole lot of IBM people.
Those systems are almost exclusively run on MVS or Solaris. The most critical system I can think of in terms of always being up is the New York Stock Exchange, and they are running Oracle on Solaris. Even then, an Oracle grid still takes time to fail over, and they have maintenance windows...
And even with 99.9999% available systems, you can have 3.4 minutes of downtime per 1,000,000 minutes. Which works out to two minutes or so a year... 99.99% allows for 100 minutes of downtime per 1,000,000. Which is 50 minutes per year.
But even then.. I've never come across a system that didn't factor in some sort of maintenance window. Most people you hear talking about running at 99.99 or whatever only count available minutes. Meaning that they might get 30 minutes a month for maintenance activities, and those 30 minutes don't take away from their availability metrics.
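As a rough cross-check on the figures being traded in this thread, here is a minimal Python sketch that converts an availability percentage into allowed downtime per year. It assumes a 365-day year and counts every minute, with no maintenance window excluded; the function name is only illustrative.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes in a 365-day year

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR

for pct in (99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes per year")
```

By this arithmetic 99.99% comes to roughly 52.6 minutes a year, 99.999% to roughly 5.3 minutes, and 99.9999% to roughly half a minute.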
I thought 99.999 was 5 minutes per year.
Gore, that five minutes a year is for unplanned downtime. It doesn't count planned maintenance. Having said that, I did have Windows servers that were up close to that level. I even had a couple of NT4 domain controllers that were up around the 95% mark, if you will believe that. If you really need that type of uptime, then you go with systems made for it. We used IBM iSeries, or as they were called, AS/400. Now that's a robust system when, like Linux, it is properly managed.
Quote:
I ALWAYS have a verified backup...
How are you verifying your backups?
Quote:
How are you verifying your backups?
I restore selected files to another location, confirming that I can read and restore from the media and that it has not somehow become damaged and/or corrupted.
MLF
Sorry gore, you were right. I was thinking of six sigma, which means 3.4 defects per 1 million opportunities, or about 99.9997% (loosely, six nines). This is what is considered the highest level of "perfection," and it is what Six Sigma tries to achieve through process improvement.
So if you are looking at availability, each minute represents one opportunity for a defect. Meaning that if you are down for 10 minutes, you just had ten defects.
60 minutes x 24 hours x 365 days = 525,600 minutes in a year.
So if you are going to strive for 3.4 defects per 1 million opportunities, you can have approximately 1.8 minutes of downtime per year, which is roughly 99.9997% availability.
My Exchange servers are currently running at 99.98%. Our target is 250 DPM (defects per million). We have been hitting 200 DPM for the last three years, which gives us a sigma level of 5.04.
However, we don't calculate our availability based on system uptime. We base it on available user minutes. So we take the total number of users on each system, averaged out for each month, and multiply that by the number of minutes in that month. The way we calculate the impact on availability is to multiply the number of users on that system the day of the outage by the total length of the outage.
So, if we have 4000 users on a system and a 20-minute outage, we multiply 4000 x 20 = 80,000 IUMs (impacted user minutes). If the month has 31 days, that is 60 x 24 x 31 = 44,640 available minutes.
Multiply the available system minutes by users to get 44,640 x 4000 = 178,560,000 user minutes per month, which can also be called opportunities for defect. If that is the only outage for that month, it works out to 99.96% availability, or 4.82 sigma.
But we also get a short window each month for maintenance, and using clustering, we never come close to exceeding our maintenance window.
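The user-minute calculation above can be reproduced with a short Python sketch. The function names are hypothetical, and the sigma level assumes the conventional Six Sigma 1.5-sigma shift added to the normal quantile; the poster does not say how the figure was derived, but that assumption does reproduce the 5.04 and 4.82 values quoted.

```python
from statistics import NormalDist

def user_minute_availability(avg_users, outage_users, outage_minutes, days_in_month):
    """Availability as a fraction of available user minutes in the month."""
    available = avg_users * 60 * 24 * days_in_month  # opportunities for defect
    impacted = outage_users * outage_minutes         # impacted user minutes (IUMs)
    return 1 - impacted / available

def sigma_level(availability, shift=1.5):
    """Sigma level for a given yield, using the assumed 1.5-sigma shift."""
    return NormalDist().inv_cdf(availability) + shift

# The example from the post: 4000 users, one 20-minute outage, 31-day month.
avail = user_minute_availability(4000, 4000, 20, 31)
dpm = (1 - avail) * 1_000_000
print(f"availability {avail:.4%}, {dpm:.0f} DPM, sigma {sigma_level(avail):.2f}")
```

When the user count on the day of the outage equals the monthly average, the user count cancels out and the result is the same as plain system availability. The weighting only changes the answer when the outage hits more or fewer users than average.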
Ahh, yea I was sitting here reading that like "Ok, I thought for sure it was 5 minutes because that was pounded into me in Security +" but I ask anyway to keep it open heh.
You seem to work in a fairly high end place huh?
Quote:
Originally posted here by gore
Ahh, yea I was sitting here reading that like "Ok, I thought for sure it was 5 minutes because that was pounded into me in Security +" but I ask anyway to keep it open heh.
You seem to work in a fairly high end place huh?
Yeah. All of my experience for the last ten years has been in the telecom sector, with some consulting work in the financial sector. It has its advantages, but disadvantages as well. Most of the time really large corporations heavily segment their IT operations. For instance, right now I only do Exchange email and BlackBerry, so I only get exposure to networking and AD when it impacts my servers. But luckily we do everything on our servers: hardware, security, OS, and application. And how many people can say they have an OC196 backbone between their servers?
OC196.... Good God.... Now I'm a little rusty, but that, if I am right, is around 9 Gb a second... Wow... I know an OC 256 is 13 Gb.... Man that's awesome lol. I couldn't handle that connection. My HDs aren't fast enough and neither are the NICs.
I think you were the one who sent me a pic of one of your SUSE servers weren't you? I know you have a nice set up that's for sure.
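For anyone wanting to check the line-rate figures, SONET optical carrier levels scale linearly: OC-n runs at n times the OC-1 rate of 51.84 Mbit/s. A minimal Python sketch follows; OC-192 is used as the nearest commonly deployed level to the "OC196" mentioned, which is an assumption about what was meant, and the function name is illustrative.

```python
OC1_MBIT_PER_S = 51.84  # base SONET OC-1 line rate

def oc_rate_gbit_per_s(level: int) -> float:
    """Line rate of OC-<level> in gigabits per second."""
    return level * OC1_MBIT_PER_S / 1000

for level in (192, 256):
    print(f"OC-{level}: {oc_rate_gbit_per_s(level):.2f} Gbit/s")
```

These are gigabits per second, not gigabytes, and they are raw line rates, so usable payload is somewhat lower.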
Quote:
How do you risk unstable when the new Kernel doesn't get used unless you reboot? When you install it and don't reboot, you simply aren't using it.
I was referring to the posts on applying the kernel without rebooting. I wouldn't call that a stable system.
Quote:
I don't know of a single instance where someone is running a 99.999%(actually it is 99.9999%) available system,
I don't even think NORAD reached that. They do however have failover out the ass.
RC: The new Kernel wouldn't be in use. That's why on SUSE it says that if you decide to use the new Kernel you have to reboot. It doesn't go into use until a reboot. My mail server right now has a Kernel update installed and I haven't rebooted for it in 2 months. It won't use the new one until a reboot happens; it just stores it for when you do.
As for NORAD....Ugh, just once, I want to see what kind of systems they have. I can't even imagine. I mean from what I've heard, I've never been there so I can't say for sure, but from what I hear they have system up and ready in case the fail over systems go too.... Mmmmmmm.
Gore- Yeah, it is a 10gb link. There are actually several of them. We are a tier1 carrier. I'd be lying if I said that all of that bandwidth was used just for us though. It is using MPLS switching so there is actually a good deal of other corporate traffic travelling over our backbone at the same time. Although it is funny when we have AD replication issues and Microsoft immediately wants to know about network saturation... Yeah, I don't think so..
We actually have billing systems on MVS that have been up for years. MVS is great like that, it is completely compartmentalized. So there is hardly ever a need to reboot the entire system. There could be individual applications running on that mainframe that haven't been up for years, but the core systems have been. And a lot of those systems haven't had software updates for years other than stuff done for Y2k.
This is hopefully not too off topic, but what kind of cooling do you have for a system like that? It can't possibly be running cool after years and years of being up..
And is it MVS or VMS? Kind of got confused there.
Could be a typo, but then again it could be something I haven't heard of so I have to check that.
Man, I never get to play with high end stuff like that. The only thing I have is PC hardware, and OS wise, I have Solaris, Linux, BSD, windows, DOS, BeOS....
MVS is the older name for the IBM mainframe OS. I think once it went to a 64-bit architecture they started calling it z/OS, and at some point along the line it was also called OS/390. Wikipedia says that the older systems are nearing the end of their supported life cycle, but I'm sure these systems are still in use just about everywhere. COBOL and JCL are actually making a comeback. A good friend of mine is a COBOL programmer and I've always made fun of him for programming in such an old language. But he is getting the last laugh now, because the demand for his skills is growing and so is his paycheck.
Some of the guys I work with have been supporting those systems since they came out in the 70's, so we just call any of our IBM mainframes at work MVS. There is somebody on here that supports z/OS, or at least they are much more familiar with RACF and the security behind the systems than I am.
Unfortunately they no longer have a datacenter located in Orlando, so I don't get to see any of the hardware anymore. But some of the systems they were using before the local datacenter closed were huge. They literally had doors on the front that you could open up and walk into, kind of like huge tape silos. We also had tape silos that looked like the stuff they showed in old movies like WarGames. It always surprised me how old some of the equipment was, and that it was still up and running. We have too much of a disposable mindset now.
I'm by no means an expert on IBM mainframes, but some of the larger systems would have air handlers, or have places where you could hook an air handler up to them. I don't think they are building the really big systems anymore. Instead they are building the midrange z servers that look more like your regular Intel-style server racks. IBM has a ton of information on the history of this OS and the systems:
http://www-03.ibm.com/ibm/history/ex...ame_album.html
VMS is the older DEC operating system, sometimes called VAX after the VAX hardware it ran on. I think HP supports it now as OpenVMS. I don't have any experience with that.
Ahhhh OK. That's why I couldn't remember it. I knew what OS/390 was but didn't know it was something else... Man, I've never even seen a mainframe in person, let alone touched one.
The biggest boxes I've seen in person were some big Suns but that's about it, and not even close to these things you're talking about. Heh, you get paid for this lol.
wow, thanks for all the info. Very informative!