June 26th, 2002, 10:19 PM
Overview of disaster recovery
Not really a tut but it gives a good overview. Enjoy.
Has your computer ever crashed and you realized you forgot to save your work? It's happened to everyone
at one time or another, and the situation is frustrating and inconvenient at best. But what happens when
your network or mission critical server goes down? In a production environment, company time and
resources are wasted during downtime. The consequences of an outage are magnified if the system
operates in real time, such as a Web server or inventory management system. The loss of such an integral
part of operations could cripple a business and have a negative impact on customer confidence. The
solution is to implement failover mechanisms using redundancy, load sharing, and traffic engineering to
provide continuous network services to users.
Planning ahead is an important step for ensuring that network outages are remedied quickly. A system
failure plan should be in place for personnel to follow in the event that one of the network servers goes down
or the network fails. The recovery plan can be as simple as maintaining a list of phone numbers to call
when a problem occurs and specifying roles for personnel. These measures reduce confusion and speed up
the response, which shortens downtime. In addition to human planning, there are several
techniques that can be applied to ensure network and server uptime, as detailed below.
The addition of redundant components can help to ensure network survivability. For instance, additional
power supplies and management modules will keep an enterprise-class core router from shutting down in
the event of a power supply or management module failure. When the primary unit fails the redundant unit
will take over without interruption of service. Another measure for increasing redundancy is using server
clusters. A server cluster has several redundant nodes distributed across the network; if one fails the others
can take over.
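To make the takeover idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the node names, the `healthy` flag, and the fixed priority order are hypothetical, and a real cluster would use heartbeats and vendor-specific failover logic rather than a simple list scan.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str             # hypothetical node name, for illustration only
    healthy: bool = True  # in practice this would come from a health check


def pick_active(nodes):
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if node.healthy:
            return node
    return None


# Assumed three-node cluster: one primary and two standbys.
cluster = [Node("primary"), Node("standby-1"), Node("standby-2")]

before = pick_active(cluster).name
cluster[0].healthy = False          # simulate the primary failing
after = pick_active(cluster).name

print(before, "->", after)
```

The point of the sketch is that callers always ask for the active node, so they never need to know which physical unit is serving them.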
Redundant network design keeps the majority of a network operational when a catastrophic failure occurs.
The design distributes services such as routing and authentication to several different parts of the network.
If a core router fails, the other segments of the network can take over and use redundant routing paths to
make sure traffic is still flowing.
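A rough sketch of why redundant paths matter, assuming a made-up four-device topology (two edge devices joined by two core routers). The device names and the breadth-first search are illustrative only; real networks rely on routing protocols to find the alternate path.

```python
from collections import deque


def find_path(links, src, dst, failed=frozenset()):
    """Breadth-first search for a path from src to dst that avoids failed devices."""
    if src in failed or dst in failed:
        return None
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in seen and neighbor not in failed:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no surviving path


# Hypothetical topology: each edge device connects to both core routers.
links = {
    "edge-a": ["core-1", "core-2"],
    "core-1": ["edge-a", "edge-b"],
    "core-2": ["edge-a", "edge-b"],
    "edge-b": ["core-1", "core-2"],
}

normal = find_path(links, "edge-a", "edge-b")
rerouted = find_path(links, "edge-a", "edge-b", failed={"core-1"})
isolated = find_path(links, "edge-a", "edge-b", failed={"core-1", "core-2"})

print(normal, rerouted, isolated)
```

With one core router down, traffic still has a route through the other. Only when both fail does the path disappear, which is the case redundant design is meant to make unlikely.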
A large enterprise network provides many services to internal and external customers and personnel. Often
a single server cannot handle requests from large numbers of users efficiently, but
several servers working together can meet the demand. This is called load sharing.
Load sharing uses several servers working together as a single virtual server to divide requests among all
the nodes such that no single server gets overloaded. Load sharing has been very useful for meeting Web
server demands as well as real-time needs such as customer relationship management (CRM) and
real-time inventory systems.
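One simple way to divide requests among nodes is round-robin rotation. The sketch below assumes three hypothetical Web servers and treats every request as equal in cost. Production load balancers usually also weigh server health and current load, which this leaves out.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, servers):
    """Assign each request to the next server in a fixed rotation (round robin)."""
    rotation = cycle(servers)
    return [(request, next(rotation)) for request in requests]


# Hypothetical server names; nine numbered requests stand in for real traffic.
servers = ["web-1", "web-2", "web-3"]
assignments = distribute(range(9), servers)

# Count how many requests each server received.
load = Counter(server for _, server in assignments)
print(load)
```

Because the rotation is even, each of the three servers ends up with the same share of the nine requests, so no single node is overloaded.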
Network and server failure doesn't always happen due to a mechanical or software malfunction;
sometimes failures result from the actions of malicious external sources. Threats such as denial of service
attacks, viruses, worms, and Trojan horses can all bring a network or a server to its knees. Steps must be
taken to keep hackers out of the network through authentication, encryption, and firewall technology.
In addition, a network must have the ability to deny erroneous traffic at the edge of the network so attacks
such as denial of service will not reach the core of the network. Lastly, internal and external traffic must be
monitored for suspicious activity, and a plan for action must be in place to respond to an attack once it is detected.
The ability to keep a network and its servers operational is a primary concern and goal in any enterprise
business. A solid network and recovery plan are very important, as are integrated safeguards to ensure the
uptime and survivability of the network. Choosing the proper equipment, implementing failover mechanisms,
and making good decisions in network design will save time and money, enabling enterprises to provide
continuous network operation and mission-critical services to users and customers.
America - Land of the free, home of the brave.
June 26th, 2002, 11:05 PM
Also consider DR drills, in which a disaster is simulated. A drill gives you an expected time to recovery. If a real disaster occurs (God forbid), is the recovery time short enough, or are there areas that need additional tuning?
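A drill can be scored by timing the recovery procedure against a target. This is only a sketch: `run_drill`, `fake_recovery`, and the `target_seconds` value are hypothetical names and numbers chosen for illustration, and the sleep stands in for whatever restore steps your plan actually calls for.

```python
import time


def run_drill(recover, target_seconds):
    """Time a recovery procedure and report whether it met the target."""
    start = time.monotonic()
    recover()
    elapsed = time.monotonic() - start
    return elapsed, elapsed <= target_seconds


def fake_recovery():
    # Placeholder for the real restore steps (assumed, not from any actual plan).
    time.sleep(0.05)


elapsed, within_target = run_drill(fake_recovery, target_seconds=1.0)
print(f"recovered in {elapsed:.2f}s, within target: {within_target}")
```

If the measured time misses the target, that is the area needing additional tuning before a real disaster.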
If you spend more on coffee than on IT security, you will be hacked. What's more, you deserve to be hacked.
-- former White House cybersecurity adviser Richard Clarke
June 26th, 2002, 11:15 PM