One of the most important and most overlooked/ignored steps of writing software is the Analysis and Design step. Analysis and Design is the key first stage to any new software development, and doing it properly isn't quite stressed enough in schools or in tutorials/guides. You need a clear defined idea of what the requirements for functionality are, and how best to implement them. The latter point seems to be altogether too often overlooked or ignored. This understanding is obtained from a variety of sources. The requirements come from the users and/or management, research will yield algorithmic and design enhancements (what follows), and implementation tips can come from using Pseudo-code. Pseudo-code is simply non-functioning code written in any language you like (Plain English, C, etc). The purpose of Pseudo-code is to construct the application on a logical basis, not a programmatic one. This helps when you refer to the design documents later on. I'd encourage the use of UML (http://www.uml.org/) which was designed specifically with modelling applications and procedures in mind.

It is my endeavour to provide a tangible illustration of the benefits of proper Analysis and Design, in the hopes that it will at least educate some as to the benefits it can yield. To illustrate my point, I will use the example of a program which factors numbers to determine if they are prime. For non-prime numbers, it returns all factors of the provided number. Additionally, I will be timing it using the linux "time" command, and taking the "real" value, to show exactly how much performance can be had from seemingly minor chances. In this code, there are three steps of optimization: the initial step, Step A, and Step B. Follow the directions in the Core Block bit of code pertaining commenting/uncommenting specific lines for each of the steps.

Step A basically is a very trivial and simple mathematical principle applied to speed things up here. The highest non-prime factor of a given number must be n/2 due to the fact that 2 is the lowest non-prime factor. The net effect this has is that it cuts the number of entries to look through in half.

Step B is taking Step A further, if we are able to find a given factor, we know its result must also be a factor of the given number, therefore we save processing time by taking both factors at once, and reducing the max value so we don't have to go over the same number twice. Algebraically it might look like: n / a = b, results taken only when n % a = 0.

The following code is the application in question:
Code:
#include <iostream>
#include <cstdlib>

using namespace std;

#define DEFAULT_FACTOR_COUNT 300

int main(int argc, char *argv[])
{
  int num = 0;  // Number entered
  int max = 0;  // Maximum loop value
  int i = 2;    // Loop incrementor, initialized to 2
  int fcount = 0;  // Factor count
  int factors[DEFAULT_FACTOR_COUNT];  // Factors array

  // Initialize the factors array
  for (i=0; i<DEFAULT_FACTOR_COUNT; i++) {
    factors[i] = 0;
  }

  /*
   * Determine if a number was passed on the commandline. If it was, we skip
   * asking for a number.
   */
  if (argc > 1) {
    num = atoi(argv[1]); // set it to the first argument.
  } else {
    cout << "Please enter a number: ";
    cin >> num;
  }

  /**** CORE BLOCK ****\  
  //max = num / 2 + 1;   // Uncomment this for Step A.
  max = num;             // Comment this for Step A.
  
  
  /*
   *  We loop from 2 through n/2. This is because 2 is the smallest non-prime
   *  factor, and n/2 is the largest non-prime factor. We will decrement max to
   *  the next lowest factor on each successful factor find.
   */
  for (i=2; i<max; i++) {
    // If there is no remainder after the division, we add a factor.
    if (num % i == 0) {
      factors[fcount++] = i;
      //max = num / i;            // Uncomment this out for Step B.
      //factors[fcount++] = max;  // Uncomment this out for step B.
    }
  }

  /**** END CORE BLOCK ****\
  
  /*
   * First we save some work by determining if we are dealing with a prime number.
   * If we are, we simply output that the number is prime and exit.
   */
  if (fcount == 0) {
    cout << "Number is prime." << endl;
    return EXIT_SUCCESS;
  }
  
  /*
   * Now we print out the list of factors we achieved, starting with 1 and n.
   * Add some fancy formatting to it as well.
   */
   cout << "Found " << fcount << " factors of " << num << "." << endl;
   cout << "Factors: " << endl << "  ";
  for (i=0; i<fcount; i++) {
    if (i > 1 && i % 15 == 0){
      cout << endl << "  ";
    }
    cout << " " << factors[i];
  }
  cout << endl;
  return EXIT_SUCCESS;
}
Now to look at some of our values. For the initial run, I have run a simple bash script, that runs a set of 15 numbers, as well as running a number individually.
The script is as follows:
Code:
#!/bin/bash
echo -ne "Number 1: "; /usr/bin/time -f %E -a ./algo 8035737 > /dev/null
echo -ne "Number 2: "; /usr/bin/time -f %E -a ./algo 1089053 > /dev/null
echo -ne "Number 3: "; /usr/bin/time -f %E -a ./algo 28508191 > /dev/null
echo -ne "Number 4: "; /usr/bin/time -f %E -a ./algo 1009095 > /dev/null
echo -ne "Number 5: "; /usr/bin/time -f %E -a ./algo 98031 > /dev/null
echo -ne "Number 6: "; /usr/bin/time -f %E -a ./algo 7641367 > /dev/null
echo -ne "Number 7: "; /usr/bin/time -f %E -a ./algo 418763 > /dev/null
echo -ne "Number 8: "; /usr/bin/time -f %E -a ./algo 7689231 > /dev/null
echo -ne "Number 9: "; /usr/bin/time -f %E -a ./algo 5000101 > /dev/null
echo -ne "Number 10: "; /usr/bin/time -f %E -a ./algo 5974125 > /dev/null
echo -ne "Number 11: "; /usr/bin/time -f %E -a ./algo 51320898 > /dev/null
echo -ne "Number 12: "; /usr/bin/time -f %E -a ./algo 1001 > /dev/null
echo -ne "Number 13: "; /usr/bin/time -f %E -a ./algo 2103559887 > /dev/null
echo -ne "Number 14: "; /usr/bin/time -f %E -a ./algo 1025488 > /dev/null
echo -ne "Number 15: "; /usr/bin/time -f %E -a ./algo 12783 > /dev/null
The actual program output during the run:
Code:
Number 1: 
   Found 6 factors of 8035737.
   Factors: 3 2678579 163 49299 489 16433
Number 2: 
   Found 2 factors of 1089053.
   Factors: 7 155579
Number 3: Number is prime.
Number 4: 
   Found 6 factors of 1009095.
   Factors: 3 336365 5 201819 15 67273
Number 5: 
   Found 6 factors of 98031.
   Factors: 3 32677 41 2391 123 797
Number 6: Number is prime.
Number 7: Number is prime.
Number 8: 
   Found 22 factors of 7689231.
   Factors: 3 2563077 9 854359 11 699021 33 233007 99 77669 101 76131 303 25377 769
      9999 909 8459 1111 6921 2307 3333
Number 9: Number is prime.
Number 10: 
   Found 30 factors of 5974125.
   Factors: 3 1991375 5 1194825 15 398275 25 238965 75 79655 89 67125 125 47793 179
      33375 267 22375 375 15931 445 13425 537 11125 895 6675 1335 4475 2225 2685
Number 11: 
   Found 62 factors of 51320898.
   Factors: 2 25660449 3 17106966 6 8553483 9 5702322 18 2851161 27 1900774 47 1091934 
      54 950387 73 703026 94 545967 141 363978 146 351513 219 234342 277 185274 282 
      181989 423 121326 438 117171 554 92637 657 78114 831 61758 846 60663 1269 40442 
      1314 39057 1662 30879 1971 26038 2493 20586 2538 20221 3431 14958 3942 13019 4986 
      10293 6862 7479
Number 12: 
   Found 6 factors of 1001.
   Factors: 7 143 11 91 13 77
Number 13: 
   Found 14 factors of 2103559887.
   Factors: 3 701186629 11 191232717 13 161812299 33 63744239 39 53937433 143 14710209 
      429 4903403
Number 14: 
   Found 18 factors of 1025488.
   Factors: 2 512744 4 256372 8 128186 16 64093 107 9584 214 4792 428 2396 599 1712 
      856 1198
Number 15: 
   Found 2 factors of 12783.
   Factors: 3 4261
So the initial setup's output is:
Code:
Number 1: 0:00.38
Number 2: 0:00.05
Number 3: 0:01.33
Number 4: 0:00.05
Number 5: 0:00.00
Number 6: 0:00.36
Number 7: 0:00.02
Number 8: 0:00.40
Number 9: 0:00.25
Number 10: 0:00.27
Number 11: 0:02.40
Number 12: 0:00.00
Number 13: 1:41.72
Number 14: 0:00.06
Number 15: 0:00.00
Not bad, almost everything is under 5 seconds with the exception of run 13 which is the only 10 digit number in our run. This could be a good example of how software performs. Almost all the time here, we are seeing a pretty quick piece of software, however there is an exception, and this exception is pretty large. Almost two full minutes to get the primes for number 13. Our total execution time is: 107.29 seconds (1:47.29).

Now, applying the changes in Step A (diff. b/w Initial & Step A):
Code:
Number 1: 0:00.19   (-0.19)
Number 2: 0:00.03   (-0.02)
Number 3: 0:00.63   (-0.70)
Number 4: 0:00.01   (-0.04)
Number 5: 0:00.00   
Number 6: 0:00.15   (-0.21)
Number 7: 0:00.00   (-0.02)
Number 8: 0:00.18   (-0.22)
Number 9: 0:00.11   (-0.14)
Number 10: 0:00.13  (-0.14)
Number 11: 0:01.17  (-1.23)
Number 12: 0:00.00  
Number 13: 0:47.68  (-54.04)
Number 14: 0:00.01  (-0.05)
Number 15: 0:00.00
Now this optimization knocks a solid 0:57.00 off our execution time, bringing the total down to: 50.29 seconds (0:50.29). The bulk of this ground is made up on the longest execution time, however it does make some significant headway.

Applying changes in Step B:
Code:
Number 1: 0:00.00   (-0.19)  [-0.38]
Number 2: 0:00.01   (-0.02)  [-0.05]
Number 3: 0:00.63            [-0.70]
Number 4: 0:00.00   (-0.01)  [-0.05]
Number 5: 0:00.00     
Number 6: 0:00.15            [-0.21]
Number 7: 0:00.00            [-0.02]
Number 8: 0:00.00   (-0.18)  [-0.40]
Number 9: 0:00.11            [-0.14]
Number 10: 0:00.00  (-0.13)  [-0.27]
Number 11: 0:00.00  (-1.17)  [-2.40]
Number 12: 0:00.00  
Number 13: 0:00.21  (-47.47) [-1:41.51]
Number 14: 0:00.00  (-0.01)  [-0.06]
Number 15: 0:00.00
(diff. b/w Step B & Step A)
[diff. b/w Step B & Initial)
This optimization as well knocks a further 0:49.18 off our execution time. The final total execution time after both sets of optimizations is 0:01.11. This represents a speed increase of over 9600% ; it takes less than 1/96th the time! All that extra speed from some minor changes that are only possible once you understand the logic of what the application should do. Granted, not every application is going to have such drastic speed increases just from a better understanding of how they should work. Consider however that improvements in speed are only one facet of improvement you will see by understanding and designing correctly from the get-go. I used speed in this illustration merely because it is the simplest to measure. More efficient design leads to improvement in all areas, and can contribute greatly to the security of the application (consider race conditions, for example), in addition to stability.

For more info, check out the Unified Modelling Language homepage.