Parallel Python AKA Twin Snakes
This is a tutorial on how to achieve parallelism in Python. Despite my newbie status here, I have
been programming with Python for over four years. I've helped program several open-source games,
programmed web apps, made my own web server (only for fun), and created extensive data processing
programs with Python.
It is assumed that you have read the other tutorials on Python (courtesy of HTregz), or
already know the language. The importance of parallelism is unparalleled (hehe, sorry for the pun)
because computers are no longer getting faster as fast. CPU clock speeds are no longer doubling
every 18 months; if they were, we'd have 10 GHz CPUs right now. So the current trend is
towards multi-core processors. However, many application designers still have no idea how
to take advantage of parallelism.

In general there are three ways of achieving this in Python: the <var>os.fork</var> method,
the <var>thread</var> module, and asynchronous processing.

Note: the os.fork method ONLY works on POSIX-compliant platforms, so basically all versions
of Linux and Unix, but not Windows. There are alternatives for Windows, which I've never
tested, called os.popen(command[, mode[, bufsize]]) and the os.spawn* functions.
See the documentation for more details.
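As a rough sketch of the spawn alternative (which, again, I haven't tested myself; the -V flag just makes the child print its version and exit):
Code:
import os, sys

# P_NOWAIT starts the new program and returns its pid right away
# instead of waiting for it to finish
pid = os.spawnl(os.P_NOWAIT, sys.executable, 'python', '-V')
print "spawned a new interpreter with pid", pid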

os.fork does what is called forking a new process. Each new process gets its own ID, called a pid.
This is essential for proper parallelism. Here is an example:
Code:
import os

def processCreator(num):
	"""Factory function for creating new processes. Creates new processes until num
	equals zero."""
	pid = os.getpid()
	print "I'm the parent. My pid is", pid
	while num != 0:
		os.fork()
		if os.getpid() != pid:
			# we are in the child: do the work, then exit so the child
			# doesn't keep looping and forking children of its own
			startnewfunction()
			os._exit(0)
		else:
			num -= 1

def startnewfunction():
	print "My pid is", os.getpid()
There is a key part to this code: the part where it checks its pid (process ID).
If the pid is equal to the value from before, then it's still in the parent process;
otherwise it is in a child process. The child process then prints out its pid and
exits with os._exit(), so it doesn't go on forking children of its own.
The result is shown below:
>>> processCreator(5)
I'm the parent. My pid is 101
My pid is 102
My pid is 103
My pid is 104
My pid is 105
My pid is 106

Now one important thing to note is that when a process is forked, everything is copied over
to a NEW address space. This will be important once we get to threads. So the value of
pid is copied over as well. If you implement processes in a very large program, remember
that the whole program AND the interpreter are copied over to the new address space, so processes
handled incorrectly can cause memory bloat.
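To see that separate address space in action, here's a little sketch (the counter variable is just for illustration): the child changes its own copy, and the parent's copy stays untouched.
Code:
import os

counter = 0
child = os.fork()
if child == 0:
	# child process: this only changes the child's own copy
	counter = 100
	print "child sees counter =", counter
	os._exit(0)
else:
	os.waitpid(child, 0)	# wait for the child to finish
	print "parent still sees counter =", counter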

Threads are very simple but come with their own problems. Here's how you handle threads:
Code:
try:
	import thread
except ImportError:
	# fall back to the interface-compatible dummy module on platforms
	# without real threads
	import dummy_thread as thread
import random

def threadStarter(num):
	while num != 0:
		# the second argument is a tuple of arguments for randomStuff
		thread.start_new_thread(randomStuff, (random.choice(range(100)), num))
		num -= 1

def randomStuff(seed, number):
	r = random.Random(seed)
	print r.uniform(0, seed)
	thread.exit()	# ends this thread immediately
I used the try/except wrapping because on some platforms the thread module is not available, so
dummy_thread is used instead. It offers exactly the same interface, but everything runs one call
at a time instead of in parallel. Sorta useless, but necessary, eh?
I also used the random module to do some printing of random numbers.
In the start_new_thread function, I used a tuple as the last argument. The values in that
argument are then passed to the new function. The argument MUST be a tuple, even if it only
holds one value (write it as (value,) with the trailing comma). Also, when the thread is
finished you can call thread.exit(), which raises SystemExit and destroys the thread so it
can't run any further; a thread also ends on its own when its function returns. If you leave
a thread's function looping forever, it will keep running for as long as the process does.
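For example, starting a thread with a single argument looks like this (greet and the sleep at the end are just for illustration, so the program doesn't quit before the thread gets to run):
Code:
import thread, time

def greet(name):
	print "hello,", name

# a one-value args tuple still needs the trailing comma
thread.start_new_thread(greet, ("world",))
time.sleep(1)	# give the thread a moment to run before the main program exits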

Threads do not have the equivalent of pids, because they run in the same process
space as the parent thread. This means that every thread has access to the same variables as
all of the other threads. Threads can create their own per-thread variables, but what
happens when a thread changes a variable that the other threads can access? Quite
simply, the other threads will then see the change. This can be used
for inter-thread communication, but it also introduces a new class of errors. Normal,
sequential programming only has syntax errors and logical errors (where something doesn't
do what it's supposed to). Parallel programming introduces errors like races, deadlocks,
livelocks, etc.
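Here's a quick sketch of a race condition on a shared counter (the names and the loop count are just for illustration). Each increment is a read-modify-write, so two threads can read the same value and one of the updates gets lost; run it a few times and the total will often come up short:
Code:
import thread, time

counter = 0

def bump(times):
	global counter
	for i in range(times):
		counter = counter + 1	# not atomic: read, add, write back

thread.start_new_thread(bump, (100000,))
thread.start_new_thread(bump, (100000,))
time.sleep(2)	# crude way to wait for both threads
print "expected 200000, got", counter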

So what are these errors? They happen when two or more threads/processes want access to
the same resource, e.g. the hard drive. Since the hard drive can't satisfy both requests
simultaneously, one must come first. But what happens if one thread acquires the hard drive
and is then switched out by the scheduler for another thread that also needs it? Because
the first thread never finished its request, the second thread can't get access to the
hard drive. With proper programming, you can get the second thread to wait its turn;
handled badly, the two can end up waiting on each other forever. In the example below
I present a deadlock.

Code:
import thread

def funcA(num, num2):
	l.acquire()
	num = 5
	num2 = 6
	# oops: the lock is never released

def funcB(num2, num):
	l.acquire()	# blocks forever, waiting for the lock funcA still holds
	num = 3
	num2 = 7

numbers = 1
numbers2 = 2
l = thread.allocate_lock()
one = thread.start_new_thread(funcA, (numbers, numbers2))
two = thread.start_new_thread(funcB, (numbers2, numbers))
Because funcA never released the lock, funcB never gets to do what it wants. Now,
what if we used some sort of construct that restricts access to a variable instead?
Well, there is one. Actually there are three. Their use and explanation can get complicated,
but look these up for more information (a quick Queue sketch follows the list):
Mutexes
Semaphores
Queues
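Of the three, Queues are the easiest to start with in Python: the standard Queue module does all the locking for you. A minimal producer/consumer sketch (the worker function and the job numbers are just placeholders):
Code:
import threading, Queue

q = Queue.Queue()	# the queue handles its own locking internally

def worker():
	while True:
		item = q.get()	# blocks until a job is available
		print "processing", item
		q.task_done()

for i in range(3):
	t = threading.Thread(target=worker)
	t.setDaemon(True)	# don't keep the program alive just for the workers
	t.start()

for job in range(10):
	q.put(job)
q.join()	# wait until every queued job has been processed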

Python implements all of those very well. Semaphores are a very old idea, but still very effective.
Basically, a thread can only touch the associated variable while it holds the semaphore.
By using locks/semaphores/mutexes carefully with your variables,
you can prevent two threads from altering a variable improperly.
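As a quick sketch of the semaphore flavour (using the higher-level threading module; the names and the sleep are just for illustration), at most two workers can be inside the guarded section at once:
Code:
import threading, time

sem = threading.Semaphore(2)	# at most two holders at a time

def worker(name):
	sem.acquire()
	try:
		print name, "is using the shared resource"
		time.sleep(1)
	finally:
		sem.release()	# always give the slot back, even if something goes wrong

for i in range(5):
	threading.Thread(target=worker, args=("worker-%d" % i,)).start()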
Here's the proper way to handle locks:
Code:
import thread

def funcA(num, num2):
	l.acquire()
	num = 5
	num2 = 6
	l.release()	# done with the shared stuff, let the other thread in

def funcB(num2, num):
	l.acquire()
	num = 3
	num2 = 7
	l.release()

numbers = 1
numbers2 = 2
l = thread.allocate_lock()
one = thread.start_new_thread(funcA, (numbers, numbers2))
two = thread.start_new_thread(funcB, (numbers2, numbers))
Now the threads play nice by releasing the lock when they're finished. Instead of being
like that guy at EB who plays the newest game on the PS2/Xbox 360 for four hours, they
share. You can create a lock for every variable that needs protecting. Recently, I
helped with HTregz's Python port scanner by fixing the UDP problem, and I made it threaded.
There was one lock associated with a list called openPorts. I didn't want
two threads both trying to append at the same time and erasing each other's results.
Here's the code:
Code:
import socket
import thread

def fastudpscan(ipaddress, timeout, numberofports, portstoscan):
	"""
	This function runs a UDP Single IP Scan.
	Requires ipaddress, timeout, numberofports, and portstoscan variables.
	"""
	global lock
	lock = thread.allocate_lock()
	global openPorts
	openPorts = []
	global closedPorts
	closedPorts = []
	for portcounter in range(numberofports):
		port = int(portstoscan[portcounter])
		thread.start_new_thread(threadedUdpScan, (timeout, ipaddress, port))
	while len(openPorts) + len(closedPorts) != numberofports:#this is here to ensure that all threads started have finished
		pass
	if len(openPorts) >= 0.9 * numberofports:
		print
		print "There is a firewall present that is blocking UDP scans."
	else:
		for port in openPorts:
			print "Port", port, "is open."


def threadedUdpScan(timeout, ipaddress, port):
	scansocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
	scansocket.settimeout(timeout)
	try:
		scansocket.connect((ipaddress, port))
		scansocket.send("probe")	# UDP needs us to send something to get a reaction
		scansocket.recv(10)
		lock.acquire()
		openPorts.append(port)	# got a reply, so the port is definitely open
		lock.release()
	except socket.timeout:
		lock.acquire()
		print ".",#this is here to indicate progress
		openPorts.append(port)	# no reply at all: open or filtered
		lock.release()
	except socket.error:
		lock.acquire()
		closedPorts.append(port)	# port unreachable: closed
		lock.release()
	thread.exit()
The relevant thread code here is the lock handling around each append. Since only one thread
at a time can acquire the lock, this is quite thread-safe. Some threads may be blocked
for a few turns of the scheduler, but eventually the thread holding the lock will finish and release it.

The last form of parallelism only applies to sockets and other similar I/O devices.
It's called asynchronous processing. The method is pretty simple: just start up a bunch of
sockets, put them in non-blocking mode, then check their states to see if they can
send or receive.
Code:
import socket
import select

num = 10
sockdrawer = []
while num != 0:
	sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
	sock.setblocking(0)	# non-blocking mode
	sockdrawer.append(sock)
	num -= 1
	# (in a real program each of these sockets would be connected somewhere,
	# or bound and listening)

array = {}

while 1:#infinite loop
	readable, writable, exceptional = select.select(sockdrawer, [], [], 1.0)
	#select.select returns three lists: sockets ready to be read (the first
	#one, the only one we care about right now), sockets ready for writing,
	#and sockets in exceptional situations. The last argument is a timeout
	#in seconds, so the call doesn't block forever when nothing is ready.
	if len(readable) > 0:
		for sock in readable:
			data = sock.recv(10)
			if array.has_key(sock):
				array[sock].append(data)
			else:
				array[sock] = [data]
	else:
		for key in array.keys():
			print ''.join(array[key])
			# the line above takes the pieces of data collected for this
			# socket and joins them into one string. Very useful!
This simple example takes everything sent to these sockets and appends it to a list
stored in a dictionary. dictionary.has_key is very useful for testing whether a key
already exists; otherwise a new entry is created in the dictionary.
This can be useful for echo servers and the like.
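To tie it together, here's a rough non-blocking echo server sketch built on the same select idea (the port number 9999 is just an example):
Code:
import socket, select

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('', 9999))
server.listen(5)
server.setblocking(0)
sockets = [server]

while 1:
	readable, writable, errored = select.select(sockets, [], [])
	for sock in readable:
		if sock is server:
			client, addr = sock.accept()	# a new connection is ready
			client.setblocking(0)
			sockets.append(client)
		else:
			data = sock.recv(1024)
			if data:
				sock.send(data)	# echo it straight back
			else:
				sockets.remove(sock)	# an empty read means the client hung up
				sock.close()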

Thank you for taking the time to read this lengthy tutorial, and I hope you enjoyed it!
Any questions, comments, and criticisms are welcome!