MIPS Assembly, an Introduction
by: h3r3tic

The following is a short introduction to programming in assembly for the MIPS architecture. I know what you're thinking,
"I don't even know what MIPS architecture is, why would I want to write assembly for it?" Well, I'll tell you why.
There is a simulator available for download here: http://www.cs.wisc.edu/~larus/spim.html
which will let you run your MIPS assembly programs without having a computer with the MIPS architecture.
I've dabbled a little with nasm, and I just think MIPS is a lot more straightforward, and everything seems to
have a simple explanation of why it's there or what it's doing. So I have decided to share this perhaps
little-known, or perhaps well-known, "language" with you. (I only learned about it after starting a class where
we use it.)

I'll start by showing you how to write the traditional "Hello World!" program. Now, there is no set way
to do this, and I'm sure it can be done many ways, but I will show you two ways to do it.
So, the uncommented code is as follows:
Code:
.data
hellostring:	.ascii "Hello "
		.asciiz "World!\n"

.text

main:
	la	$a0, hellostring
	li	$v0, 4
	syscall

	li	$v0, 10
	syscall
and here is the commented code:
Code:
.data						# comments are preceded by the # character
						# .data is where you define "variables", or as I understand them,
						# labels for the space in memory where the data defined below is stored

hellostring:	.ascii "Hello "			# .ascii is how you define a non-null terminated
						# string.

		.asciiz "World!\n"		# this .asciiz is a continuation of hellostring,
						# putting the rest of the string in memory, with
						# a null terminator, which is what the z in .asciiz is for.
						# I could have made it one line with:
						# .asciiz "Hello World!\n"

.text						# this defines where code to execute starts

main:						# main is a label, i.e. a name for the address of the
						# instruction that follows it.  You could also say
						# it marks the beginning of a procedure or function.
						# SPIM's startup code jumps to it to run your program.

						# la = load address
	la	$a0, hellostring			# this loads the address which hellostring refers to into
						# the register $a0, which is one of 4 ($a0-$a3) used for
						# arguments to "functions"

						# li = load immediate
						# which is used to load characters and integers
						# onto registers
	li	$v0, 4				# we are loading the integer 4 onto the register
						# $v0, which tells the system we want to output
						# a string.  I will go over more system calls later

	syscall					# this tells the system to execute the "function"
						# selected by loading 4 onto $v0.  $v0 is our register
						# for choosing system functions.  Your own functions are
						# called with jal instead of syscall, and by convention
						# they use $v0 to hold their return value.

	li	$v0, 10				# 10 is the call to end the program, and when this value
						# is on $v0 and a syscall is made as here, the program
						# will terminate.
	syscall
The preceding code will output:
Hello World!

Pretty simple for being assembly, eh? To run your code you can save it with any extension (or none), and on Linux
just type the command:
spim -file yourfilename

and it will run your program (assuming spim is in your path and you saved your file in the current directory you
are in as "yourfilename" without quotes :P).

On Windows with PCSpim, I think you can do the same thing on the command line.
If not, check out the site you downloaded it from (http://www.cs.wisc.edu/~larus/spim.html); they have a bunch of
info, including how to run programs in the PCSpim GUI. You can probably figure it out on your own though. Moving on.

Let's take a look at a more interesting approach to the hello world program:
full uncommented code:
Code:
.data
stringstore:	.space 20

.text

main:
	la	$s0, stringstore

	li	$t0, 'H'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'e'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'l'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'o'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, ' '
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'W'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'o'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'r'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'l'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, 'd'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, '!'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	li	$t0, '
'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1

	sb	$zero, ($s0)

	la	$a0, stringstore
	li	$v0, 4
	syscall

	jr	$ra
leaving out repetitive parts and commented:
Code:
.data
stringstore:	.space 20			# allocate 20 bytes starting at address referenced by
						# stringstore

.text

main:
	la	$s0, stringstore			# load address of stringstore into $s0.  $s0 is one of
						# eight saved registers ($s0-$s7).  By convention, functions you
						# call leave these alone, so you can store stuff in them with
						# confidence it will be there later.  Each register is 32 bits (4 bytes).

	li	$t0, 'H'				# $t0 is one of ten temporary registers ($t0-$t9).  Unlike
						# $s0 it is meant for temporary storage, which is why it's called
						# a temporary register. You use it to load things
						# you will need access to soon, but not throughout the program.
						# I am loading the literal character H onto $t0, which really
						# means its ASCII value, 72.

						# sb = store byte
	sb	$t0, ($s0)			# ok, this part is important, and took me a while to figure out.
						# notice the parentheses around $s0.  They
						# indicate that I am storing the low byte of $t0
						# at the address held on $s0, not onto $s0 itself.
						# I didn't know this was possible at first, but it is very
						# useful: in most of my programs I needed to store stuff
						# at an address I had on a register.

	addi	$s0, $s0, 1			# $s0 = $s0 + 1;  The far left register is the destination.
						# the i on the end stands for immediate, meaning you can
						# specify integers rather than having to load them onto a
						# register to use the normal add.  We are adding 1 here
						# to increment to the next address to store the next letter
						# in our buffer of 20 bytes called stringstore.  Remember
						# we loaded its address onto $s0 at the beginning of main
						# now we are at a location one byte away from its address
						# which is convenient since characters are 1 byte, so we can
						# now store our second character.

	li	$t0, 'e'
	sb	$t0,  ($s0)
	addi	$s0, $s0, 1
	li	$t0, 'l'
	sb	$t0, ($s0)
	addi	$s0, $s0, 1
	sb	$t0, ($s0)			# notice I didn't load a new character onto $t0, that's because
						# I already loaded 'l' onto $t0 and it's still there, no need to load
						# it again.
	addi	$s0, $s0, 1

	li	$t0, 'o'
	sb	$t0, ($s0)
						# I think you can see where this is going
						# to load a newline character you can give it literally, like this

	li	$t0, '
'
						# notice the ending single quote is on the next line.  That's because
						# I hit enter there to put a literal newline character between the quotes.
						# I think I tried this using '\n' and it didn't work.  If neither works
						# for you, li $t0, 10 loads the same value, since 10 is the ASCII code
						# for newline.

						# ok, so now you've loaded and stored all the proper characters into stringstore
						# (assuming you filled in the code I left out because it repeats).
						# We probably don't need to null terminate the buffer, but let's
						# do it anyway just to be safe.  I am assuming here that you incremented
						# $s0 after storing the last character into stringstore

	sb	$zero, ($s0)			# $zero is a reserved register in mips that always holds 0.  It can be used
						# to terminate strings, as false for booleans, etc.  $0 is the same as
						# $zero.  So now we have all our characters in our "buffer" and they are
						# "null" terminated.  Let's load our "string" for output

	la	$a0, stringstore			# all that stuff we were doing earlier was directly affecting the memory
						# allocated for stringstore, so essentially, stringstore is a pointer to
						# the first character in our string.  Let's do the same syscall
						# as in the previous hello world program, which will output what's in our buffer
						# until it hits our terminator (where we stored $zero)

	li	$v0, 4				# call for string output
	syscall					# do string output

	jr	$ra				# another way to end the program other than loading 10 onto $v0.
						# jr = jump register, which jumps to the address on $ra.
						# SPIM's startup code calls main with a jal (jump and link), which
						# stores the address of the instruction after the jump in $ra,
						# so jumping to $ra returns there and the startup code ends the program.
						# This assumes $ra hasn't been modified since, which it hasn't.
For me this version of hello world relates heavily to C. $s0 in this program is like a character pointer,
and you are simply moving through an array of bytes storing each character to make a string.
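
To make the jal and $ra idea from the comments above a bit more concrete, here is a minimal sketch of calling your own "function". The names greeting and print_string are made up for this example, and I end main with the exit call (10) rather than jr $ra, because the jal overwrites whatever was on $ra:
Code:
.data
greeting:	.asciiz "Hello from a procedure!\n"

.text

main:
	la	$a0, greeting			# argument for print_string goes on $a0
	jal	print_string			# jump to print_string, and put the address
						# of the next instruction onto $ra

	li	$v0, 10				# we come back here after the jr $ra below
	syscall					# end the program

print_string:					# hypothetical helper: prints the string at $a0
	li	$v0, 4
	syscall
	jr	$ra				# return to the instruction after the jal

The jr $ra at the end of print_string lands back in main on the li $v0, 10 line, which is the "spot after your jump".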

Also, in the example above, I used sb to store one byte of data at the address which was on
a register. You can also work with an address on a register with load byte (lb).

lb $t0, ($s0)

This will load the byte stored at the address on $s0, which will probably be a character. If
an integer (4 bytes) is stored there, you will only get one byte of it, so I recommend you only
use this when working with characters as above, or anything else that is only one byte.
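
As a sketch of the character-pointer idea, the same buffer can be filled with a loop that uses lb and sb together instead of one li per letter. The label source and the branch instruction bne (branch if not equal), which none of the examples here use, are my own additions, so treat this as one possible way to do it:
Code:
.data
source:		.asciiz "Hello World!\n"	# 13 characters plus the null terminator
stringstore:	.space 20

.text

main:
	la	$s0, source			# $s0 = where we read from
	la	$s1, stringstore			# $s1 = where we write to

copy:
	lb	$t0, ($s0)			# load one byte from the address on $s0
	sb	$t0, ($s1)			# store that byte at the address on $s1
	addi	$s0, $s0, 1			# move both "pointers" to the next byte
	addi	$s1, $s1, 1
	bne	$t0, $zero, copy			# keep going until the byte we just copied
						# was the null terminator

	la	$a0, stringstore
	li	$v0, 4
	syscall					# output the copied string

	li	$v0, 10
	syscall

Because the check comes after the sb, the null terminator gets copied too, so there is no need to store $zero by hand.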

Here are some of the basic system calls:

1: When you load immediate (li) 1 onto $v0, you are telling the system that you want to output an integer.
I will show you an example using integers later. You will have to load the integer you want to output onto
$a0 before you do your syscall. $a0 is the argument for this system call, in this case an integer.

4: When you load immediate 4 onto $v0, you are telling the system that you want to output a string.
We have discussed this a bit already. Basically it will output what's at the address on $a0 until it hits
a null terminator. An address must be loaded onto $a0 as an argument to this function prior to your syscall.

5: When you load immediate 5 onto $v0, you are telling the system that you want to read in an integer.
I will also show you an example using this later. This call does not use $a0, instead it stores the integer
you input onto $v0, which is the standard for return values. You can then move that value onto another register
or store it in a "variable".

8: When you load immediate 8 onto $v0, you are telling the system you want to read a string. This function
takes two arguments which you load onto $a0 and $a1. On $a0 you put the address at which you want the string
you enter to be stored, and on $a1 you put the length to read. You can use li to load
your length onto $a1, which lets you load an integer directly rather than having to load it onto another register and then move it.
One thing I noticed about this function is that it only reads up to one less than the length you specified. That is
because the last byte is reserved for the null terminator. So think of the length as the size of your buffer in bytes,
and never make it larger than the space you allocated. I will also give an example of this later.
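
Here is a small sketch of the "one less than the length" behaviour of call 8, assuming SPIM's string input works as described above. The labels buffer, prompt and newline are just names picked for this example:
Code:
.data
buffer:		.space 8			# room for 7 characters plus the null terminator
prompt:		.asciiz "Type something: "
newline:	.asciiz "\n"

.text

main:
	la	$a0, prompt
	li	$v0, 4
	syscall					# output the prompt

	la	$a0, buffer			# first argument: where to store the input
	li	$a1, 8				# second argument: the size of the buffer
	li	$v0, 8
	syscall					# reads at most 8 - 1 = 7 characters

	la	$a0, buffer
	li	$v0, 4
	syscall					# output what was stored

	la	$a0, newline
	li	$v0, 4
	syscall

	li	$v0, 10
	syscall

If you type abcdefghij at the prompt, only abcdefg should come back, since the 8th byte of the buffer is used for the null terminator.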

Yay, later is here, I can give you all those examples I promised. Or cram them all into one little I/O example.
First the uncommented version for easier pasting:
Code:
.data
num1:	.word 0
num2:	.word 0
str1:	.space 50
prompt1:	.asciiz "Enter a string: "
prompt2:	.asciiz "Enter a number: "
prompt3:	.asciiz "Enter another number: "
msg:		.asciiz "You entered the string: "
msg2:		.asciiz "You entered the numbers "
msg3:		.asciiz " and "
newline:	.asciiz "\n"

.text

main:
	la	$a0, prompt1
	li	$v0, 4
	syscall

	la	$a0, str1
	li	$a1, 50
	li	$v0, 8
	syscall

	la	$a0, msg
	li	$v0, 4
	syscall

	la	$a0, str1
	li	$v0, 4
	syscall

	la	$a0, prompt2
	li	$v0, 4
	syscall

	li	$v0, 5
	syscall
	sw	$v0, num1

	la	$a0, prompt3
	li	$v0, 4
	syscall

	li	$v0, 5
	syscall
	sw	$v0, num2

	la	$a0, msg2
	li	$v0, 4
	syscall

	lw	$a0, num1
	li	$v0, 1
	syscall

	la	$a0, msg3
	li	$v0, 4
	syscall

	lw	$a0, num2
	li	$v0, 1
	syscall

	la	$a0, newline
	li	$v0, 4
	syscall

	jr	$ra
now the commented version:
Code:
.data
num1:	.word 0						# .word is for storing integers, or just 4 bytes of data.
							# We initialize it to 0 just because we're going to 
							# change it anyway
num2:	.word 0
str1:	.space 50					# allocate 50 bytes for string input

prompt1:	.asciiz "Enter a string: "			# prompts to output to user
prompt2:	.asciiz "Enter a number: "
prompt3:	.asciiz "Enter another number: "

msg:		.asciiz "You entered the string: "	# strings to prepend to the user input for labeling
							# our output
msg2:		.asciiz "You entered the numbers "
msg3:		.asciiz " and "
newline:	.asciiz "\n"				# "declaring" the newline character so we can output a newline
							# when it's not already there

.text

main:
	la	$a0, prompt1				# load the address of prompt1 for string output
	li	$v0, 4					# and output it
	syscall

	la	$a0, str1				# load the address of str1 (our buffer for input) onto
							# $a0 as the first argument for string input.
							# the input will be stored at this address.
	li	$a1, 50					# then load the size of our buffer onto $a1 as our second
							# argument for string input.  At most 49 characters are
							# read, which leaves room for the null terminator.
	li	$v0, 8					# li 8 onto $v0 for the call for string input
	syscall						# make the call

	la	$a0, msg				# you should know what this does by now
	li	$v0, 4
	syscall

	la	$a0, str1				# loading the address of our buffer which should
							# now have the user's string in it.
	li	$v0, 4
	syscall						# output our buffer

	la	$a0, prompt2				# standard string output routine, same as before
	li	$v0, 4
	syscall

	li	$v0, 5					# this is our integer input call, notice we don't have
							# any arguments here, i.e. nothing needs to be
							# done with the $a* registers.
	syscall
							# sw = store word
	sw	$v0, num1				# the user's input is stored in $v0 after the syscall, this
							# stores it in num1.  This is similar to sb, but stores all 4 bytes.
							# Any register containing a word (integer) can be used here
							# instead of $v0, it's just that in this case we want what's on $v0.

	la	$a0, prompt3				# seen it
	li	$v0, 4
	syscall

	li	$v0, 5					# getting our second integer
	syscall
	sw	$v0, num2				# storing it.

	la	$a0, msg2				# seen it
	li	$v0, 4
	syscall

							# lw = load word
	lw	$a0, num1				# load the integer stored at address num1 (not the address itself,
							# which is what la would load) onto register $a0 as an argument for
							# integer output.  You can lw onto most of your registers, including
							# all your saved and temporary registers

	li	$v0, 1					# load immediate 1 onto $v0, which tells system we want to do
							# integer output
	syscall						# tell the system/do it

	la	$a0, msg3				# seen it
	li	$v0, 4
	syscall

	lw	$a0, num2				# same as with num1, except this time we're loading what's at num2
	li	$v0, 1
	syscall

	la	$a0, newline				# output our newline so we don't get the command prompt
							# right after our output
	li	$v0, 4
	syscall

	jr	$ra					# end program
Just as a side note, you can use lw to load a word from an address stored on a register, just as you can
use sw to store a word at an address stored on a register. Using sw in this way works just as we have seen
before with sb:

sw $t0, ($s0)

assuming you have a word/integer on $t0 and an address on $s0. Using lw to load from an address on a register
is accomplished in this way:

lw $t0, ($s0)

This loads the word stored at the address on $s0 into $t0. For both lw and sw the address has to be a multiple of 4.
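
A minimal sketch of lw and sw working through an address on a register: it walks over five words, adds them up, stores the result and prints it. The labels numbers, total, total_msg, newline and sum_loop are made up for this example, and bne (branch if not equal) is an instruction the examples above don't use:
Code:
.data
numbers:	.word 10, 20, 30, 40, 50
total:		.word 0
total_msg:	.asciiz "Total: "
newline:	.asciiz "\n"

.text

main:
	la	$s0, numbers			# $s0 = address of the first word
	li	$s1, 0				# $s1 = running total
	li	$t1, 5				# $t1 = how many words are left

sum_loop:
	lw	$t0, ($s0)			# load the word at the address on $s0
	add	$s1, $s1, $t0			# add it to the total
	addi	$s0, $s0, 4			# next word is 4 bytes further on
	addi	$t1, $t1, -1			# one less word to go
	bne	$t1, $zero, sum_loop		# loop until none are left

	la	$s2, total
	sw	$s1, ($s2)			# store the total at the address on $s2

	la	$a0, total_msg
	li	$v0, 4
	syscall

	lw	$a0, ($s2)			# load the total back as the argument
	li	$v0, 1
	syscall					# outputs 150

	la	$a0, newline
	li	$v0, 4
	syscall

	li	$v0, 10
	syscall

Notice the pointer goes up by 4 each time instead of 1, because each word takes 4 bytes.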

I would like to mention two more things before I end this tutorial. First, the move operation. I didn't use it in any
of the examples above, but it is important. Let's say you had an address on $s0, and you wanted to get that address
onto $a0 for string output. It would be cumbersome to store $s0 in one of your variables just to load it back into $a0. This
is where move comes in. In the situation described a moment ago, you can simply do this:

move $a0, $s0

In this operation, the register on the left, in this case $a0, is your destination, and the one on the right is your source.
So whatever is in the $s0 register will be "moved" into the $a0 register. Despite the name, it is really a copy: the value
is also still on the $s0 register afterwards.
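
A quick sketch to show that the source register keeps its value after a move. The label msg is just a name chosen for this example:
Code:
.data
msg:	.asciiz "Printed through a moved address\n"

.text

main:
	la	$s0, msg			# keep the address on a saved register

	move	$a0, $s0			# copy the address onto $a0
	li	$v0, 4
	syscall				# outputs the string

	move	$a0, $s0			# $s0 still has the address, so copy it again
	li	$v0, 4
	syscall				# outputs the string a second time

	li	$v0, 10
	syscall

The string comes out twice, which would not happen if the first move had emptied $s0.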

The last thing I wanted to mention is offsets. If I do another tutorial I will go into this in more detail, most likely doing
some array examples. Let's say you had a .space buffer with 80 bytes allocated, and in this buffer you have a string of up
to 79 characters plus its null terminator. Let's say you only wanted to print from the 5th character to the end. What you
could do is load the address onto $a0 with an offset like this:

la $a0, 4($s0)

I used 4 here because 0($s0) would be the first character of the buffer, assuming the address of the buffer was stored on
$s0. So if 0 is the first character, that makes offset 4 the 5th character. When working with "arrays" of words, you will have
to multiply the position of the "element" you want (counting from 0) by 4, because each "element" is 4 bytes. So assuming we
want the 5th number in an array, which is position 4, we would do:

lw $t0, 16($s0)

same as before, except we're loading a word stored at the position we want, and each position occupies 4 bytes instead of 1.
That is a short summary of offsets, but I think that's pretty much all you need to know about them to use them.
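
Putting both kinds of offset together, here is a minimal sketch that uses the same la and lw forms as above. The labels numbers, text and newline, and the values stored at them, are made up for this example:
Code:
.data
numbers:	.word 11, 22, 33, 44, 55
text:		.asciiz "0123456789\n"
newline:	.asciiz "\n"

.text

main:
	la	$s0, text			# address of the first character, '0'
	la	$a0, 4($s0)			# address of the 5th character, '4'
	li	$v0, 4
	syscall					# outputs 456789 and the newline

	la	$s1, numbers			# address of the first word, 11
	lw	$a0, 16($s1)			# 5th word is position 4, and 4 * 4 = 16
	li	$v0, 1
	syscall					# outputs 55

	la	$a0, newline
	li	$v0, 4
	syscall

	li	$v0, 10
	syscall

The byte offset for the 5th character is 4, while the byte offset for the 5th word is 16, even though both are "position 4".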

I have hardly scratched the surface of programming in mips assembly, but I have almost taught you everything I know.
I hope that you have learned a lot and will take the time to mess around with this. It's very interesting and if you're a
geek like me, it is also fun. If I get enough positive feedback on this, and it seems like you all want to know more, I
might write another tutorial which will probably bring everyone here up to where I am. Enjoy!