64 bit numbers and 64 bit processors

**pcllwtfdtm** · January 26th, 2005, 01:54 AM

Im attempting to optimize my program. The whole goal of it is to crunch large 200-1000 digit numbers. While i did find an earler post about handling such data, im doing this with my own Big Number class. For simplicity i stored the numbers in an array. ( i could do a link list, but lets not get into that). Now the way its set up is such:

Take the number 12345, and in the array, its [1][2][3][4][5]. So each digit is stored as a short. so its a short array. Now, if i recall, and this is where you come in, a short is 8 bits long, so if im using a 32 bit proccessor, when i do an operation on one cell of the array will i be only utilizing the 1/4 of the proccessor? And now that i have a 64 bit proccessor... will i have to do something like:
__int64 BigNumber[arrayLenght];
//such that 1234567890 => [12345][67890]
//assuming 12345 is the largest number __int64 can hold.
Then another problem comes in, do i have to compile the program with a 64 bit compiler and not a 32 bit? Anyhelp/insight is apreciated. The whole goal is to cruch as much data in one proccess.

~Pcllwtfdtm
(P.C. Load letter?! What the F**K does that mean? ~office space)

***sec_ware*** · January 26th, 2005, 10:15 AM

Hi

It is possible to gain factors 100 (!) by choosing an optimal data structure and an
optimal order to perform the calculations, without even touching anything processor
specific. So this is what I recommend you - do a trial and error brute-force benchmarking

.

Nevertheless, a few notes on different optimisation and processors.

Preliminaries

Simply spoken, a processor can perform operations on a number of bits within one clock
cycle. A 32bit processor can handle 32bits, a 64bit processor 64bits. Besides this,
there are other difference between the two architectures, like memory allocation but
also different setup of the elements on the chip itself.

If you want to perform operations on 64bit structures, it takes usually longer than
two clock cycles on a 32bit processor, while it can be done within one clock cycle
on a 64bit. This is just one difference.

Note: If you want to optimise your library on that level, you will require assembler
for most parts. There are only a few simple tricks (like the correct order of the loops, that
allows for optimal pipelining (see below)) in a high-level programming language.

Example: If you decide to use linked lists, forget about performance issues dealing with
8bit, 32bit numbers etc: This will then be negligible! Your current choice of these
arrays sound a bit suboptimal to me. I recommend to look at open-source big number
solutions and tricks they used to optimise their approach.

Performance

8 bits (...) will i be only utilizing the 1/4 of the proccessor?

Hm, yes, if you wish so. You spend 1 clock cycle to perform operations on 8bit structures,
while the processor would be able to do 32bit, spoiling 24bits. Nevertheless, this can
be optimised by the compiler in principle, if you have coded appropriate.

But that's not the real issue for code optimisation:

- even older processors (PIII) with MMX support are capable to perform an operation on 128bit
within one clock cycle (SSE instruction set[1])

- the real issue is pipelining[2] with scalar processors. Vector processors (Cray etc) have to
be handled differently. Note: The real speed gain on Cray computers is not due to their
processor, but to the way they handle memory: 64, 128, ...1024 memory banks
compared to 2-3 on standard machine setups (latency hiding).

- in order to get enhanced performance on a 64bit processor, like Opteron, you have to
pursue different strategies as well (make use of SSE3)[3]

For your project: Try to learn from what has been done already. Think about choices
of your data structures (what is really needed? how can I optimise pipelining?). Then,
you will have a fast program already. Do not really worry about processor internals.

Cheers!

[1] https://shale.intel.com/SoftwareColl...sp?courseID=23
[2] http://www.cs.iastate.edu/~prabhu/Tu...ipe_title.html
[3] http://www.amd.com/us-en/assets/content_type/ white_papers_and_tech_docs/25112.PDF

**pcllwtfdtm** · January 26th, 2005, 11:44 AM

Wow, thank you for the indepth answer, I will look into the open source alternatives as you have suggested, and the recomeneded links you have posted. Again thanks!
~Poe

***sec_ware*** · January 26th, 2005, 12:03 PM

Hi

You are welcome. Well, the links I provided are illustrating what I said. In order to
learn optimization, start with a google search like
"code optimization c c++ pipelining examples filetype:pdf".
Eg. look at [1], and continue reading from there. Try to understand optimisation by
looking at simple examples, and then apply it to your specific problem.

The problem with optimisation is, that there are some general concepts, but high-end
optimisation is very specific to the task to be performed by the program. In my previous
post, I have choosen some point of view for illustration (that's why I am not very happy
with it). But I hope, it helps you to get the idea.

Good luck.

Cheers

[1] http://www.azillionmonkeys.com/qed/optimize.html

Thread: 64 bit numbers and 64 bit processors

Thread Tools

Display

64 bit numbers and 64 bit processors

Posting Permissions