Ryan McWhorter | Prime Factorization

Why?

Fundamentally, this is about programming code that runs as fast as possible. Programming for speed is hard: no one disputes this. It also happens to be really important. High speed code is necessary for medical imaging, it's necessary for quantiitative finance, it's necessary for self driving cars. One of the most interesting developments in this space in recent years is Rust, a language that claims to challenge C++, the reigning champion.

Summary

In this post, I'll take you through the unique factorization theorem, and use it to gauge the preformance of Rust and C++.

The Unique Factorization Theorem

This theorem is also sometimes called the fundamental theorem of arithmetic. The essential claim of this theorem is that every positive integer (1, 2, 3, 4, 5, ... , n) can be factored into primes. For example, 60 factorizes into 2 x 2 x 3 x 5, and 61 factorizes into 1 x 61. Remember that prime numbers are positive integers which are divisible only by itself and 1.

Who Cares About Unique Factorization?

The short answer is that everyone who cares about cryptography cares about unique factorization, but why that is the case is a story for another time. If you now ask "but who cares about cryptography?", I would counter that everyone who likes to keep their money in a bank account cares about cryptography. With regards to gauging code preformance, we care because it turns out factoring semi-primes (numbers that are the product of only two primes) is really hard. In fact, it is hard enough that it serves as the basis of RSA (a prominent public-private cryptographic method). Now, RSA uses 2048 bit integers. We don't have that kind of time (by design), so we will be factoring 64 bit integers.

The Code

The code for this can be found on GitHub. Let's start with the Rust code. I'm not going to claim that anything I write here is the fastest it could be; I'm sure little improvements can be made here or there. But the algorithms used in the Rust section are almost identical to those used in the C++ section. In fact, in places I was able to just copy and paste, adding parenthesis as needed. First, we have to be able to determine if a number is prime or not. Let's write that.

fn is_prime(n: i64) -> bool {
	if n <= 1 {
		return false;
	}
	if n <= 3 {
		return true;
	}

	if (n%2 == 0) || (n%3 == 0) {
		return false;
	}

	let mut i: i64 = 5;
	while i*i <= n {
		if (n%i == 0) || (n%(i+2) == 0) {
			return false;
		}
		i = i+6;
	}

	return true;
}

Now, to factor a number, we're going to just start dividing it by primes, starting with 2. To do this, we'll need to find the next prime after a number n.

fn next_prime(n: i64) -> i64 {
	if n <= 1 {
		return 2;
	}

	let mut prime: i64 = n;
	let mut found: bool = false;

	while !found {
		prime += 1;
		if is_prime(prime) {
			found = true;
		}
	}

	return prime;
}

Finally, we can get down to the business of actually factoring our numbers. It looks like this:

fn factorize(n: i64) -> Vec<i64> {
	let mut live: i64 = n;
	let mut rtn: Vec = vec![];

	let mut prime: i64 = 2;
	while (&rtn).into_iter().product::() != n {
		if live % prime == 0 {
			rtn.push(prime);
			live = live / prime;
		} else {
			prime = next_prime(prime);
		}
	}

	return rtn;
}

We can do exactly the same thing in C++.

bool is_prime(long long int n)  
{  
	// Corner cases  
	if (n <= 1)  return false;  
	if (n <= 3)  return true;  
	
	// This is checked so that we can skip   
	// middle five numbers in below loop  
	if (n%2 == 0 || n%3 == 0) return false;  
	
	for (int i=5; i*i<=n; i=i+6)  
		if (n%i == 0 || n%(i+2) == 0)  
			return false;  
	
	return true;  
}  
	
// Function to return the smallest 
// prime number greater than N 
int next_prime(long long int N) 
{ 
	
	// Base case 
	if (N <= 1) 
		return 2; 
	
	long long int prime = N; 
	bool found = false; 
	
	// Loop continuously until isPrime returns 
	// true for a number greater than n 
	while (!found) { 
		prime++; 
	
		if (is_prime(prime)) 
			found = true; 
	} 
	
	return prime; 
}

vector<long long int> factorize(long long int n) {
    long long int live = n;
    vector rtn;

    long long int prime = 2;
    while (accumulate(begin(rtn), end(rtn), 1, std::multiplies()) != n) {
        if (live % prime == 0) {
            rtn.push_back(prime);
            live = live / prime;
        } else {
            prime = next_prime(prime);
        }
    }

    return rtn;
}

So Let's Time It

I'm going to omit the remainder of the code from here on. An accountant would say it's immaterial. I won't though, I'll just provide a link to a GitHub repo which contains everything. To test this code, we will factor two integers. To test a general case, we can go with 977666464, which factorizes into 2 x 2 x 2 x 2 x 2 x 17 x 1797181, and to test the semi-prime case, we can use 858645989, which is 1579 x 543791. I ran each for 100 trials, here are the results.

	Rust	C++	[Rust/C++]
977666464	547288.23 μs	431460.00 μs	+26.8%
858645989	104174.32 μs	83304.70 μs	+25.1%

Much to my own surprise, Rust got thoroughly beaten. I've been writing Rust for some time now but I'm quite new to C++, so I thought that my C++ code would be slower than my Rust code. This does hold with a more general trend of C++ being slightly faster than Rust, probably because of the extra time spent on memory guarentees by Rust. Hopefully as Rust progresses it will get faster and close the gap, but for now C++ will probably remain king of languages for time constrained applications.