Denormalised Design: Prime Numbers and SQL

Many may think that prime numbers are for the mathematicians to play with, and bit of an obscure combination to use with SQL. However, they can be very useful for implementing a compressed data solution providing some additional set theory operations difficult to achieve in SQL.

Prime Number

In mathematics, a prime number is a natural number which has exactly two distinct natural number divisors: 1 and itself. The number 1 is by definition not a prime number.

Generating Prime Numbers

If you scan the Internet for example PL/SQL functions for generating prime numbers and you will find many examples. However, it is actually quite straight forward to generate them from a single SQL statement.

A useful technique with SQL for generating data is using the hierarchical join syntax CONNECT BY. The following statement generates ten rows numbered 1 throught to 10:

SELECT LEVEL
FROM dual
CONNECT BY LEVEL <= 10

So integrating this approach in the following statement, it is possible to generate all of the prime numbers under a thousand (excluding 1 which is technically not a prime number):

SELECT prime_no,
RANK() OVER(ORDER BY prime_no) rank
FROM
(
SELECT LEVEL prime_no
FROM dual
CONNECT BY LEVEL <= 1000
)
WHERE
(
SELECT COUNT(DECODE(MOD(prime_no,ROWNUM),0,'X',NULL))
FROM dual
CONNECT BY LEVEL <= ROUND(SQRT(prime_no),0) ) = 1 AND prime_no > 1;

So we now have a useful means of generating prime numbers with SQL and by adding the analytic rank function we also have the ranking:


PRIME_NO RANK
-------- ----
2 1
3 2
5 3
7 4
11 5
13 6
17 7
19 8
23 9
29 10
...

Prime Number Product

A product of any number of prime numbers will not result in a prime number. And this result will only have factors that are prime numbers, and only the original prime numbers. So it is possible to multiply two or more prime numbers and decode the result back to the original prime numbers.

For example:

3 x 7 = 21

To decode, step through each prime number and see if it is a factor of the result.

MOD(21,3) = 0 … so 3 is a factor of the prime number product.

MOD(21,5) = 1 … so not a factor

MOD(21,7) = 0 … so 7 is a factor and a component

Database Design

From a database design perspective, this gives us the opportunity to hold a set of foreign key values in a single column. This may not be too appealing to the purists amongst us, but it can have huge benefits by eliminating very large child tables. To make this work, the candidate child table needs to be a pure intersect table between the parent table and a small reference table with a finite set of values where each entry can be assigned a unique prime number (an alternative key). The Oracle numeric data type has a precision of 38 significant figures, so the prime number product can represent a lot of numbers, but is not limitless.

The prime number product on the parent table can be decoded to identify the child entries. Adding a new key value is achieved by multiplying the existing prime number product with the new prime number, and to remove a key is achieved by dividing the product by the key’s prime number.

Another benefit of using prime number product is being able to easily identify whether a subset of values is completely within a superset of values. A set of values can have a prime number product calculated for it and if found to be a factor of the superset prime number product, then it can be considered to be a complete subset.

For example:

Set A consists of (5,11) product A = 55

Set B consists of (3,5,11,17) product B = 2805

To determine that set A is a subset of B, check that product A is a factor of product B. In SQL we would use the MOD function, MOD(2805,55) = 0, i.e. it is a subset.

A SQL approach to decoding the prime number product would be quite straight forward:

SELECT a.*,
b.ref_desc
FROM my_main_table a,
my_ref_table b
WHERE MOD(a.prime_number_product, b.prime_number) = 0
AND a.id = :p_id

Calculating a Product

No aggregate functions exist in Oracle that allow you to calculate the product of a set of values, but there is a simple work around using logarithms. To multiply two numbers using log tables requires looking up the log for both numbers and adding the two together and then look up its exponent (opposite to log) to give the result. So using a base 10 logarithm, the prime number product can be aggregated in a query as follows …

SELECT POWER(10,SUM(LOG(10,b.prime_number))) prime_no_prdct
FROM my_ref_table b
WHERE b.id IN (....

Alternative BITAND

Prime product is one approach, but an alternative approach is using the BITAND function which has similar benefits, but used where you may want to test whether any value in set A was is in set B.

Each number would require a single bit set, so only base 2 numbers with integer exponents, e.g. 2**1, 2**2, 2″”3, etc, i.e. 2, 4, 8, 16, 32, 64, etc.

For example:

Set A consists of (4, 16) so bit value = 4 + 16 = 20

Set B consists of (2, 4, 8) so bit value = 2 + 4 + 8 = 14

To determine that a value in set A is in set B, then check BITAND(20, 14) = 4, so 4 is the intersect between the two sets. A non-zero value means that at least one value matches.

Alternatively, to check a subset is contained within a superset.

For example:

Set A consists of (2, 4) so bit value = 2 + 4 = 6

Set B consists of (2, 4, 8) so bit value = 2 + 4 + 8 = 14

To determine that set A is in set B, then check BITAND(20,6) = 6, which is the bit setting for the subset, so this confirms it is a subset.

Again, the versatility of SQL allows you to generate the bit values:

SELECT POWER(2,LEVEL), LEVEL rank
FROM dual
CONNECT BY LEVEL <= 400

Unfortunately, the possible numbers are far more limited than prime numbers and limited to 400 values before the number overflows the Oracle numeric data type. In fact Oracle’s numeric data type can only hold 38 significant figures, so beyond 64 values it is not able to hold the number to the full precision and the bit settings get lost. A SQL approach to decoding the bit set would be straight forward too: SELECT a.*, b.ref_desc FROM my_main_table a, my_ref_table b WHERE BITAND(a.bit_set, b.bit_value) > 0
AND a.id = :p_id

Conclusion

Both approaches have very similar benefits and only really differ in their limitations. For an Oracle based solution, there are far more prime numbers available than bit based numbers, so prime numbers would be favoured for solutions where you have more than 64 values to represent. However, multiplying many prime numbers together soon become very large and can exceed the 38 significant figures, so favoured for smaller set sizes.

0

Leave a Reply