In a previous blog post I presented a ‘common sense approach’ to distribute a value onto detail records without a rounding error. Remember, we were distributing 100 EUR – which is 10.000 cents – to 6 records. 10000/6=1666, the remainder is 4, so 4 of the 6 records get one plus cent. The benefit of this algorithm is that the only extra calculation required ‘per record’ is a record number within the group, to pick the 4 ‘beneficiary’ records – but you do not need an extra select compared with a ‘lossy’ distribution.
But the problem was a simplified one: we distributed the value equally between all the records. Unfortunately real life is not that simple, and the weights for distribution are often not equal. Equal distribution might work well if a monthly value needs to be distributed to days of the month, but distributing a stockyard cost onto products would typically be done in relation to the product value.
In the equal distribution case all records will have the same (unrounded) value, and thus the same rounding error – it is easy to assess this rounding error in advance, precalculate it as a single value, and deal with it. However in the weighted case, the rounding error will depend on the combination of weight, original value and the total of weights, so it can not be predicted without actually calculating the rounded value. At least I could not find a simple, “one step” way.
I have found one library for Perl which is meant to solve the weighted distribution issue, it is also iterative, and quite ‘mathematical’ – definitely not an option for SQL, especially if you are more engineer and less mathematician. 🙂
For a “perfect” solution one could calculate the rounding error per record with a very high precision and start cumulating that error along the records. Once the cumulated error is bigger than the smallest number that can be represented in the output value (i.e. > 0,01 when the output type has 2 decimals) it would be added to the current record, and subtracted from the cumulated error value. The problem with this solution is that is tricky to implement in SQL, and due to the cumulated calculation it is slower.
As fairness of distribution of the rounding error was less important than performance, I did it the simple way – an approach often used, but probably not so often implemented in SQL:
- distribute value and round, at this stage still having the rounding error
- calculate the total rounding error (per group)
- add it to one of the records within each group – preferably the one with the highest weight
And this is how the SQL code would more or less look like:
--Table lt_sum: g1, g2, group_value - where g1-2 are the key of a group, and group_value is the value to be distributed --Table lt_detail: g1, g2, d1, d2, weight - where g1-2 identify the group, d1-d2 the detail record within the group, and weight is the relative weight --Output: lt_out: g1, g2, d1, d2, distributed_value lt_temp = SELECT detail.g1, detail.g2, detail.d1, detail.d2, sum.group_value AS group_value, ROUND( group_value * weight / SUM(weight) OVER ( PARTITION BY detail.g1, detail.g2 ) , 2 ) AS distributed_value_uncorrected, ROW_NUMBER() OVER ( PARTITION BY detail.g1, detail.g2 ORDER BY weight DESC) AS index_of_bucket_in_group FROM :lt_detail AS detail INNER JOIN :lt_sum AS sum ON sum.g1 = detail.g1 AND sum.g2 = detail.g2; lt_out = SELECT detail.g1, detail.g2, detail.d1, detail.d2, CASE WHEN index_of_bucket_in_group = 1 THEN distributed_value_uncorrected + ( group_value - SUM(distributed_value_uncorrected) OVER (PARTITION BY g1, g2) ) ELSE distributed_value_uncorrected END AS distributed_value, FROM :lt_temp;
The SUM(…) OVER ( PARTITION BY …) part in the first is responsible for calculating the total weight per group, without introducing an additional select. The similar clause in the second statement does the same in order to calculate the total of the distributed values with rounding error.
Again, the code above could be improved and might not be complete or even fully correct. The point is to give a starting point.
One more hint: To reduce the rounding error – and with that the unfairness of the algorithm towards the ‘first’ record within the group – you should consider using bankers rounding. The drawback is of course that there is one more twist which needs to be considered if anyone would like to exactly reconcile the numbers manually.