Suppose you have a list of elements like this:
[ [1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2] ]
It would be nice to aggregate this on an arbitrary set of columns, such that the result is:
[ [1, 1, [1, 2, 3, 4]], [1, 3, [1, 2]], [1, 2, [1, 2, 3]] ]
That is, all elements whose first two columns are equal are lumped into one line item. The remaining data (the third column) gets aggregated into a list.
I've written such a thing in Python. Here is the code:
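# Aggregation of lists
#
# Author: Michal Guerquin
# January, 2005

def ag(lol, doconsider, donotconsider):
    # lol is a list of lists; returns one aggregated row per distinct key.
    result = {}
    for element in lol:
        contribute(result, element, doconsider, donotconsider)
    # list() so the caller gets a real list on newer Pythons as well.
    return list(result.values())

def contribute(result, element, consider, unconsider):
    # The values in the "consider" columns identify the output row.
    fingerprint = tuple([ element[i] for i in consider ])
    x = result.get( fingerprint, [None]*len(element) )
    # Collect each "unconsider" column into a list, creating it on first use.
    for c in unconsider:
        if isinstance(x[c], list):
            x[c].append(element[c])
        else:
            x[c] = [element[c]]
    for c in consider:
        x[c] = element[c]
    result[fingerprint] = x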
Using ag.py:
In [1]: import ag

In [2]: x = [ [1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2] ]

In [3]: ag.ag(x, [0,1], [2])
Out[3]: [[1, 1, [1, 2, 3, 4]], [1, 3, [1, 2]], [1, 2, [1, 2, 3]]]
The doconsider parameter lists the column numbers used to identify distinct rows in the result, while the donotconsider parameter lists the column numbers whose values should be lumped together.
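To see how the two parameters interact, here is a minimal sketch that runs the same small input through two different column choices. It is a compact re-implementation for illustration only, not ag.py itself: the names aggregate, key_cols and value_cols are made up, and the row order shown assumes Python 3.7 or later, where dicts keep insertion order.

```python
def aggregate(rows, key_cols, value_cols):
    """Group rows by the values in key_cols; collect value_cols into lists."""
    groups = {}
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        # Start a new output row the first time this key is seen.
        out = groups.setdefault(key, [None] * len(row))
        for c in key_cols:
            out[c] = row[c]
        for c in value_cols:
            if out[c] is None:
                out[c] = []
            out[c].append(row[c])
    return list(groups.values())


rows = [[1, 1, 1], [1, 1, 2], [1, 2, 1], [2, 1, 1]]

# Key on the first two columns, lump the third.
by_first_two = aggregate(rows, [0, 1], [2])
print(by_first_two)  # [[1, 1, [1, 2]], [1, 2, [1]], [2, 1, [1]]]

# Key on the first column only, lump the other two.
by_first = aggregate(rows, [0], [1, 2])
print(by_first)      # [[1, [1, 1, 2], [1, 2, 1]], [2, [1], [1]]]
```

Fewer key columns means fewer, wider rows: every column that is not part of the key turns into a list.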
Here is a more elaborate example:
from ag import ag

x = [ ["John Q. Public", "book",    "Cooking",  "456 pages"],
      ["John Q. Public", "book",    "Painting", "123 pages"],
      ["John Q. Public", "article", "Cleaning", "2 pages"],
      ["Jane B. Brown",  "book",    "Sleeping", "243 pages"],
      ["Jane B. Brown",  "article", "Running",  "5 pages"],
      ["Jane B. Brown",  "article", "Sitting",  "1 page"],
      ["Jane B. Brown",  "article", "Coding",   "2 pages"] ]

# Group on column 1 (the kind of publication); lump everything else.
for foo in ag(x, [1], [0, 2, 3]):
    print(foo)
The result, re-formatted for readability, is:
[
  [ ['John Q. Public', 'Jane B. Brown', 'Jane B. Brown', 'Jane B. Brown'],
    'article',
    ['Cleaning', 'Running', 'Sitting', 'Coding'],
    ['2 pages', '5 pages', '1 page', '2 pages'] ],

  [ ['John Q. Public', 'John Q. Public', 'Jane B. Brown'],
    'book',
    ['Cooking', 'Painting', 'Sleeping'],
    ['456 pages', '123 pages', '243 pages'] ]
]
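One thing worth noting about this result: since values are appended in input order, the lumped lists in a row stay aligned with each other, so they can be walked in step to get the individual records back. The sketch below is not part of ag.py. It hard-codes the 'article' row that ag(x, [1], [0, 2, 3]) produces for the data above, and the variable names are made up for illustration.

```python
# The 'article' row from the aggregated result: columns 0, 2 and 3 are
# lists, column 1 is the value the rows were grouped on.
article_row = [
    ["John Q. Public", "Jane B. Brown", "Jane B. Brown", "Jane B. Brown"],
    "article",
    ["Cleaning", "Running", "Sitting", "Coding"],
    ["2 pages", "5 pages", "1 page", "2 pages"],
]

authors, kind, titles, lengths = article_row

# zip() pairs up the i-th entry of each list, recovering one record per
# original input row.
records = list(zip(authors, titles, lengths))

for author, title, length in records:
    print("%s by %s: %s (%s)" % (kind, author, title, length))
```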
I've found a really good use for this. Maybe you will too!
https://michal.guerquin.com/aggregate.html, updated 2005-01-22 21:58 EST