Hacker News new | ask | show | jobs
by pm2222 1456 days ago
There are columns of data, and as a simple example they are {{1,a},{1,b},{1,c},{2,a},{2,b},{2,c},{3,a}} of two columns and seven lines. I'd like to process the data through a function which would yield {{{1,2},{a,b,c}},{{3},{a}}} which is slightly better than {{{1,2,3},{a}},{{1,2},{b,c}}} because former has 7 leaf elements vs latter 8 leaf elements while both has if you will two lines.

How to write that function?

3 comments

This is similar to a Karnaugh Map: https://www.youtube.com/watch?v=3vkMgTmieZI

Basically, you can map the data values into a boolean truth table representation:

      a b c
    1 x x x
    2 x x x
    3 x
And use the Karnaugh Map to visually simplify:

      a b c
    1 ┌───┐
    2 └───┘
    3 □
into:

    {1,2}{a,b,c}
    {3}{a}
If you'd like to do this algorithmically, there are Karnaugh solver libraries which use algorithms such as the https://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algori...
Very nice. Can this be applied to more than 4 dimensions?

This problem is an abstract of a network firewall rule optimization. The a,b,c... are for source IP/net, destination IP/net, protocol/port, permit/deny and probably more like zone names.

If the data is ordered by row as in your example (e.g. {1,a},{1,b}, etc), a relatively straightforward option would be to read all columns for the row and use the values as a key such that hash(columns) -> list(rowNum).

This may not give you the lowest number of leaves depending on your dataset. If you're looking for the absolute smallest number of leaves, you could use each column as an individual key and start intersecting their lists to see if that produces a reduced leaf count.

I think you need to provide more information.

Do you have a CSV file? Is this a specific data structure in some language?

Is the output another CSV file / std out?