by kronxe
1896 days ago
One of my main goals is to preserve the database structure as it is, because in the big data age we should get data as fast as possible. I see why you view this algorithm as basically some sort of look-up table, but it actually is not. For example, imagine a well-developed city that keeps all data about the city and its people in one big database. The city then uses a TBCA specialized for only this database's needs, like a framework or engine. However, this specific TBCA is not totally different from the TBCAs used for other kinds of databases, such as a game database, since they share common properties like people's names and surnames. The structure of a name database is generally the same, but the TBCA plays a huge role here: you can configure your algorithm to your needs, as a kind of optimization. Today I am not sure how this should be done, maybe with an ML algorithm. I know I wrote too much, but my point is that TBCA is not a specific algorithm like gzip or LZW; it is a sub-branch of compression, like a universal set. In the future, there may be communities that share their specialized algorithms for particular structures along with their datasets (frequency analysis). It would become a pool from which you can choose the best engine (structural method) and the best dataset (frequency analysis).
Yes, TBCA is a scheme, not a specific algorithm (I thought I was clear enough in my reply, but sorry if I wasn't). In fact, I've done the same thing with my own database in a semi-automated fashion based on a pattern. For example, I had a string with three parts: a number from 1 to 214, an optional `'`, and a number from -9 to 99. My code accepts a pattern `[1..214]['?].[-9..99]` and generates code that packs this into 8 + 1 + 7 = 16 bits. This works because I was dealing with the Unicode Character Database (the example being kRSJapanese), so I knew its exact pattern without exception, and I had no migration problem since upgrading to a newer UCD is a non-trivial problem anyway. I also wanted to put the entirety of the UCD into shared memory, so I controlled most of the data structure to make this compression actually worthwhile.
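A minimal sketch of that 16-bit packing, for illustration only. The source gives the field widths (8 + 1 + 7) but not the bit order or the generated code, so the layout below, the offset of 9 for the second number, and the names `pack` and `unpack` are all assumptions.

```python
import re

# Matches strings such as "120.5" or "214'.-9":
# a number, an optional apostrophe, a dot, then a possibly negative number.
_PATTERN = re.compile(r"^([1-9]\d{0,2})('?)\.(0|-?[1-9]\d?)$")


def pack(text: str) -> int:
    """Pack "<1..214>['].<-9..99>" into one 16-bit integer.

    Assumed layout (hypothetical): high 8 bits = first number,
    next 1 bit = apostrophe flag, low 7 bits = second number + 9.
    """
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"does not match the pattern: {text!r}")
    first = int(match.group(1))
    flag = 1 if match.group(2) else 0
    second = int(match.group(3))
    if not 1 <= first <= 214 or not -9 <= second <= 99:
        raise ValueError(f"value out of range: {text!r}")
    # second + 9 lies in 0..108, which fits in 7 bits.
    return (first << 8) | (flag << 7) | (second + 9)


def unpack(code: int) -> str:
    """Invert pack(): rebuild the original string from 16 bits."""
    first = code >> 8
    mark = "'" if (code >> 7) & 1 else ""
    second = (code & 0x7F) - 9
    return f"{first}{mark}.{second}"
```

For example, `unpack(pack("214'.-9"))` gives back `"214'.-9"`. A fixed layout like this only works because the pattern is known to hold for every value, which is the point made above about knowing the data "without exception".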
My issue with TBCA presented in this way is that it looks like a direct replacement for general compression algorithms when it isn't. I regard it as a database schema because it is akin to RDBMS normalization: if you compress a name "<given> <sur>" (or vice versa) in this way, you could equally have a separate name parts table and two indices in the original table. The only difference is that you have hard-coded that name parts table into your app. I believe you should have done caching instead, so that the name parts table still exists but you can read it from memory when possible. That makes for a better general approach than TBCA, and it also shows that TBCA can't be compared with compression algorithms.
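To make the normalization comparison concrete, here is a rough sketch using Python's built-in `sqlite3` module. The table and column names (`name_parts`, `people`, `given_id`, `sur_id`) and the helpers `build_db` and `full_names` are hypothetical, chosen only for illustration.

```python
import sqlite3


def build_db(names):
    """Store "<given> <sur>" names as two indices into a shared parts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE name_parts ("
        "id INTEGER PRIMARY KEY, part TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, "
        "given_id INTEGER NOT NULL REFERENCES name_parts(id), "
        "sur_id INTEGER NOT NULL REFERENCES name_parts(id))"
    )
    for full in names:
        given, sur = full.split(" ", 1)
        ids = []
        for part in (given, sur):
            # Each distinct name part is stored once; repeats reuse its id.
            conn.execute(
                "INSERT OR IGNORE INTO name_parts (part) VALUES (?)", (part,)
            )
            row = conn.execute(
                "SELECT id FROM name_parts WHERE part = ?", (part,)
            ).fetchone()
            ids.append(row[0])
        conn.execute(
            "INSERT INTO people (given_id, sur_id) VALUES (?, ?)", tuple(ids)
        )
    return conn


def full_names(conn):
    """Rebuild the full names by joining both indices back to the parts."""
    rows = conn.execute(
        "SELECT g.part || ' ' || s.part FROM people p "
        "JOIN name_parts g ON g.id = p.given_id "
        "JOIN name_parts s ON s.id = p.sur_id ORDER BY p.id"
    )
    return [row[0] for row in rows]
```

Here each row in `people` holds only two small integers, and the repeated strings live once in `name_parts`. Hard-coding that parts table into the application gives the TBCA-style variant; keeping it in the database and caching it in memory is the alternative suggested above.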