by lincpa 2616 days ago
In an RMDB, the database data model (db, table, row, col, value [array, json, int, text, etc.]) is hierarchical.

A relation is a logical-model mapping of data structures; it is just a way of thinking that exists in the brain.

SQL, Prolog, clojure.core, and miniKanren can all be used for relational operations.

1 comments

The relational model isn't arbitrarily hierarchical. In Clojure we can place a map inside a map, or a map inside a vector, and keep going indefinitely until we run out of memory. In a relational database, by contrast, we can't place a database into a table, or a table into a row.

It's true that we can represent relations using collections. In Clojure we'd write:

    (def a
      #{{:a/id 1, :a/name "a1"}
        {:a/id 2, :a/name "a2"}})

    (def b
      #{{:b/id 1, :b/name "b1", :a/id 2}})

But these structures don't allow for efficient lookup or joins, and we lack inbuilt functions to easily deal with data modelled in this way.
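
To make the cost concrete, here is a minimal sketch of a lookup and a join over those two sets using only clojure.core. The helpers `find-by` and `naive-join` are hypothetical names made up for illustration, not library functions. Both have to visit every row, which is the inefficiency being described.

```clojure
;; Relations as plain sets of maps (same shape as the example above).
(def a
  #{{:a/id 1, :a/name "a1"}
    {:a/id 2, :a/name "a2"}})

(def b
  #{{:b/id 1, :b/name "b1", :a/id 2}})

;; Hypothetical helper: find the rows where key k equals v.
;; A set of maps has no index on k, so every row is inspected: O(n).
(defn find-by [rel k v]
  (set (filter #(= v (get % k)) rel)))

;; Hypothetical helper: join two relations on a shared key k.
;; Every pair of rows is compared: O(n * m).
(defn naive-join [r1 r2 k]
  (set (for [x r1, y r2
             :when (= (get x k) (get y k))]
         (merge x y))))

(find-by a :a/id 2)
;; => #{{:a/id 2, :a/name "a2"}}

(naive-join a b :a/id)
;; => #{{:a/id 2, :a/name "a2", :b/id 1, :b/name "b1"}}
```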

Relational databases are based on relational algebra. If Clojure is based on relational databases, then we'd expect to be able to do relational algebra easily in Clojure. But we can't: the core library isn't designed for it, and the built-in data structures aren't designed for it.

In general, I think:

1. arbitrary layering and deep nesting are not good engineering practices.

2. Refer to the data model and code in my latest two posts. I prefer to use a hash-map as the table, with the primary key as a hash index: the key is the primary key and the val (a colname-to-colval hash-map) is the row content.

I also don't think relational algebra operations must be implemented in the form of an RMDB and SQL. They can also be implemented very elegantly with clojure.core. Using hash-map operations is simpler, clearer, smoother, and high-performance.

There are many ways to implement relational algebra. Our thinking need not be limited by the "information structure" displayed by the traditional RMDB interface. In Clojure, the hash-map (NoSQL) is the underlying physical model, and the relational model is the upper logical model. The clojure.core functions act as a data manipulation language. I named this architecture SuperSQL or SuperRMDB.

In fact, the original data manipulation language of PostgreSQL and FoxPro was not SQL. The clojure.core functions are closer to FoxPro's commands (DML).

3. Sets, vectors, and lists are generally not good default data containers; use them only when needed. If you use a set as the container, it's difficult to operate on the data (table, row, column, value).

4. In summary, I think programming is the process of designing a data model that is simple and fluent to manipulate. Think openly, don't be restricted by traditional thinking, be flexible, adapt to local conditions, and design as needed.
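
As a rough sketch of point 2, here is a hash-map used as a table keyed by primary key, with selection and projection written in clojure.core. The `employees` table, its columns, and the helpers `select-rows` and `project-cols` are all made up for illustration. Note that selecting on a non-key column still scans every row.

```clojure
;; Hypothetical table: primary key -> row (a map of column name -> value).
(def employees
  {:e1 {:name "Ann", :dept :sales, :salary 100}
   :e2 {:name "Bob", :dept :ops,   :salary 80}
   :e3 {:name "Cy",  :dept :sales, :salary 90}})

;; Lookup by primary key is a plain hash-map get.
(get employees :e2)
;; => {:name "Bob", :dept :ops, :salary 80}

;; Selection (like SQL WHERE): keep the rows that satisfy a predicate.
(defn select-rows [table pred]
  (into {} (filter (fn [[_pk row]] (pred row))) table))

;; Projection (like SQL SELECT col, ...): keep only the given columns.
(defn project-cols [table cols]
  (into {} (map (fn [[pk row]] [pk (select-keys row cols)])) table))

(-> employees
    (select-rows #(= :sales (:dept %)))
    (project-cols [:name]))
;; => {:e1 {:name "Ann"}, :e3 {:name "Cy"}}
```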

"arbitrary layering and deep nesting are not good engineering practices"

Perhaps, but that's irrelevant; I'm describing the difference in how hierarchical and relational models are designed.

"I prefer to use hash-map as the table with the primary key hash index, with key as the primary key and val(colname-colval-hashmap) as the row content."

And what if you need a second index? Your indexing should be separate from your data model, otherwise you can't write performant relational algebra operations that apply in the general case.

"It can also be implemented very elegantly with clojure.core. using hash-map operation is simpler, clearer, smoother and high performance."

No it can't. Suppose I have a relation with keys: a, b, c, d and e. I want to index on a, b and the pair (c, d). How would I do that in Clojure? What happens if I later decide I also want to index on e?

This is the sort of problem that's trivial to solve in a relational database, and extremely hard in Clojure, because Clojure doesn't have the functions or data structures to support data modelled in this way.

That's not to say that Clojure can't have these tools; just that they aren't built into clojure.core, because that's not what it's designed for.

"set, vector, list is generally not a good default data container, only used when needed"

Yes they are. Sets are the basis of relational algebra.

You're complecting the ideas of data representation with data indexing. Sets are a good representation of a relation, but a poor index.

We can get the best of both worlds by combining the two:

    (def a
      (let [r1 {:a/id 1, :a/name "a1", :b/id 1}
            r2 {:a/id 2, :a/name "a2", :b/id 1}]
        {:relation #{r1 r2}
         :index
         {:a/id   {1 #{r1}, 2 #{r2}}
          :a/name {"a1" #{r1}, "a2" #{r2}}
          :b/id   {1 #{r1 r2}}}}))

A data structure like this allows us to start writing efficient relational algebra. For example, with a natural join we can look for the smallest index two relations have in common.
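
A structure of that shape doesn't have to be written by hand. The following is a minimal sketch, not an established library: `index-value`, `build-index`, `index-relation`, `add-index` and `lookup` are hypothetical helpers, and a vector of keys is assumed to mean a composite index such as the pair (c, d) mentioned earlier.

```clojure
;; Value a row is indexed under: a single column value, or a vector of
;; values when k is a vector of keys (a composite index).
(defn index-value [row k]
  (if (vector? k)
    (mapv #(get row %) k)
    (get row k)))

;; Map each index value to the set of rows that have it.
(defn build-index [rows k]
  (reduce (fn [idx row]
            (update idx (index-value row k) (fnil conj #{}) row))
          {}
          rows))

;; Build the {:relation ..., :index ...} structure for the given keys.
(defn index-relation [rows ks]
  {:relation (set rows)
   :index    (into {} (map (fn [k] [k (build-index rows k)])) ks)})

;; Adding another index later only needs the rows already stored.
(defn add-index [rel k]
  (assoc-in rel [:index k] (build-index (:relation rel) k)))

;; Indexed lookup: hash-map gets instead of a scan over the relation.
(defn lookup [rel k v]
  (get-in rel [:index k v] #{}))

(def a
  (index-relation
   [{:a/id 1, :a/name "a1", :b/id 1}
    {:a/id 2, :a/name "a2", :b/id 1}]
   [:a/id :a/name [:a/id :b/id]]))

(lookup a :a/id 2)
;; => #{{:a/id 2, :a/name "a2", :b/id 1}}

(lookup a [:a/id :b/id] [1 1])
;; => #{{:a/id 1, :a/name "a1", :b/id 1}}

(-> a (add-index :b/id) (lookup :b/id 1) count)
;; => 2
```

The sketch leaves out keeping the indexes in sync when rows are added or removed, which is part of the infrastructure that would still have to be written.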

So we can begin to construct the infrastructure we need to perform relational algebra in Clojure, but it's not there to begin with, and therefore Clojure isn't designed around the relational model.

If you want to build an RMDB with Clojure, you are right, but that kind of project is very rare. Even Datomic is built on top of an RMDB, so I don't think it needs to deal with such low-level, general operations.

Implementing an RMDB is not the same as performing relational algebra operations. I think we should use a simple, direct, and lightweight relational logical model to solve real-world problems. Don't over-optimize, over-generalize, or over-complicate; keep it simple and direct.

This is just like designing a database in an RMDB for a real-world project, which is the normal way to use relational algebra and the relational model. Once the database design is complete, we don't need to care how the index is implemented or how the data is stored underneath. I mean that Clojure is used as an RMDB, not as a tool for constructing an RMDB.

Therefore, my method is to design the application-level data model, and the problem you describe does not arise. When I get the data from the database (or elsewhere), I simply transform it into the target model. You can think of this model as a table or view. You don't need to transform it again, so you don't need multiple indexes.
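
A minimal sketch of that transformation step, assuming the fetched data arrives as a vector of row maps (the rows and the names `fetched-rows`, `by-id` and `by-manager` are made up for illustration):

```clojure
;; Hypothetical rows as they might arrive from a database query or an API.
(def fetched-rows
  [{:id 1, :name "t1-r1", :manager :m1}
   {:id 2, :name "t1-r2", :manager :m2}
   {:id 3, :name "t1-r3", :manager :m2}])

;; Target model: a table keyed by primary key, built once on arrival.
(def by-id
  (into {} (map (juxt :id identity)) fetched-rows))

;; A different target model ("view") built from the same rows,
;; for code that needs the rows grouped by manager.
(def by-manager
  (group-by :manager fetched-rows))

(get by-id 2)
;; => {:id 2, :name "t1-r2", :manager :m2}

(map :name (get by-manager :m2))
;; => ("t1-r2" "t1-r3")
```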

I'm not talking about building a database; I'm talking about using a relational model for data.

Like most general-purpose programming languages, Clojure has a hierarchical data model. We have a number of collection types, and we can put any collection into any other collection.

A relational model takes a fundamentally different approach. Relational data is represented not by nesting collections, but by a flat set of tuples. Efficiency is achieved through indexing, not by rearranging collections.
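
A small sketch of the contrast, using made-up department and employee data. `clojure.set/index` ships with Clojure (in the `clojure.set` namespace, not `clojure.core`) and builds a lookup map from a set of maps without changing the set itself.

```clojure
(require '[clojure.set :as set])

;; Hierarchical: employees are nested inside their department.
(def nested
  {:departments
   [{:dept/id 1, :dept/name "Sales"
     :employees [{:emp/id 10, :emp/name "Ann"}
                 {:emp/id 11, :emp/name "Bob"}]}
    {:dept/id 2, :dept/name "Ops"
     :employees [{:emp/id 12, :emp/name "Cy"}]}]})

;; Access follows the nesting path.
(get-in nested [:departments 1 :employees 0 :emp/name])
;; => "Cy"

;; Relational: two flat sets of tuples, linked by a shared attribute.
(def departments
  #{{:dept/id 1, :dept/name "Sales"}
    {:dept/id 2, :dept/name "Ops"}})

(def employees
  #{{:emp/id 10, :emp/name "Ann", :dept/id 1}
    {:emp/id 11, :emp/name "Bob", :dept/id 1}
    {:emp/id 12, :emp/name "Cy",  :dept/id 2}})

;; The index is derived from the flat data; the data is not rearranged.
(def employees-by-dept
  (set/index employees [:dept/id]))

(get employees-by-dept {:dept/id 2})
;; => #{{:emp/id 12, :emp/name "Cy", :dept/id 2}}
```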

There's some interesting research around on using the relational model outside of a database, but that's not a design goal of Clojure.

My method can also generate a "derived index" based on the "primary key hash index". It gives higher performance without rearranging collections.

```clojure
(require '[clojure.set])

;; Table: primary key -> row.
(def table01 {:t1-pk1 {:pk      :t1-pk1
                       :name    "t1-r1"
                       :manager :m1}
              :t1-pk2 {:pk      :t1-pk2
                       :name    "t1-r2"
                       :manager :m2}
              :t1-pk3 {:pk      :t1-pk3
                       :name    "t1-r3"
                       :manager :m3}
              :t1-pk4 {:pk      :t1-pk4
                       :name    "t1-r4"
                       :manager :m2}})

;; Derived index: manager -> set of primary keys.
(def t1-manager-index {:m1 #{:t1-pk1}
                       :m2 #{:t1-pk2
                             :t1-pk4}
                       :m3 #{:t1-pk3}})

;; Rows managed by :m2.
(->> :m2
     t1-manager-index
     (select-keys table01 ,))

; =>
; {:t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2},
; :t1-pk4 {:pk :t1-pk4, :name "t1-r4", :manager :m2}}

;; Rows managed by :m2 or :m3.
(->> [:m2 :m3]
     (select-keys t1-manager-index ,)
     vals
     (apply clojure.set/union ,)
     (select-keys table01 ,))

; =>
; {:t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2},
; :t1-pk4 {:pk :t1-pk4, :name "t1-r4", :manager :m2},
; :t1-pk3 {:pk :t1-pk3, :name "t1-r3", :manager :m3}}
```
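
The derived index above is written out by hand. As a minimal sketch, it could also be computed from the table; `derive-index` is a hypothetical helper, and the table here is a shortened copy so the block runs on its own.

```clojure
;; A shortened copy of the table, keyed by primary key.
(def table01
  {:t1-pk1 {:pk :t1-pk1, :name "t1-r1", :manager :m1}
   :t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2}
   :t1-pk4 {:pk :t1-pk4, :name "t1-r4", :manager :m2}})

;; Hypothetical helper: map each value of column `col` to the set of
;; primary keys whose row has that value.
(defn derive-index [table col]
  (reduce-kv (fn [idx pk row]
               (update idx (get row col) (fnil conj #{}) pk))
             {}
             table))

(def t1-manager-index (derive-index table01 :manager))

(= t1-manager-index
   {:m1 #{:t1-pk1}
    :m2 #{:t1-pk2 :t1-pk4}})
;; => true

;; The same lookup as before, now against the computed index.
(= (set (keys (select-keys table01 (t1-manager-index :m2))))
   #{:t1-pk2 :t1-pk4})
;; => true
```

The index has to be recomputed (or updated) whenever the table changes.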

Your point of view mainly emphasizes that Clojure is a multi-paradigm, general-purpose functional programming language.

The PostgreSQL development team takes the same view, so PostgreSQL is not only an RMDB (relational modeling) but also supports OO and JSON (NoSQL). But PostgreSQL's default data modeling is relational.

My point of view mainly emphasizes best practices for data modeling and programming.

Both views are correct and can exist in parallel.