| HN Mirror


by weavejester 2617 days ago

"arbitrary layering and deep nesting are not good engineering practices"

Perhaps, but that's irrelevant; I'm describing the difference in how hierarchical and relational models are designed.

"I prefer to use hash-map as the table with the primary key hash index, with key as the primary key and val(colname-colval-hashmap) as the row content."

And what if you need a second index? Your indexing should be separate from your data model, otherwise you can't write performant relational algebra operations that apply in the general case.

"It can also be implemented very elegantly with clojure.core. using hash-map operation is simpler, clearer, smoother and high performance."

No it can't. Suppose I have a relation with keys: a, b, c, d and e. I want to index on a, b and the pair (c, d). How would I do that in Clojure? What happens if I later decide I also want to index on e?

This is the sort of problem that's trivial to solve in a relational database, and extremely hard in Clojure, because Clojure doesn't have the functions or data structures to support data modelled in this way.

That's not to say that Clojure can't have these tools; just that they aren't built into clojure.core, because that's not what it's designed for.
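To make the question concrete, here is a minimal sketch of indexing on a, b and the pair (c, d) using `clojure.set/index`, which ships with Clojure but lives outside clojure.core. The sample rows, the `indexes` map and the `add-index` helper are illustrative assumptions, not something from the thread. Note that every index has to be rebuilt by hand whenever the relation changes.

```clojure
(require '[clojure.set :as set])

;; Hypothetical relation with keys :a :b :c :d :e.
(def rel
  #{{:a 1, :b :x, :c 10, :d 20, :e "p"}
    {:a 2, :b :x, :c 10, :d 20, :e "q"}
    {:a 3, :b :y, :c 11, :d 20, :e "p"}})

;; One index per key vector, stored separately from the relation itself.
(def indexes
  {[:a]    (set/index rel [:a])
   [:b]    (set/index rel [:b])
   [:c :d] (set/index rel [:c :d])})

(defn add-index
  "Adds an index on the key vector ks (hypothetical helper)."
  [indexes rel ks]
  (assoc indexes ks (set/index rel ks)))

;; Look up by the pair (c, d): returns the two rows with :c 10 and :d 20.
(get-in indexes [[:c :d] {:c 10, :d 20}])

;; Deciding later to index on e as well.
(def indexes+e (add-index indexes rel [:e]))
(get-in indexes+e [[:e] {:e "p"}])
```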

"set, vector, list is generally not a good default data container, only used when needed"

Yes they are. Sets are the basis of relational algebra.

You're complecting the ideas of data representation with data indexing. Sets are a good representation of a relation, but a poor index.

We can get the best of both worlds by combining the two:

    (def a
      (let [r1 {:a/id 1, :a/name "a1", :b/id 1}
            r2 {:a/id 2, :a/name "a2", :b/id 1}]
        {:relation #{r1 r2}
         :index
         {:a/id   {1 #{r1}, 2 #{r2}}
          :a/name {"a1" #{r1}, "a2" #{r2}}
          :b/id   {1 #{r1 r2}}}}))

A data structure like this allows us to start writing efficient relational algebra. For example, with a natural join we can look for the smallest index two relations have in common.
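As a rough sketch of that idea (not code from the original post): `indexed-relation` and `index-join` are hypothetical helper names, "smallest" is assumed to mean the shared index with the fewest distinct values, and the two relations are assumed to share at least one indexed attribute.

```clojure
(require '[clojure.set :as set])

(defn indexed-relation
  "Builds {:relation #{rows} :index {attr {value #{rows}}}} for the given attrs.
  Assumes every row contains every indexed attr."
  [rows attrs]
  {:relation (set rows)
   :index    (into {}
                   (for [attr attrs]
                     [attr (reduce (fn [idx row]
                                     (update idx (get row attr) (fnil conj #{}) row))
                                   {}
                                   rows)]))})

(defn index-join
  "Natural join driven by one shared index. Picks the shared indexed
  attribute with the fewest distinct values, then keeps only the row
  pairs that agree on every key they have in common."
  [x y]
  (let [shared (set/intersection (set (keys (:index x)))
                                 (set (keys (:index y))))
        attr   (apply min-key #(count (get-in x [:index %])) shared)
        xi     (get-in x [:index attr])
        yi     (get-in y [:index attr])]
    (set (for [[v xrows] xi
               xr        xrows
               yr        (get yi v)
               :let      [common (set/intersection (set (keys xr))
                                                   (set (keys yr)))]
               :when     (= (select-keys xr common) (select-keys yr common))]
           (merge xr yr)))))

(def a (indexed-relation [{:a/id 1, :a/name "a1", :b/id 1}
                          {:a/id 2, :a/name "a2", :b/id 1}]
                         [:a/id :a/name :b/id]))

;; Hypothetical second relation, indexed on :b/id only.
(def b (indexed-relation [{:b/id 1, :b/name "b1"}] [:b/id]))

(index-join a b)
```

Here the only shared index is `:b/id`, so the join walks that index and merges each row of `a` with the matching row of `b`.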

So we can begin to construct the infrastructure we need to perform relational algebra in Clojure, but it's not there to begin with, and therefore Clojure isn't designed around the relational model.


lincpa 2617 days ago

If you want to build an RMDB in Clojure, you are right, but that kind of project is very rare. Even Datomic is built on top of an RMDB, and I don't think it needs to deal with such low-level, general operations.

Implementing an RMDB is not the same as using relational algebra. I think the point is to use a simple, direct, and lightweight relational model to solve real-world problems. Don't over-optimize, over-generalize, or over-complicate; keep it simple and direct.

It is like designing a database in an RMDB for a real-world project: that is the normal way to use relational algebra and the relational model. Once the database design is complete, we don't need to care how the indexes are implemented or how the data is stored underneath. I mean that Clojure is used as an RMDB, not as a tool for constructing one.

Therefore, my method is to design the data model at the application level, and the problem you describe does not arise. When I get data from the database (or elsewhere), I simply transform it into the target model. You can think of this model as a table or view. It does not need to be transformed again, so it does not need multiple indexes.
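A minimal sketch of that transformation step, assuming the data arrives as a sequence of row maps. The `rows->table` name and the sample rows are illustrative assumptions.

```clojure
;; Rows as they might come back from a query (illustrative data).
(def rows
  [{:pk :t1-pk1, :name "t1-r1", :manager :m1}
   {:pk :t1-pk2, :name "t1-r2", :manager :m2}])

(defn rows->table
  "Turns a sequence of row maps into a hash-map keyed by the pk-attr value."
  [pk-attr rows]
  (into {} (map (juxt pk-attr identity)) rows))

(def table (rows->table :pk rows))

;; Primary-key lookup is now a plain map lookup.
(get table :t1-pk2)
; => {:pk :t1-pk2, :name "t1-r2", :manager :m2}
```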


weavejester 2616 days ago

I'm not talking about building a database; I'm talking about using a relational model for data.

Like most general-purpose programming languages, Clojure has a hierarchical data model. We have a number of collection types, and we can put any collection into any other collection.

A relational model takes a fundamentally different approach. Relational data is represented not by nesting collections, but by a flat set of tuples. Efficiency is achieved through indexing, not by rearranging collections.
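A small sketch of the contrast, using made-up data: the same facts written first as nested collections and then as two flat sets of tuples that are combined with `clojure.set/join`. The attribute names and values here are assumptions for illustration only.

```clojure
(require '[clojure.set :as set])

;; Hierarchical: employees are nested inside their manager.
(def nested
  {:m1 {:title "lead-1", :reports [{:name "t1-r1"}]}
   :m2 {:title "lead-2", :reports [{:name "t1-r2"} {:name "t1-r4"}]}})

;; Access follows the nesting path that was chosen up front.
(map :name (get-in nested [:m2 :reports]))
; => ("t1-r2" "t1-r4")

;; Relational: two flat sets of tuples that share the :manager attribute.
(def managers
  #{{:manager :m1, :title "lead-1"}
    {:manager :m2, :title "lead-2"}})

(def employees
  #{{:name "t1-r1", :manager :m1}
    {:name "t1-r2", :manager :m2}
    {:name "t1-r4", :manager :m2}})

;; Natural join on the shared attribute; neither set is nested in the other.
(def joined (set/join employees managers))

;; The same question, asked of the flat data.
(set (map :name (set/select #(= :m2 (:manager %)) joined)))
; => #{"t1-r2" "t1-r4"}
```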

There's some interesting research around on using the relational model outside of a database, but that's not a design goal of Clojure.


lincpa 2616 days ago

My method can also generate a "derived index" from the "primary key hash index". It gives higher performance, and not by rearranging collections.

```clojure
(require '[clojure.set])

;; Table keyed by primary key; each value is a row.
(def table01 {:t1-pk1 {:pk      :t1-pk1
                       :name    "t1-r1"
                       :manager :m1}
              :t1-pk2 {:pk      :t1-pk2
                       :name    "t1-r2"
                       :manager :m2}
              :t1-pk3 {:pk      :t1-pk3
                       :name    "t1-r3"
                       :manager :m3}
              :t1-pk4 {:pk      :t1-pk4
                       :name    "t1-r4"
                       :manager :m2}})

;; Derived index: manager -> set of primary keys.
(def t1-manager-index {:m1 #{:t1-pk1}
                       :m2 #{:t1-pk2
                             :t1-pk4}
                       :m3 #{:t1-pk3}})

;; Rows managed by :m2.
(->> :m2
     t1-manager-index
     (select-keys table01 ,))
; =>
; {:t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2},
; :t1-pk4 {:pk :t1-pk4, :name "t1-r4", :manager :m2}}

;; Rows managed by :m2 or :m3.
(->> [:m2 :m3]
     (select-keys t1-manager-index ,)
     vals
     (apply clojure.set/union ,)
     (select-keys table01 ,))
; =>
; {:t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2},
; :t1-pk4 {:pk :t1-pk4, :name "t1-r4", :manager :m2},
; :t1-pk3 {:pk :t1-pk3, :name "t1-r3", :manager :m3}}
```
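The manager index above is written out by hand. One possible way to derive it from the table is sketched below; `derive-index` is a hypothetical helper name, and the table is repeated in compact form so the snippet stands alone.

```clojure
(defn derive-index
  "Builds attr-value -> #{primary keys} from a table keyed by primary key."
  [table attr]
  (reduce-kv (fn [idx pk row]
               (update idx (get row attr) (fnil conj #{}) pk))
             {}
             table))

(def table01
  {:t1-pk1 {:pk :t1-pk1, :name "t1-r1", :manager :m1}
   :t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2}
   :t1-pk3 {:pk :t1-pk3, :name "t1-r3", :manager :m3}
   :t1-pk4 {:pk :t1-pk4, :name "t1-r4", :manager :m2}})

;; Compare the derived index with the hand-written one.
(= (derive-index table01 :manager)
   {:m1 #{:t1-pk1}
    :m2 #{:t1-pk2 :t1-pk4}
    :m3 #{:t1-pk3}})
; => true
```

The index still has to be re-derived or updated whenever `table01` changes.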

Your point of view is mainly to emphasize that Clojure is a multi-paradigm, general-purpose functional programming language.

The PostgreSQL development team takes the same view, so PostgreSQL is not only an RMDB (relational modeling) but also supports OO and JSON (NoSQL). Still, PostgreSQL's default data modeling is relational.

My point of view is mainly to emphasize the best practices of data modeling and programming.

Both views are correct and can exist in parallel.


weavejester 2616 days ago

"Your point of view is mainly to emphasize that Clojure is a multi-paradigm, general-purpose functional programming language."

No, that's not my point at all. I'm saying that Clojure's core library and data structures are built around a hierarchical data model and not a relational one.

If you want to model your data as a relation, then you need to build the tools and structures for it. Look at your code, then consider how it would look in a language designed around relational algebra:

    (def table01
      #rel [{name "t1-r1", manager :m1}
            {name "t1-r2", manager :m2}
            {name "t1-r3", manager :m3}
            {name "t1-r4", manager :m2}])

    (select table01 (= manager :m2))

    ; => #rel [{name "t1-r2", manager :m2}
    ;          {name "t1-r4", manager :m2}] 

    (select table01 (or (= manager :m2) (= manager :m3)))

    ; => #rel [{name "t1-r2", manager :m2}
    ;          {name "t1-r3", manager :m3}
    ;          {name "t1-r4", manager :m2}]

An indexed selection against a relation would just be a single function or macro. We wouldn't need to mess around with select-keys and set union to achieve such a simple operation, as you needed to do in your code. It would be built into the core library or the language.
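For comparison, here is roughly what such a function could look like if written by hand in today's Clojure. This is only a sketch: `index-by` and `select-in` are hypothetical names, and the relation-plus-index layout is an assumption rather than anything provided by clojure.core.

```clojure
(require '[clojure.set :as set])

(defn index-by
  "Builds value -> #{rows} for one attribute."
  [rows attr]
  (reduce (fn [idx row] (update idx (get row attr) (fnil conj #{}) row))
          {}
          rows))

(def table01
  (let [rows #{{:name "t1-r1", :manager :m1}
               {:name "t1-r2", :manager :m2}
               {:name "t1-r3", :manager :m3}
               {:name "t1-r4", :manager :m2}}]
    {:relation rows
     :index    {:manager (index-by rows :manager)}}))

(defn select-in
  "Rows whose attr equals any of the wanted values.
  Uses the index on attr when one exists, otherwise scans the relation."
  [{:keys [relation index]} attr wanted]
  (if-let [idx (get index attr)]
    (apply set/union (map #(get idx % #{}) wanted))
    (set (filter #(contains? (set wanted) (get % attr)) relation))))

(select-in table01 :manager [:m2])
; => #{{:name "t1-r2", :manager :m2} {:name "t1-r4", :manager :m2}}

(select-in table01 :manager [:m2 :m3])   ; indexed: three rows
(select-in table01 :name ["t1-r1"])      ; no index on :name, falls back to a scan
```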

I want to emphasize that I'm not saying you shouldn't model data the way you are. There are plenty of advantages to it. But the more you go down the relational rabbit hole, the less suitable clojure.core is to handle it.

Clojure is about data modelling and processing, but it isn't based on a relational model, as you suggest:

"Clojure is a functional programming language based on relational database theory"

Clojure is a functional programming language, but it's not based on relational database theory.


lincpa 2615 days ago

If Clojure is the sea and programming is sailing on it, then I use the relational model as a lighthouse and a route, a reference model for simple programming, because relational theory is simple and scientific.

I have implemented a DataFrame with hash-maps that supports relational operations. A relational operation is just a function. The advantage of a hash-map is that a key chain can be used as a pointer, so processing individual data elements is simple and efficient. The DataFrame therefore has the advantages of both an RMDB and NoSQL.
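A sketch of what that might look like, assuming "key chain" means a key path of the kind `get-in` and `assoc-in` accept. This is not the actual DataFrame implementation; `df` and `select-rows` are hypothetical names.

```clojure
;; A table keyed by primary key (illustrative data).
(def df
  {:t1-pk1 {:pk :t1-pk1, :name "t1-r1", :manager :m1}
   :t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2}})

;; A key path addresses a single data element.
(get-in df [:t1-pk2 :manager])
; => :m2

;; Updating one element returns a new table; df itself is unchanged.
(def df2 (assoc-in df [:t1-pk2 :manager] :m3))
(def df3 (update-in df [:t1-pk1 :name] str "-renamed"))

(defn select-rows
  "A relational selection written as an ordinary function over the table."
  [pred table]
  (into {} (filter (comp pred val)) table))

(select-rows #(= :m2 (:manager %)) df)
; => {:t1-pk2 {:pk :t1-pk2, :name "t1-r2", :manager :m2}}
```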

A strict relational model will lose the flexibility of data element operations.
