by weavejester 2617 days ago
While Postgres supports hierarchical data in the form of JSON, that doesn't mean it's a form of relational data; it just means Postgres supports both.

Clojure doesn't have good tools in its core library for working with relational data. There's no core type that explicitly represents a relation, and Clojure lacks functions for many basic relational algebra operations. For example, how would you perform a natural join across your data structure?


In programming, data modeling is the most important thing.

Convert (or design) the data as hash-maps, then join (or merge) by key.

You can write commonly used operations as functions, use row (or column) operations as much as possible, and join all the data only when necessary (reducing row joins).

```clojure
(def a {:a-id-01 {:a-name "a1"}
        :a-id-02 {:a-name "a2"}})

(def b {:b-id-01 {:a-id :a-id-01 :b-name "b2"}
        :b-id-02 {:a-id :a-id-02 :b-name "b2"}})

(->> b
     :b-id-01
     :a-id
     a
     :a-name)
;=> "a1"

(let [x (b :b-id-01)]
  (->> x
       :a-id
       a
       (merge x ,)))
;=> {:a-id :a-id-01,
;    :b-name "b2",
;    :a-name "a1"}
```
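The lookup-and-merge pattern above could be wrapped in reusable functions. This is only a minimal sketch: `join-row` and `join-table` are hypothetical helper names, not part of clojure.core, and it assumes tables are hash-maps keyed by primary key as in the example.

```clojure
;; Tables are hash-maps keyed by primary key, as in the example above.
(def a {:a-id-01 {:a-name "a1"}
        :a-id-02 {:a-name "a2"}})

(def b {:b-id-01 {:a-id :a-id-01 :b-name "b2"}
        :b-id-02 {:a-id :a-id-02 :b-name "b2"}})

;; Hypothetical helper: merge one row with the row its foreign key
;; points to in another table. If the key is missing, the row comes
;; back unchanged, because (merge row nil) is just row.
(defn join-row
  [row fk parent-table]
  (merge row (get parent-table (get row fk))))

;; Hypothetical helper: join every row of a child table, keeping its keys.
(defn join-table
  [child-table fk parent-table]
  (into {}
        (map (fn [[k row]] [k (join-row row fk parent-table)]))
        child-table))

(join-row (b :b-id-01) :a-id a)
;=> {:a-id :a-id-01, :b-name "b2", :a-name "a1"}
```

`join-table` applies the same merge to a whole table, so the full join can be deferred until it is actually needed.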

You're describing a language whose data model is hierarchical, not relational. In your code you first convert a relation into a hierarchy of collections. Why would you need to do that if Clojure used a relational data model?

In an RMDB, the database data model (database, table, row, column, value [array, JSON, int, text, etc.]) is hierarchical.

A relation is a logical model mapped onto data structures; it is just a way of thinking that exists in the mind.

SQL, Prolog, clojure.core, and miniKanren can all be used for relational operations.

The relational model isn't arbitrarily hierarchical. In Clojure we can place a map inside a map, or a map inside a vector, and keep going indefinitely until we run out of memory. Conversely, in a relational database we can't place a database into a table, or a table into a row.

It's true that we can represent relations using collections. In Clojure we'd write:

    (def a
      #{{:a/id 1, :a/name "a1"}
        {:a/id 2, :a/name "a2"}})

    (def b
      #{{:b/id 1, :b/name "b1", :a/id 2}})

But these structures don't allow for efficient lookup or joins, and we lack inbuilt functions to easily deal with data modelled in this way.
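To illustrate the cost: without an index, a natural join over plain sets like these has to compare every pair of rows. A rough sketch, where `naive-natural-join` is a hypothetical name introduced only for this example:

```clojure
(def a
  #{{:a/id 1, :a/name "a1"}
    {:a/id 2, :a/name "a2"}})

(def b
  #{{:b/id 1, :b/name "b1", :a/id 2}})

;; Hypothetical helper: natural join by brute force.
;; Every row of r is compared with every row of s, so the work grows
;; with (count r) * (count s).
(defn naive-natural-join
  [r s]
  (set
   (for [x r
         y s
         ;; attributes the two rows have in common
         :let [shared (filter #(contains? x %) (keys y))]
         ;; keep the pair only if they agree on every shared attribute
         :when (= (select-keys x shared) (select-keys y shared))]
     (merge x y))))

(naive-natural-join a b)
;=> #{{:a/id 2, :a/name "a2", :b/id 1, :b/name "b1"}}
```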

Relational databases are based on relational algebra. If Clojure is based on relational databases, then we'd expect to be able to do relational algebra easily in Clojure. But we can't: the core library isn't designed for it, and the built-in data structures aren't designed for it.

In general, I think:

1. arbitrary layering and deep nesting are not good engineering practices.

2. Refer to the data model and code in my last two posts. I prefer to use hash-map as the table with the primary key hash index, with key as the primary key and val(colname-colval-hashmap) as the row content.

I also don't think relational algebra operations must be implemented in the form of RMDB and SQL. It can also be implemented very elegantly with clojure.core. using hash-map operation is simpler, clearer, smoother and high performance.

There are many ways to implement relational algebra. Our thinking need not be limited by the "information structure" presented by the traditional RMDB interface. In Clojure, the hash-map (NoSQL) is the underlying physical model, and the relational model is the logical model on top of it. The Clojure core functions act as a data manipulation language. I call this architecture SuperSQL or SuperRMDB.

In fact, the original data manipulation language of PostgreSQL and FoxPro was not SQL. Clojure's core functions are closer to FoxPro's commands (DML).

3. set, vector, list is generally not a good default data container, only used when needed. When you use a set as the container, it's difficult to operate on the data (table, row, column, value).

4. In summary, I think programming is the process of designing a data model that is simple and fluent to manipulate. Keep an open mind, don't be restricted by traditional thinking, stay flexible, adapt to local conditions, and design as needed.

"arbitrary layering and deep nesting are not good engineering practices"

Perhaps, but that's irrelevant; I'm describing the difference in how hierarchical and relational models are designed.

"I prefer to use hash-map as the table with the primary key hash index, with key as the primary key and val(colname-colval-hashmap) as the row content."

And what if you need a second index? Your indexing should be separate from your data model, otherwise you can't write performant relational algebra operations that apply in the general case.

"It can also be implemented very elegantly with clojure.core. using hash-map operation is simpler, clearer, smoother and high performance."

No it can't. Suppose I have a relation with keys: a, b, c, d and e. I want to index on a, b and the pair (c, d). How would I do that in Clojure? What happens if I later decide I also want to index on e?

This is the sort of problem that's trivial to solve in a relational database, and extremely hard in Clojure, because Clojure doesn't have the functions or data structures to support data modelled in this way.
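To make the difficulty concrete, here is a rough sketch of maintaining those indexes by hand. The names `rows`, `index-by`, and `indexes` are hypothetical, as is the sample data. Every insert or update would have to keep all of these maps in sync manually.

```clojure
;; Hypothetical relation with attributes a, b, c, d and e.
(def rows
  #{{:a 1, :b "x", :c 10, :d 20, :e :p}
    {:a 2, :b "x", :c 10, :d 21, :e :q}})

;; Hypothetical helper: map each value of (f row) to the set of rows
;; that produce it.
(defn index-by
  [f rel]
  (reduce (fn [idx row] (update idx (f row) (fnil conj #{}) row))
          {}
          rel))

;; One hand-built index per lookup path. Nothing in clojure.core keeps
;; these in sync with `rows`; indexing on :e later means another entry
;; here and another structure to update on every change.
(def indexes
  {:a      (index-by :a rows)
   :b      (index-by :b rows)
   [:c :d] (index-by (juxt :c :d) rows)})

(get-in indexes [[:c :d] [10 20]])
;=> #{{:a 1, :b "x", :c 10, :d 20, :e :p}}
```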

That's not to say that Clojure can't have these tools; just that they aren't built into clojure.core, because that's not what it's designed for.

"set, vector, list is generally not a good default data container, only used when needed"

Yes they are. Sets are the basis of relational algebra.

You're complecting the ideas of data representation with data indexing. Sets are a good representation of a relation, but a poor index.

We can get the best of both worlds by combining the two:

    (def a
      (let [r1 {:a/id 1, :a/name "a1", :b/id 1}
            r2 {:a/id 2, :a/name "a2", :b/id 1}]
        {:relation #{r1 r2}
         :index
         {:a/id   {1 #{r1}, 2 #{r2}}
          :a/name {"a1" #{r1}, "a2" #{r2}}
          :b/id   {1 #{r1 r2}}}}))

A data structure like this allows us to start writing efficient relational algebra. For example, with a natural join we can look for the smallest index two relations have in common.
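A minimal sketch of how that join might look. `index-on`, `make-relation`, and `indexed-join` are hypothetical names, and "smallest index" is interpreted here as the shared index with the fewest distinct keys across both relations, which is only one possible reading. The sketch also assumes that every row contains every indexed attribute and that the two relations share at least one indexed attribute.

```clojure
;; Hypothetical helper: map each value of attr to the set of rows having it.
(defn index-on
  [attr rows]
  (reduce (fn [idx row] (update idx (get row attr) (fnil conj #{}) row))
          {}
          rows))

;; Hypothetical helper: build the {:relation ... :index ...} shape shown
;; above from a collection of rows and the attributes to index.
(defn make-relation
  [rows attrs]
  {:relation (set rows)
   :index    (into {}
                   (map (fn [attr] [attr (index-on attr rows)]))
                   attrs)})

;; Hypothetical natural join that walks one shared index instead of
;; comparing every pair of rows.
(defn indexed-join
  [r s]
  (let [shared (filter #(contains? (:index s) %) (keys (:index r)))
        ;; assumption: "smallest" means fewest distinct keys in total
        size   (fn [attr] (+ (count (get-in r [:index attr]))
                             (count (get-in s [:index attr]))))
        attr   (apply min-key size shared) ; throws if nothing is shared
        r-idx  (get-in r [:index attr])
        s-idx  (get-in s [:index attr])]
    (set
     (for [[v r-rows] r-idx
           x r-rows
           y (get s-idx v) ; only rows of s with the same value for attr
           :let [common (filter #(contains? x %) (keys y))]
           ;; rows must still agree on every other shared attribute
           :when (= (select-keys x common) (select-keys y common))]
       (merge x y)))))

(def a
  (make-relation #{{:a/id 1, :a/name "a1", :b/id 1}
                   {:a/id 2, :a/name "a2", :b/id 1}}
                 [:a/id :a/name :b/id]))

(def b
  (make-relation #{{:b/id 1, :b/name "b1"}}
                 [:b/id :b/name]))

(indexed-join a b)
;=> #{{:a/id 1, :a/name "a1", :b/id 1, :b/name "b1"}
;     {:a/id 2, :a/name "a2", :b/id 1, :b/name "b1"}}
```

None of this is provided by clojure.core; it has to be written and kept consistent by the programmer.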

So we can begin to construct the infrastructure we need to perform relational algebra in Clojure, but it's not there to begin with, and therefore Clojure isn't designed around the relational model.