I have a passion project 4x4anarchy.com that operates with a Python-MariaDB system for querying map data by latitude and longitude, transforming it into GeoJSON for map display. The website deals with sizable tables, approximately 1 GB in size. I've made extensive optimizations, relying on well-structured indexes, caching mechanisms, and query optimization to enhance performance.
Given these circumstances, how might the incorporation of Julia and some geospatial DB (PostGIS) contribute to further optimizing geospatial data retrieval and presentation, especially when dealing with large datasets and intricate geospatial operations?
It would depend on where most of the processing is happening.
PostGIS gives you the benefit of spatial indexes which are extremely performant.
I've seen Python GeoSpatial applications taking hours to finish processing which only took a few minutes when shifted onto PostGIS.
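To make the spatial-index point concrete, here is a minimal sketch of a bounding-box lookup as it might be issued from Python. The table name `trails` and the column names `id` and `geom` are hypothetical, `geom` is assumed to be a geometry column in SRID 4326 with a GiST index, and the `%s` placeholders assume a psycopg-style driver. The `&&` operator tests bounding-box overlap, which is the comparison the index can serve.

```python
from typing import Tuple


def bbox_query(table: str, min_lon: float, min_lat: float,
               max_lon: float, max_lat: float,
               limit: int = 1000) -> Tuple[str, tuple]:
    """Build a parameterized PostGIS bounding-box query.

    Assumptions (not from the original post): the table has columns
    `id` and `geom`, and an index created with something like
    CREATE INDEX ON trails USING GIST (geom);
    """
    # Table names cannot be bound as parameters, so validate before
    # interpolating to avoid SQL injection.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")

    sql = (
        f"SELECT id, ST_AsGeoJSON(geom) AS geojson FROM {table} "
        "WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326) "
        "LIMIT %s"
    )
    # ST_MakeEnvelope takes xmin, ymin, xmax, ymax, i.e. lon before lat.
    params = (min_lon, min_lat, max_lon, max_lat, limit)
    return sql, params


sql, params = bbox_query("trails", -106.0, 39.0, -105.0, 40.0)
```

The returned pair would be passed to something like `cursor.execute(sql, params)`. Having the database emit GeoJSON via `ST_AsGeoJSON` also moves the geometry serialization out of Python.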
If you're also doing a lot of processing in Python, exploring other languages could also help. In the case of Julia you get a typed language that's also JIT compiled.
I think that the challenge for most is that the PostGIS query planner does the indexing for you in most queries, while a naive all-pairs comparison in geopandas/shapely won't tell you to use the .sindex attribute instead.
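As a rough illustration of why an index beats an all-pairs scan, here is a pure-Python sketch. It uses a uniform grid rather than the R-tree-style structures that PostGIS and geopandas' `.sindex` actually use, and the function names are made up for this example. The idea is the same: only look at candidates whose cell overlaps the query box.

```python
from collections import defaultdict
from math import floor


def build_grid(points, cell):
    """Bucket point indices by the grid cell that contains them."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / cell), floor(y / cell))].append(i)
    return grid


def query_box(points, grid, cell, xmin, ymin, xmax, ymax):
    """Return indices of points inside the box, visiting only nearby cells."""
    hits = []
    for cx in range(floor(xmin / cell), floor(xmax / cell) + 1):
        for cy in range(floor(ymin / cell), floor(ymax / cell) + 1):
            # .get avoids creating empty buckets in the defaultdict
            for i in grid.get((cx, cy), ()):
                x, y = points[i]
                # Cells on the edge may stick out of the box: exact check.
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(i)
    return sorted(hits)


def query_box_naive(points, xmin, ymin, xmax, ymax):
    """Baseline: test every point against the box."""
    return [i for i, (x, y) in enumerate(points)
            if xmin <= x <= xmax and ymin <= y <= ymax]


points = [(0.5, 0.5), (1.5, 1.5), (5.0, 5.0), (-0.2, 0.1)]
grid = build_grid(points, cell=1.0)
indexed = query_box(points, grid, 1.0, 0.0, 0.0, 2.0, 2.0)
naive = query_box_naive(points, 0.0, 0.0, 2.0, 2.0)
```

Both functions return the same indices; the indexed version only touches the handful of cells the box overlaps, which is what matters once the table has millions of rows.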
I don't know Julia well, but I definitely would suggest exploring whether PostGIS can help improve the speed of your DB queries.
I'd also consider how you deliver your geospatial data to your clients -- I'm not sure GeoJSON is your best bet. Protobuf tiles might be better for your use-case (e.g. the Mapbox Vector Tiles spec).
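For context on the tile approach, here is a small sketch of the usual Web Mercator "slippy map" tile addressing that vector tiles are typically served under. The URL template and the `.mvt` extension are assumptions for illustration, not anything taken from the site in question.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int):
    """Map a WGS84 lon/lat to the (x, y) tile index at a zoom level."""
    n = 2 ** zoom  # tiles per axis at this zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon=180 or latitudes past ~85.05 degrees stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(template: str, lon: float, lat: float, zoom: int) -> str:
    """Fill a hypothetical {z}/{x}/{y} URL template for the given point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return template.format(z=zoom, x=x, y=y)


url = tile_url("/tiles/{z}/{x}/{y}.mvt", 0.0, 0.0, 1)
```

Because every client at a given zoom requests the same tile addresses, the responses can be cached or pre-generated, which is harder to do with GeoJSON built per arbitrary bounding box.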
Be mindful that most of Julia's geometry code is a wrapper around libGEOS (C version) and libGDAL, which means you can't easily extend the algorithms; everything is behind a black box on the C side. Source: I worked in the field last year, and I have a small patch in LibGEOS.jl.
So in other Julia geometry-related projects that may be true, but for this particular corner of the ecosystem the main author (Júlio Hoffimann) has actually implemented much of the underlying geometry and other code from scratch (to the best of my understanding) in pure Julia in a whole set of packages, including e.g.
Scanning the site, I see mostly point algorithms; the only mention of polygons is a textbook LibGEOS call, and I see nothing on networks at all. I also see no smart manipulation of anything other than points, no subdivision of space, etc.
I have worked with it. It was just starting then, with very little useful code in it. Going back to the source code, I see they have added a bit more.
A quick look around suggests that only one algorithm uses an indexing structure. Clipping seems limited to a convex polygon against a concave one.
The book is quite interesting, but it does seem like a lot of the underlying work is farmed out to GeoStats.jl, which doesn't really seem to use the vocabulary I'd expect from PostGIS, Geopandas, or similar tools in other languages. For example, I don't see many mentions of Polygons or MultiPolygons when I search. However, I do find this page[1], which seems to define similar(?) equivalents. Can I expect equivalent geospatial joins/queries to be available? I don't see many mentions of the kinds of operations I would normally do, especially overlay operations[2].
Julia is designed to seem to win arguments, as best I can tell... If you complain about the need to break abstractions and the lack of general-purpose application, you're accused of not understanding. When you say it's slow, they say you can inline assembler, and when you say that's dumb, why have a high-level language then, they say well, you don't have to, it is fast as is and everyone else is slow, and it just devolves into circular arguments. Abstractions exist in layers for reasons.
You can obviously provide the same abstraction with different implementations that yield different performance characteristics. Julia provides the same level of flexibility (if not more) as Python without any of the design decisions which cause Python to be so slow. I fail to see how this is a contentious point.
When you say Julia is slow, what are you talking about? Even without any fancy tricks, normal Julia code is usually the same speed as the equivalent normal C code.
For the same reason that C/C++ allow inline assembly? Languages come in roughly 3 speeds. Slow (e.g. python/R), mostly not slow (e.g. Java/Go), and not slow (e.g. C/Rust). If you want actually fast code (e.g. the speed of BLAS/FFTW etc) you need the combination of a not slow language, code generation, and often hand-coded assembly for the most performance critical parts.
I used to think so, but I have a function that gets called about a billion times each and every day as new data comes in, and takes about 0.01 seconds to evaluate (optimization with nlopt). I tried to code it in C (30% speed improvement), Python (twice as slow), and Julia (about the same speed). The reason is that the call has 5 parameters that operate on a vector of length 50 to return a value to minimize. Turns out R is pretty good at such vector calculations.
I think that is exactly what is happening. Most of my code is much much faster in Julia, and the code is nicer. But R has its moments. Which is good since this particular app has 3K lines, and I do not want to port it to Julia.
And data.tables in R is faster (and I think nicer to write) than DataFrames in Julia. And since data.tables feed my optimization, R still wins.
R can exploit parallel hardware just fine with parallel, future, and other libraries like mirai. The problem is that execution speed is going to be a bottleneck for anything large, and once you start needing certain optimizations, R may not be the best language for the job. But it depends a lot on the use case.
I much prefer parallel in R with mclapply() to the Julia implementation of parallel. One of the few areas where I prefer R to julia (other being R data.tables to julia dataframes)
Geospatial Data Science with Julia presents a fresh approach to data science with geospatial data and the Julia programming language. It contains best practices for writing clean, readable and performant code in geoscientific applications involving sophisticated representations of the (sub)surface of the Earth such as unstructured meshes made of 2D and 3D geometries.