by sktrdie
2576 days ago
Not sure who this document is aimed at. It's not technical enough to appeal to programmers who work closely with Pilosa, and it's not written in a way that's easy to understand for people who don't know anything about Pilosa (such as myself). I mean, a subtitle called "Time Quantum" is enough to confuse me. I'd appreciate a more generic "what is this" intro if possible.
> Pilosa is a distributed bitmap index that sits on top of a data store. The key to understanding and then using Pilosa is converting data so that it is represented in ones and zeros. This dramatically reduces the size of the data and accelerates queries.
> For example, timestamps are important information, but we tend to be interested in individual components of a timestamp, especially when analyzing data with cyclic trends. Timestamp components are stored as groups of bitmaps, known as “frames”. We create one frame for the day of the week, as illustrated in the following table. Along with similar frames for year, month, and time of day, this accelerates queries that ask questions about rides belonging to any logical combination of these time groups.
> [...]
> Because each data point includes pickup/dropoff times and total distance travelled, it’s easy to determine the average speed of the trip. As an example, we use this as a first-order approximation of congestion. We created a frame representing average speed, with a spacing of 1 mph.
> In order to answer questions about congestion, we needed to first determine what speeds constitute slow traffic. One of the basic queries in Pilosa is the TopN function, and we used that to get a list of all the different average speeds. By performing a count on each we built a histogram of how many rides fall into each speed bucket, and decided from there which buckets deviate enough from the norm to constitute congestion.
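For anyone else in the same position, here is a minimal sketch of the idea in the quoted passage, assuming plain Python ints stand in for bitmaps (bit i set means ride i belongs to that row). It is not Pilosa's API: `build_frames`, `top_n`, the sample rides, and the 10 mph "slow" cutoff are all hypothetical, chosen only to show how day-of-week and speed-bucket frames turn questions into bitwise AND/OR plus popcounts.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical sample rides: (pickup, dropoff, distance in miles).
RIDES = [
    (datetime(2017, 5, 1, 8, 0), datetime(2017, 5, 1, 8, 30), 3.0),    # Mon, 6 mph
    (datetime(2017, 5, 1, 9, 0), datetime(2017, 5, 1, 9, 15), 5.0),    # Mon, 20 mph
    (datetime(2017, 5, 2, 8, 0), datetime(2017, 5, 2, 9, 0), 4.5),     # Tue, 4.5 mph
    (datetime(2017, 5, 6, 23, 0), datetime(2017, 5, 6, 23, 10), 3.0),  # Sat, 18 mph
]


def build_frames(rides):
    """Build two 'frames': each maps a row label to a bitmap of ride IDs."""
    day_frame = defaultdict(int)    # weekday name -> bitmap
    speed_frame = defaultdict(int)  # floor(mph) bucket -> bitmap (1 mph spacing)
    for ride_id, (pickup, dropoff, miles) in enumerate(rides):
        bit = 1 << ride_id
        day_frame[DAYS[pickup.weekday()]] |= bit
        seconds = (dropoff - pickup).total_seconds()
        mph = miles * 3600 / seconds
        speed_frame[int(mph)] |= bit
    return day_frame, speed_frame


def count(bitmap):
    """Popcount: how many rides are in this bitmap."""
    return bin(bitmap).count("1")


def top_n(frame, n=None):
    """Rows sorted by how many set bits they have (a TopN-style query)."""
    rows = sorted(frame.items(), key=lambda kv: count(kv[1]), reverse=True)
    return [(row, count(bm)) for row, bm in rows[:n]]


day_frame, speed_frame = build_frames(RIDES)

# Histogram: number of rides in each speed bucket.
histogram = dict(top_n(speed_frame))

# "Slow Monday rides": OR together the buckets under the (hypothetical)
# 10 mph cutoff, then AND with the Monday row.
slow = 0
for mph, bm in speed_frame.items():
    if mph < 10:
        slow |= bm
monday_slow = day_frame["Mon"] & slow

print(histogram)
print(count(monday_slow))
```

The point of the design is that once every attribute is a set of bitmaps, any logical combination of attributes (day AND slow, weekend OR night, and so on) is just bitwise operations, and a count is a popcount, without touching the underlying ride records.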