by didgetmaster
1339 days ago
Interesting data set. I am building a new kind of data analysis tool (https://www.Didgets.com), so I am always looking for good open data sets to download and import, both to see what the data shows and to test my tool. I downloaded both CSV files (geometry and simulations) and built a couple of relational tables from them in a few minutes. I am confused by a few things. There are 42,207 unique values in the 'apartment_id' column. The most common one is d41d8cd98f00b204e9800998ecf8427e, which is referenced 1,451 times. At first I thought it might actually be some kind of 'plan_id', where the same plan was used to build multiple apartments (this id is associated with 13 different 'building_id' values), but drilling down into each one reveals some very different features. It is certainly possible that the same plan could be used with slight variations (e.g. one has a tub in the bathroom while another has a shower), but some of the features differ too much for that. For example, there are 26 different KITCHEN areas associated with the id, but only 21 LIVING_DINING areas. My tool is great for finding and fixing anomalies in data sets when they exist. This data set is a bit confusing about what some elements mean, and the site doesn't explain them very well. If the same plan is being used across multiple buildings, it would be interesting to see how the amount of light entering an apartment differs depending on whether that plan was built on the north side of a building or the south side.
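For anyone who wants to reproduce those counts without a special tool, here is a minimal sketch using only the Python standard library. The 'apartment_id' and 'building_id' column names come from the comment above; the sample rows and the `summarize` helper are made up for illustration, and the real geometry CSV would be read from a file instead of an inline string.

```python
import csv
import io
from collections import Counter, defaultdict

# Toy stand-in for the geometry CSV; these rows are invented for illustration.
SAMPLE = """apartment_id,building_id
aaa,1
aaa,2
bbb,1
aaa,2
"""

def summarize(rows):
    """Return (unique ids, most common id, its row count, its distinct buildings)."""
    counts = Counter()
    buildings = defaultdict(set)
    for row in rows:
        counts[row["apartment_id"]] += 1
        buildings[row["apartment_id"]].add(row["building_id"])
    top_id, top_count = counts.most_common(1)[0]
    return len(counts), top_id, top_count, len(buildings[top_id])

summary = summarize(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)
```

An id that dominates the counts and spans many buildings, as described above, is a hint that it may be a placeholder rather than a real apartment.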
Granted, this is entirely without looking at the data, but my guess is that they MD5-hashed whatever was in that apartment_id column, and wherever the value was empty the hash came out as d41d8cd98f00b204e9800998ecf8427e.
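The hash half of that guess is easy to check: d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of zero bytes of input. A quick sketch with Python's `hashlib` follows. Whether the data set's authors actually hashed the ids this way is still an assumption, and `looks_like_hashed_empty` is just a hypothetical helper name.

```python
import hashlib

# MD5 of an empty byte string.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
print(EMPTY_MD5)

def looks_like_hashed_empty(apartment_id):
    """True if the id equals the MD5 of empty input (hypothetical helper)."""
    return apartment_id == EMPTY_MD5
```

If that is what happened, the 1,451 rows sharing the id are probably records whose original apartment id was missing, not one plan reused across 13 buildings.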