

by ilophu 2135 days ago

Going to plug a couple of relevant things here.

- A book I saw recommended here called "The Sciences of the Artificial," which talks about the purpose and practice of modeling with computers.

- An old post of mine, where I wrote that "creating knowledge is a philosophical act that businesses mostly didn't realize they were getting into when they got on the data science bandwagon."

- A post by HN user "wenc," a practicing data scientist. I'm going to copy-paste the whole thing because I think it's that good and relevant:

---

Data science is correctly valued when you realize how relatively unimportant it is. It is a small cog in a larger machinery (or at least it ought to be).

You see, decision-making involves (1) getting data, (2) summarizing and predicting, and (3) taking action. Continuous decision-making -- the kind that leads to impact -- involves doing this repeatedly in a principled fashion, which means creating a system around the decision process. For systems thinkers, this is analogous to a feedback control loop, which includes sensor measurements + filters, controllers and actuators.

(1) involves programmers/data engineers who have to create/manage/monitor data pipelines (that often break). This is the sensor + filters part, which is ~40% of the system.

(2) involves data scientists creating a model that guides the decision-making process. This is the model of the controller (not even the controller itself!), which is ~20% of the system. Having the right model is great, but as most control engineers will tell you, even having the wrong model is not as terrible as most people think, because the feedback loop is self-correcting. A good-enough model is all you need.

(3) involves business/front-line people who actually implement decisions in real life. This is where impact is delivered. This is the controller + actuator part, which makes the decisions and carries them out. ~40% of the system.

Most data scientists think their value is in creating the most accurate model possible in Jupyter. This is nice, but in real life not really that critical, because the feedback loop inherently moderates the error when deployed in a complex, stochastic environment. The right level of optimization would be to optimize the entire decision-making control feedback loop instead of just the small part that is "data science".

p.s. Data scientists who have particularly low impact are those who focus on producing one-off reports (like consultant reports). Reports are rarely read, and often forgotten. Real impact comes from continuous decision-making and implementing actions with feedback.

Source: practicing data scientist
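
To make the "self-correcting" claim concrete, here is a toy sketch of my own (not from wenc's post). It assumes a made-up linear process where the outcome is `true_gain * action`, and a model that believes the gain is `model_gain`. All names and numbers are hypothetical. The open-loop version acts once on the model's prediction; the closed-loop version keeps measuring the outcome and nudging the action, still using the same wrong model.

```python
import random


def run_open_loop(true_gain, model_gain, target):
    """Act once on the model's prediction and never check the outcome."""
    action = target / model_gain   # the model thinks this action hits the target
    return true_gain * action      # what the real process actually delivers


def run_closed_loop(true_gain, model_gain, target, steps=20, noise=0.0, seed=0):
    """Measure, compare with the goal, and correct, using the same wrong model."""
    rng = random.Random(seed)
    action = 0.0
    for _ in range(steps):
        outcome = true_gain * action + rng.gauss(0.0, noise)  # (1) get data
        error = target - outcome                              # (2) summarize / predict
        action += error / model_gain                          # (3) take action
    return true_gain * action  # noise-free outcome of the final action


if __name__ == "__main__":
    # Hypothetical numbers: the model overestimates the real gain by 50%.
    print(run_open_loop(true_gain=2.0, model_gain=3.0, target=100.0))
    print(run_closed_loop(true_gain=2.0, model_gain=3.0, target=100.0))
```

With these numbers the open-loop result lands around 66.7 instead of 100, while the closed-loop result ends up very close to 100, because each pass shrinks the remaining error by a constant factor. The caveat is that "good enough" still matters: in this toy setup the loop only converges while `true_gain / model_gain` stays between 0 and 2, so a model that is wildly off (or has the wrong sign) makes things worse rather than better.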