by gchallen 1259 days ago
Working in a top-tier computer science department, I find our ability to answer basic questions about the health of our degree program fairly troubling. I think non-academics may be surprised by how much we don't know, and how little useful and continuous data analysis is taking place.

For example: What is our retention rate? That is, what percentage of students who start our degree program complete it? This is a fairly standard and important indicator of program health. Next, break this down by various cohorts: What is our retention rate among women? And so on. Heck, frequently we can't even answer questions about the current gender ratio within our program, and this is something that has been a focus of our diversity efforts recently.
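
To make the retention question concrete, here is a minimal sketch of the calculation, assuming you can get one record per student with a cohort label and a completion flag. The records, cohort labels, and the `retention_by_cohort` helper are all made up for illustration; real registrar data would need cleaning first.

```python
from collections import defaultdict

# Hypothetical records: (student_id, cohort_label, completed_degree).
# These values are invented for illustration only.
students = [
    ("s1", "women", True),
    ("s2", "women", False),
    ("s3", "women", True),
    ("s4", "men", True),
    ("s5", "men", True),
    ("s6", "men", False),
    ("s7", "men", True),
]

def retention_by_cohort(records):
    """Return {cohort: fraction of students who started and then completed}."""
    started = defaultdict(int)
    completed = defaultdict(int)
    for _student_id, cohort, did_complete in records:
        started[cohort] += 1
        if did_complete:
            completed[cohort] += 1
    return {cohort: completed[cohort] / started[cohort] for cohort in started}

rates = retention_by_cohort(students)
for cohort, rate in sorted(rates.items()):
    print(f"{cohort}: {rate:.0%}")
```

The hard part in practice is not this arithmetic but deciding who counts as having "started" and getting the completion flag at all.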

I've had people say with a straight face that we _cannot_ calculate retention because we don't know when students leave our program. But of course someone knows this! And I've been able to produce rough estimates even given the limited data that I have access to. But a lot of educational data is fairly siloed, and frequently the people assigned to perform these tasks don't have much training and tend to give up quickly.

I suspect that many departments just don't have anyone assigned to do even basic educational data analysis on a regular basis, and with access to enough data to run interesting reports. My department is in the process of creating a faculty leadership role around academic data analytics, but my sense is that this will be a very unusual position. (And don't worry—it'll be filled by a faculty member, and not a new administrator.)

And don't even get me started about student evaluations of teaching. Yes, we give a survey at the end of every semester and ask students whether they liked a particular course and professor. No, those answers have very little to do with how much they actually learned. Yes, we could measure learning in other better ways—success in downstream courses, for example. No, people don't tend to do that.

There's a lot of room for improvement here, just working with the data we already have. No need for additional "telemetric signals".

5 comments

OK, then what?

Then you find out why retention is low.

Then you brainstorm ideas to increase retention.

Then you attempt to apply those ideas. It is at this point that the person responsible for applying those ideas says "been there done that".

Point is, most data dashboards are non-actionable. The challenge is to create a good actionable dashboard (i.e., one where the user should take some action when a value crosses a certain threshold).

Once you create an excellent actionable dashboard, you realize it doesn't need a dashboard. It can be a notification.
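
As a rough sketch of the "it can be a notification" idea: a threshold check that returns a message only when action is needed. The function name, the metric, and the 50% threshold are hypothetical choices for illustration, not a recommendation.

```python
def check_threshold(metric_name, value, minimum):
    """Return a notification string if value is below minimum, else None."""
    if value < minimum:
        return (
            f"ALERT: {metric_name} is {value:.0%}, "
            f"below the {minimum:.0%} threshold"
        )
    return None

# Hypothetical numbers: a 30% retention rate against a 50% floor.
message = check_threshold("retention rate", 0.30, 0.50)
if message is not None:
    print(message)
```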

So, while the data is important, the questions around it might just lead to the same work that was being done anyway.

In my experience it is generally worse than that. If a program has a retention rate of, say, 30% then it isn't like that is going to come as a surprise. Data generally reveals things that are very obvious. It is easier to read a situation by talking to a few people and asking simple questions. Which raises further questions about why a data-driven approach is needed.

The value in data-driven approaches is high, but it takes a rare person to figure out why. Traditionally, data has been a communication tool for things that are already known. That isn't at all how people expect it to be used; everyone seems to anticipate that it is used to make decisions.

An organisation resisting data is bad news because it will struggle to talk about things that everyone knows to be true.

> The challenge is to create a good actionable dashboard

Huh? You don’t need a dashboard to make use of data. The best use of data in my mind is asking and answering questions.

E.g., maybe your program has a low number of graduating female engineers. Why? Maybe they’re dropping out along the way. Maybe female intake is low. With the data you can answer these questions.

The graduating rate of female CS students is low, but is it abnormally low compared to other schools? You investigate and - everyone has an equally low rate except one place where it’s 50/50. The data has led to a question - Why? What are they doing differently? And so on.

> Point is, most data dashboards are non actionable.

Maybe you find out that one specific class or set of classes is responsible for a lot of people leaving the program. So you zero in on that part of the curriculum and improve it. Sounds pretty actionable to me. We've actually done similar things on a smaller scale (week by week) to improve student success in my course.
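
A minimal sketch of that kind of analysis, assuming you have each student's ordered course history plus a flag for whether they left the program. The course names, the data, and the "last course before leaving" heuristic are all assumptions for illustration; the last course taken is only a rough proxy for why someone left.

```python
from collections import Counter

# Hypothetical histories: student -> (courses in order taken, left_program).
histories = {
    "s1": (["CS101", "CS102", "CS201"], False),
    "s2": (["CS101", "CS102"], True),
    "s3": (["CS101", "CS102"], True),
    "s4": (["CS101"], True),
    "s5": (["CS101", "CS102", "CS201"], False),
}

def exit_rate_by_course(histories):
    """For each course, the fraction of its takers for whom it was the
    last course before leaving the program."""
    took = Counter()
    last_before_leaving = Counter()
    for courses, left in histories.values():
        took.update(set(courses))  # count each student once per course
        if left and courses:
            last_before_leaving[courses[-1]] += 1
    return {course: last_before_leaving[course] / took[course] for course in took}

exit_rates = exit_rate_by_course(histories)
worst_course = max(exit_rates, key=exit_rates.get)
print(worst_course, exit_rates[worst_course])
```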

But of course you don't know anything until you do the analysis.

Many times you need the data/visuals to get funding to actually implement ideas.
An anecdote from someone in IT at a major university: The registrar has two employees whose sole job is to write SQL queries. These are to answer basic questions such as, "How many undergraduates are currently enrolled?" And it turns out that it is nearly impossible to give a definitive answer to this question.
Shouldn't the finance folks know that kind of thing pretty definitively?
What does it mean to be enrolled? What if someone takes a semester off? What if they are part time? What if they are on a foreign exchange program this year? What if ...

The above are the type of questions that can result in different definitions of enrolled.
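
To illustrate how much the answer can move, here is a small sketch that counts the same made-up roster under three definitions of "enrolled". The fields, the 12-credit full-time cutoff, and the definitions themselves are assumptions for illustration; a real registrar would have its own rules.

```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    credits_this_term: int  # credits registered locally this term
    on_leave: bool          # taking the semester off
    on_exchange: bool       # studying abroad this year

# Hypothetical roster, invented for illustration.
roster = [
    Student("A", 15, False, False),
    Student("B", 6, False, False),   # part time
    Student("C", 0, True, False),    # semester off
    Student("D", 0, False, True),    # foreign exchange
    Student("E", 15, False, False),
]

# Three plausible but different definitions of "enrolled".
definitions = {
    "full time on campus": lambda s: s.credits_this_term >= 12 and not s.on_exchange,
    "registered for any credits": lambda s: s.credits_this_term > 0,
    "has not left the program": lambda s: s.credits_this_term > 0 or s.on_leave or s.on_exchange,
}

counts = {label: sum(1 for s in roster if rule(s)) for label, rule in definitions.items()}
for label, count in counts.items():
    print(f"{label}: {count}")
```

Same five students, three different headcounts, and none of them is wrong.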

Perhaps one problem is that every college has its own bespoke curriculum and processes, so every data problem is a "little data" problem. Of course there are lots of rationalizations for why every program needs to be unique and special, but does it really benefit the students?

A similar problem in medicine: Every clinic system has a unique set of business processes, and a custom build of Epic. Granted the clinics are competing on which one can develop the most efficient processes, but does the patient benefit?

> Of course there are lots of rationalizations for why every program needs to be unique and special, but does it really benefit the students?

Of course not. What would benefit the students would be having a lot more standardization so that we can compare ideas and approaches and determine what works. But the problem with standardized evaluation is that half of the programs suddenly discover that they aren't in the top half, contrary to what most of them had previously thought. This seems like more or less what happened to standardized testing in K–12 education.

But, in the context of a specific curriculum, having no idea what is happening and therefore no way to improve your bespoke curriculum is even worse than just deciding to do things your own way.

And it's worse than medicine, because at least they have some common metrics for what it means to be healthy. Whereas, faculty get to assign grades however they want! Imagine you ran a diet study where you both controlled the meals and got to reposition the numbers on the scale at will.

Hazarding an uninformed idea: If organizationally the department isn't geared toward this kind of analysis, what if you opened up the data and let whoever wanted to work with it do their thing? Probably a perfect environment for crowd-sourcing, given the expertise and resources.
The most interesting data is usually about people. This data usually shouldn’t be shared for obvious privacy reasons.

And anonymising data sources is a notoriously fraught task.

We don't ask questions we don't want the answers to (be public).