by wodenokoto
2544 days ago
You are right that tidy data is in a different form than many supplied tables are. If I understand correctly, you want to know how many NAs there are in each column of a wide-form dataset (as opposed to a tidy dataset).

  library(dplyr)
  library(tidyr)
  # One line to make the data tidy.
  # The result has 3 columns: id, question, answer. And no, we don't care what the columns are called, except for id.
  tidydf <- df %>% gather("question", "answer", -id)
  # One line to do your check.
  # Grouping by id counts NAs per respondent; use group_by(question) instead to count per original column.
  tidydf %>% group_by(id) %>% summarise(n_NA = sum(is.na(answer)))

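To make the two lines above concrete, here is a small sketch on a made-up survey. The data frame and the column names q1 to q3 are invented for illustration, and it uses pivot_longer, which newer tidyr versions recommend in place of gather; that swap is an assumption on my part, not something from the comment itself.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-form survey: one row per respondent, one column per question.
df <- tibble(
  id = c(1, 2, 3),
  q1 = c("yes", NA, "no"),
  q2 = c(NA, NA, "yes"),
  q3 = c("no", "yes", "no")
)

# Tidy form: one row per (respondent, question) pair.
tidydf <- df %>%
  pivot_longer(-id, names_to = "question", values_to = "answer")

# NAs per respondent (per row of the wide table).
na_by_id <- tidydf %>%
  group_by(id) %>%
  summarise(n_NA = sum(is.na(answer)))

# NAs per question (per column of the wide table).
na_by_question <- tidydf %>%
  group_by(question) %>%
  summarise(n_NA = sum(is.na(answer)))
```

The only difference between the two checks is the grouping variable, which is the "think in groups and summaries" part.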
Tidyverse is highly opinionated about its data structure, and that is one of its limiting factors, as it basically treats every dataset as a sparse dataset. This actually fits your data very well: a datapoint is not a fixed questionnaire, but rather a respondent's answer to a question (as questionnaires vary in their questions, a tall table layout is quite fitting).

From there on you have to think in groups and summaries, unless you wanna fight the library. Tidyverse is an 80% data science solution. It solves what you need 80% of the time really, really well, and for the last 20% you either have to fall back to base R or really torture dplyr.
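For comparison, this is roughly what the "fall back to base R" route could look like for the NA count, working on the wide table directly. It is only a sketch: the data frame and its q1 to q3 columns are made up, and the only assumption about the real data is that it has an id column.

```r
# Hypothetical wide-form survey: one row per respondent, one column per question.
df <- data.frame(
  id = c(1, 2, 3),
  q1 = c("yes", NA, "no"),
  q2 = c(NA, NA, "yes"),
  q3 = c("no", "yes", "no")
)

# Keep only the answer columns; id is never missing and should not be counted.
answers <- df[, names(df) != "id"]

# is.na() on a data frame returns a logical matrix, so row and column sums count the NAs.
na_per_respondent <- rowSums(is.na(answers))
names(na_per_respondent) <- df$id

na_per_question <- colSums(is.na(answers))
```

No reshaping is needed here, which is why this kind of row-wise question often feels easier outside dplyr.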