by pfedak 974 days ago
As best I can tell from reading https://www.stat.purdue.edu/~yuzhu/stat514s2006/Lecnot/fracf... the unstated assumption here is that we're going to do linear regression where (critically) the "-" case for each condition is -1, and the "+" is +1. This has the surprising-to-me effect of making "I", which looks like it might be the control group based on notation, actually a positive recipient of AC interaction (and any even-order interaction). You can think of this as a change in basis in how you parse out the effects, where we're talking about

1 -1
-1 1

(like a covariance matrix) instead of

0 0
0 1

for an interaction.
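
To make the two codings concrete, here is a minimal Python sketch that builds both 2x2 tables from the same rule: the interaction entry is the product of the two factors' coded levels. The helper name `interaction_table` is made up for illustration and does not come from the linked notes.

```python
def interaction_table(levels):
    """2x2 table of products a*b: rows are A's coded level, columns are B's."""
    return [[a * b for b in levels] for a in levels]

# Signed coding: absent = -1, present = +1
signed = interaction_table((-1, 1))
# Indicator coding: absent = 0, present = 1
indicator = interaction_table((0, 1))

print(signed)     # [[1, -1], [-1, 1]]
print(indicator)  # [[0, 0], [0, 1]]
```

With the signed coding, the "neither factor present" cell gets +1, which is why the I run counts as a positive recipient of every even-order interaction.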

I have a gut feeling it's done this way mostly because the tools being used expect things to be expressed this way rather than any conscious choice by experimenters. Through this lens, if you test

I, B, AC, ABC

every experiment has a positive effect from AC interaction, and taking B-I, which we might think of as the effect from B, is in this paradigm also sensitive to the ABC interaction and the AB and BC interactions. The "real" effect from B would be approximated as (B + ABC - AC - I)/2, which is exactly the same as the effect from ABC interaction (which is positive when an odd number of its constituents are positive...).
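
A quick way to check these claims is to write out the sign of every effect column for the four runs. This is only a sketch under the +1/-1 coding assumed above; the names `RUNS`, `level`, and `column` are mine, not from the article.

```python
from itertools import combinations

FACTORS = "ABC"
RUNS = ["I", "B", "AC", "ABC"]  # "I" is the run with no factor present

def level(run, factor):
    """+1 if the factor is present in the run, -1 if it is absent."""
    return 1 if factor in run else -1

def column(term):
    """Sign of the `term` column in each run: the product of its factors' levels."""
    signs = []
    for run in RUNS:
        s = 1
        for f in term:
            s *= level(run, f)
        signs.append(s)
    return signs

terms = ["".join(c) for r in range(1, 4) for c in combinations(FACTORS, r)]
columns = {t: column(t) for t in terms}
for t, col in columns.items():
    print(f"{t:>3}: {col}")

# How much each column changes between run B (index 1) and run I (index 0)
b_minus_i = {t: col[1] - col[0] for t, col in columns.items()}
print(b_minus_i)
```

The AC column is +1 in all four runs, the B and ABC columns are identical (so are A and C, and AB and BC), and the B-I difference is nonzero exactly for B, AB, BC, and ABC.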

I'm pretty sure this is just a difference in mathematical perspective - you can represent exactly the same data, but the coefficients (i.e. effect values) will change, and there's a different notion of what you know vs don't know. Maybe there's a more convincing reason to do this when you have more than two "levels", but from the presentation in TFA it just feels like overcomplicating things with a confusing prior about how effects work.

It also seems like the given example is just bad. If the parameters are numeric and there's not a reasonable "control", this perspective feels much more natural.

1 comments

Thanks for the explanation and the link.

So the goal is to find real numbers f_I, f_A, ...f_ABC such that

result = f_I + f_A*v_A + f_B*v_B + f_AB*v_A*v_B + ... + f_ABC*v_A*v_B*v_C

where v_A is 1 or -1 depending on whether A is present or absent in the experiment. f_AB is being abbreviated to AB, which causes some confusion, since as a column heading AB means v_A*v_B. The article should say that we can't tell the difference between the effect associated with B and the effect associated with the 3-way interaction (for this definition of the effect associated with the 3-way interaction).
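
For what it's worth, here is a rough Python sketch of solving for the f values when all 8 runs are available. It assumes the +1/-1 coding above. Under that coding the columns of the full design are mutually orthogonal, so each coefficient is just the average of result times that column. The `results` numbers are invented purely for illustration.

```python
from itertools import combinations, product
from math import prod

FACTORS = "ABC"
# "" stands for the constant term f_I; the rest are A, B, C, AB, AC, BC, ABC
TERMS = [""] + ["".join(c) for r in range(1, 4) for c in combinations(FACTORS, r)]

def term_value(term, v):
    """Product of the coded levels of the term's factors (empty product is 1)."""
    return prod(v[f] for f in term)

# All 8 runs of the full 2^3 design, each a dict like {"A": -1, "B": 1, "C": -1}
runs = [dict(zip(FACTORS, levels)) for levels in product((-1, 1), repeat=3)]
results = [12.0, 15.0, 11.0, 18.0, 13.0, 14.0, 10.0, 21.0]  # hypothetical data

# Orthogonal columns: f_T = average over runs of result * (column T)
coef = {
    t: sum(y * term_value(t, v) for v, y in zip(runs, results)) / len(runs)
    for t in TERMS
}

# The 8 coefficients reproduce the 8 results exactly
fitted = [sum(coef[t] * term_value(t, v) for t in TERMS) for v in runs]
print(coef)
print(fitted)
```

With only the four runs I, B, AC, ABC, the B and ABC columns coincide, so the same averaging only gives you f_B + f_ABC and not each one separately.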