You should ask statisticians about this, because it's fairly advanced statistics compared to what is common in linguistics. Some linguists might know the answer, but you probably won't find them here.
That said, I do know a bit about statistics, and I would simply ask why she is trying to do it that way. What is the point of quantifying the variation in the responses overall? Wouldn't it be more useful to look for particular patterns?
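Best practice for significance testing is to have a single hypothesis in mind and then test exactly that hypothesis, rather than fishing for any sort of (probably coincidental) pattern in the data. You're more likely to find noise (coincidences) if you look at the data too broadly or let a computer find patterns for you: at the conventional 0.05 threshold, about 1 in 20 tests where there is no real effect will still come out "significant" by chance, so the more comparisons you run, the more spurious results you will get.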
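To make the fishing problem concrete, here is a small simulation sketch (my own illustration, not a recommended analysis). It assumes two groups compared on 20 unrelated measures where there is no real difference at all, and uses a simple two-sample z-test with a known standard deviation of 1, which is only valid here because the fake data are generated that way. The function names and sizes are made up for the example.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(a, b, sigma=1.0):
    # Two-sample z-test, assuming both groups have a KNOWN standard deviation sigma
    # (true here only because we generate the data ourselves).
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fishing_expedition(n_tests=20, n_per_group=20, alpha=0.05, rng=None):
    # Compare two groups on n_tests unrelated measures where NO real difference exists,
    # and count how many comparisons come out "significant" anyway.
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(a, b) < alpha:
            hits += 1
    return hits


def familywise_rate(n_experiments=1000, seed=0, **kwargs):
    # Fraction of simulated studies that find at least one spurious "effect".
    rng = random.Random(seed)
    found = sum(fishing_expedition(rng=rng, **kwargs) > 0 for _ in range(n_experiments))
    return found / n_experiments


if __name__ == "__main__":
    print("one test per study:", familywise_rate(n_tests=1))
    print("20 tests per study:", familywise_rate(n_tests=20))
    print("theory for 20 tests:", 1 - 0.95 ** 20)
```

With one pre-planned test, roughly 5% of studies report a false effect; with 20 tests, the share of studies with at least one false "finding" should land near the theoretical 1 - 0.95^20, which is about 64%.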
There are textbooks (and other resources) specifically about doing statistics in linguistics. But first you need to figure out what question you're asking. Then you can figure out how to test it statistically.
I'm not sure what kind of background either of you has, but if you're not used to significance testing in general (for example, a t-test, an ANOVA, etc.), you should probably start with an introductory class (or the equivalent, even if that means reading a textbook or just Wikipedia on your own). You can also look at research papers that use a methodology similar to the project's; following an established approach like that is a fairly safe way to do it.
One approach linguists often use is mixed effects models, where you have one main target variable (for example, the pronunciation of a certain sound) but can also include in the model the other ways in which the individuals in your sample vary, so that correlations among those factors don't distort the results. For example, if you have boys and girls, but the boys are ages 4-8 and the girls are ages 6-10, a more complex model can try to separate the effect of gender from the effect of age. Mixed effects models work roughly along those lines (but read about the details beyond my oversimplified example!), and they also let you account for things like repeated responses from the same speaker, so you can build a fairly complex model.
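As a rough illustration of the boys/girls example only, here is a sketch using plain least-squares regression with age as a covariate. This is NOT a mixed effects model (it has no random effects for speakers or items); it just shows why adding the other variable matters. The data are invented: scores depend only on age, with no gender effect at all, and all function names are hypothetical.

```python
import random
from statistics import fmean


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system a x = b.
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    # Ordinary least squares via the normal equations X'X b = X'y.
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n_per_group=200, seed=1):
    # Invented data: boys aged 4-8, girls aged 6-10; score depends on age ONLY.
    rng = random.Random(seed)
    rows, y = [], []
    for girl, (lo, hi) in ((0, (4, 8)), (1, (6, 10))):
        for _ in range(n_per_group):
            age = rng.uniform(lo, hi)
            rows.append([1.0, float(girl), age])  # intercept, gender dummy, age
            y.append(2.0 * age + rng.gauss(0, 1))
    return rows, y


if __name__ == "__main__":
    rows, y = simulate()
    girls = [yi for r, yi in zip(rows, y) if r[1] == 1.0]
    boys = [yi for r, yi in zip(rows, y) if r[1] == 0.0]
    print("naive girls - boys difference:", fmean(girls) - fmean(boys))
    intercept, gender, age = ols(rows, y)
    print("gender effect after adjusting for age:", gender)
```

The naive comparison suggests girls score about 4 points higher, but once age is in the model the gender coefficient is close to zero, which is the kind of confound a richer model is meant to untangle.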
My honest advice would be to start over entirely (you can keep the data!) with a much simpler, specific question (or several questions) to test statistically.