Hello.
My wife is writing her Master's thesis and needs to perform cluster analysis on her data. The data is basically a list of questionnaire responses. There are 7 questions, and each question may be answered with a discrete value from 0 to 4. In other words, there will be one 7D vector per questionnaire participant, and each component of the vector is an integer between 0 and 4. The distance metric between two vectors is the following:
d(v1, v2) = Sum over i in [0, 6] of (v1(i) == v2(i) ? 0 : 1). Put simply, 1 unit per differing dimension.
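To make the metric concrete, here is a minimal Python sketch of it (this is the Hamming distance). The function name and the two sample response vectors are made up for illustration and are not taken from the actual data:

```python
def questionnaire_distance(v1, v2):
    """Count the questions on which two participants answered differently."""
    if len(v1) != len(v2):
        raise ValueError("both response vectors must have the same length")
    # Add 1 for every position where the two answers differ.
    return sum(1 for a, b in zip(v1, v2) if a != b)


# Hypothetical responses from two participants (7 questions, values 0 to 4).
alice = [0, 4, 2, 2, 1, 3, 0]
bob = [0, 3, 2, 2, 1, 0, 0]

# The answers differ at indices 1 and 5 only.
print(questionnaire_distance(alice, bob))  # 2
```

Note that the distance only records whether two answers differ, not by how much, so answering 0 versus 4 counts the same as answering 3 versus 4.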
I am a computer scientist and I am familiar with clustering methods, but I am not sure what kind of cluster analysis is commonly used in linguistics for this kind of data. What method would you recommend?
Thanks for any comments.