Hi,
I have a dataset with 16,000 observations and 68 variables. I want to measure the similarity (correlation) and dissimilarity (squared L2 distance) of each row compared with the last row (userid=98), which is my reference row. My goal is to have two new variables (similarity and dissimilarity) that show how each user's profile compares to the reference row. I found this link, but I have never worked with matrices in Stata and don't know how I should approach this problem. I would really appreciate your help.
Here is a sample of my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long userid float(v1 v2 v3 v4)
24      3.12       2.5      2.88      3.12
40 3.4424145 2.8373425  3.281276  3.370227
51      4.12      3.12      3.88         4
67      4.12      3.12      4.88      3.88
76  3.685956  3.127154  3.620283  3.439283
84  3.352679 3.2907455 3.2907455 3.2711875
95 3.7548585  3.990283  3.757191  3.615336
97      2.88      2.88         3      3.12
98  3.235533  2.831092 3.1384676 2.9281576
end
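To make the target concrete, here is a minimal sketch of the two variables I am after, written with plain loops rather than matrices. It assumes the example data above is in memory, that userid 98 appears exactly once, and that there are no missing values. The scalar names (ref_v1, refmean, syy) and the helper variables (rmean, sxy, sxx) are my own placeholders; with the full data, v1-v4 would be replaced by the 68 profile variables. By "correlation" I mean the Pearson correlation taken across variables within a row, which may or may not match what the linked approach computes.

```stata
* Sketch only: row-wise similarity and dissimilarity against userid 98
unab vars : v1-v4
local k : word count `vars'

* Store the reference row's values and their mean in scalars
scalar refmean = 0
foreach v of local vars {
    summarize `v' if userid == 98, meanonly
    scalar ref_`v' = r(mean)
    scalar refmean = refmean + r(mean)/`k'
}

* Row mean of each user's profile
egen double rmean = rowmean(`vars')

* Accumulate cross-products and squared deviations over the variables
generate double sxy = 0
generate double sxx = 0
generate double dissimilarity = 0
scalar syy = 0
foreach v of local vars {
    replace sxy = sxy + (`v' - rmean) * (scalar(ref_`v') - scalar(refmean))
    replace sxx = sxx + (`v' - rmean)^2
    replace dissimilarity = dissimilarity + (`v' - scalar(ref_`v'))^2
    scalar syy = syy + (scalar(ref_`v') - scalar(refmean))^2
}

* Pearson correlation of each row with the reference row
generate double similarity = sxy / sqrt(sxx * scalar(syy))
drop rmean sxy sxx
```

In this sketch the reference row itself should get similarity 1 and dissimilarity 0, and a row whose values are all identical would get a missing similarity because its variance is zero.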