Channel: Statalist

Measuring similarity and dissimilarity indices between each observation and the reference row

Hi,

I have a dataset with 16,000 observations and 68 variables. I want to measure the similarity (correlation) and dissimilarity (squared L2 distance, L2squared) of each row compared with the last row (userid=98), which is my reference row. My goal is to create two new variables, similarity and dissimilarity, that show how each user's profile compares with the reference row. I found this link, but I have never worked with matrices in Stata and don't know how to approach this problem. I would really appreciate your help.

Here is a sample of my data:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long userid float(v1 v2 v3 v4)
24      3.12       2.5      2.88      3.12
40 3.4424145 2.8373425  3.281276  3.370227
51      4.12      3.12      3.88         4
67      4.12      3.12      4.88      3.88
76  3.685956  3.127154  3.620283  3.439283
84  3.352679 3.2907455 3.2907455 3.2711875
95 3.7548585  3.990283  3.757191  3.615336
97      2.88      2.88         3      3.12
98  3.235533  2.831092 3.1384676 2.9281576
end
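
For reference, the two measures described above can be computed row by row without any matrix commands. The following is only a minimal sketch, not a definitive solution: it assumes the example data above are in memory, that userid == 98 identifies exactly one reference row, and that "correlation" means the Pearson correlation taken across the profile variables within a row. The helper names (ref_*, rmean, refmean, sxy, sxx, syy) are made up for illustration.

```stata
* Sketch only: assumes the -dataex- example is loaded and userid 98 is unique.
* Replace v1-v4 with the full list of profile variables in the real data.
unab vars : v1-v4

* Copy the reference row's values onto every observation (hypothetical ref_* names)
foreach v of local vars {
    summarize `v' if userid == 98, meanonly
    generate double ref_`v' = r(mean)
}

* Row means of each profile and of the reference profile
egen double rmean   = rowmean(`vars')
egen double refmean = rowmean(ref_*)

* Accumulate the squared distance and the cross-products for the correlation
generate double dissimilarity = 0
generate double sxy = 0
generate double sxx = 0
generate double syy = 0
foreach v of local vars {
    replace dissimilarity = dissimilarity + (`v' - ref_`v')^2
    replace sxy = sxy + (`v' - rmean) * (ref_`v' - refmean)
    replace sxx = sxx + (`v' - rmean)^2
    replace syy = syy + (ref_`v' - refmean)^2
}

* Pearson correlation of each row with the reference row
generate double similarity = sxy / sqrt(sxx * syy)

drop ref_* rmean refmean sxy sxx syy
```

With this sketch the reference row itself should get dissimilarity 0 and similarity 1. A row with a missing value in any profile variable gets missing results, and a row whose values are all identical gets a missing similarity because its variance is zero.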

