Hey,
I was using your awesome clickstream algorithm engine when I noticed something interesting.
Here is what I did:
I am trying to verify the algorithm's results, so I perform the following check:
- After running the algorithm, open the result.json file.
- For every leaf node in result.json, find its list of exclusions, for example:
["t", [["l", [48, 167, 201, 283, 434, 468, 672, 883, 916, 970, 1015, 1271],
{"exclusionsScore": [1285.0, 336.0208333333333, 0.0, 0.0], "exclusions": ["S2319", "S674", "S3690", "S3361"]}],
- To verify the results, I look up every user in the cluster (for instance, userId 48) in their respective input file (the input file contains the actual log of actions performed by users and is used as input to the algorithm) and check that they actually performed at least one of the ["S2319", "S674", "S3690", "S3361"] sequences.
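The first two steps of the check (collecting every leaf cluster and its exclusions from result.json) can be sketched in Python. This is a minimal sketch that assumes the nesting visible in the excerpt: internal nodes look like `["t", [children...]]` and leaves look like `["l", [user IDs], {..., "exclusions": [...]}]`. The helper names `iter_leaves` and `load_leaves` are hypothetical and not part of the engine.

```python
import json
from typing import Any, Iterator


def iter_leaves(node: Any) -> Iterator[tuple[list[int], list[str]]]:
    """Yield (user_ids, exclusions) for every leaf in a result.json tree.

    Assumed node shapes (inferred from the excerpt, not from the engine's docs):
      internal: ["t", [child, child, ...]]
      leaf:     ["l", [user_id, ...], {"exclusions": [...], ...}]
    """
    tag = node[0]
    if tag == "l":
        user_ids, info = node[1], node[2]
        yield user_ids, info.get("exclusions", [])
    elif tag == "t":
        # Recurse into every child of an internal node.
        for child in node[1]:
            yield from iter_leaves(child)
    else:
        raise ValueError(f"unknown node tag: {tag!r}")


def load_leaves(path: str) -> list[tuple[list[int], list[str]]]:
    """Read result.json from disk and return all leaf clusters."""
    with open(path) as f:
        return list(iter_leaves(json.load(f)))
```

For example, `load_leaves("result.json")` would return one `(user_ids, exclusions)` pair per leaf, such as `([48, 167, ...], ["S2319", "S674", "S3690", "S3361"])` for the leaf shown above.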
Here are the results:
When I verify the results, I find that about 20% of users do not have any of their cluster's sequences in the input file, meaning they did not perform any of the action sequences of the cluster they belong to.
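The percentage above can be computed with a sketch like the following. It assumes a hypothetical, precomputed mapping `user_sequences` from each user ID to the set of sequence IDs (such as "S2319") that the user performed in the input file; how sequence IDs map back to raw actions in the input log is not described in this issue, so building that mapping is left out. Each leaf is a `(user_ids, exclusions)` pair, and both function names are illustrative only.

```python
def users_without_cluster_sequence(
    leaves: list[tuple[list[int], list[str]]],
    user_sequences: dict[int, set[str]],
) -> list[int]:
    """Return users who performed none of their own cluster's exclusion sequences."""
    missing = []
    for user_ids, exclusions in leaves:
        wanted = set(exclusions)
        for uid in user_ids:
            # Empty intersection: the user did none of the cluster's sequences.
            if not (user_sequences.get(uid, set()) & wanted):
                missing.append(uid)
    return missing


def missing_fraction(
    leaves: list[tuple[list[int], list[str]]],
    user_sequences: dict[int, set[str]],
) -> float:
    """Fraction of clustered users that fail the check (0.0 if no users)."""
    total = sum(len(user_ids) for user_ids, _ in leaves)
    if total == 0:
        return 0.0
    return len(users_without_cluster_sequence(leaves, user_sequences)) / total
```

With five clustered users of whom two match none of their cluster's sequences, `missing_fraction` returns 0.4; the result reported above would correspond to a value of about 0.2.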
Here is what I expected:
Does this result make sense? Shouldn't every user have performed at least one sequence that appears in the cluster they belong to?
Thank you very much