Examples
Let's say we have a dataset that represents client page loads, with the fields session_id, page, browser, country, and load_time. With sybil, you could run all of the following queries (a sketch of the ingest step follows the list):
- make a time series of the avg (or median) page load time (grouped by browser or country)
- show a table of visitor counts by browser and country
- show the distribution (and percentiles) of page load times grouped by country
- show how many unique visitors visit our site per hour, day, week, etc.
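As a concrete starting point, here is a minimal sketch of the ingest side, assuming (as in sybil's README) that `sybil ingest -table NAME` reads newline-delimited JSON records from stdin. The field values and the `time` column name are illustrative:

```python
import json
import random
import subprocess
import time

def page_load_event():
    # One JSON object per page load; fields match the example schema.
    return {
        "session_id": f"s-{random.randrange(10**6)}",
        "page": random.choice(["/", "/pricing", "/docs"]),
        "browser": random.choice(["chrome", "firefox", "safari"]),
        "country": random.choice(["US", "DE", "IN"]),
        "load_time": round(random.lognormvariate(5.5, 0.6)),  # ms, fake data
        "time": int(time.time()),  # integer timestamp column
    }

records = "\n".join(json.dumps(page_load_event()) for _ in range(1000))

# sybil ingest reads newline-delimited JSON from stdin.
subprocess.run(["sybil", "ingest", "-table", "page_loads"],
               input=records.encode(), check=True)
```

A time series of average load time by browser would then be a query along the lines of `sybil query -table page_loads -group browser -int load_time -time` (exact flags are from memory of sybil's CLI; check the `sybil query` help for your build).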
We can also log every website our browser visits into sybil, giving us a more detailed and searchable browser history. By importing data out of Chrome (see the extraction sketch after the list), we can learn things like:
- how many sites do we visit per day?
- what is our frequency and usage of a particular site (by visits or time spent)?
- how long do we spend in our browser per day?
- what are our most visited websites?
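Here is a minimal extraction sketch for the Chrome case. The profile path is the Linux default and is an assumption (adjust for your OS and profile); the schema details are Chrome's actual history format: a `urls` table joined to a `visits` table, with visit times stored as microseconds since 1601-01-01:

```python
import json
import os
import shutil
import sqlite3
import subprocess
import tempfile

# Chrome keeps its History database locked while running, so work on a copy.
# Default Linux profile path; adjust for your OS and profile.
src = os.path.expanduser("~/.config/google-chrome/Default/History")
copy = os.path.join(tempfile.mkdtemp(), "History")
shutil.copy(src, copy)

conn = sqlite3.connect(copy)
rows = conn.execute("""
    SELECT urls.url, urls.title, visits.visit_time
    FROM visits JOIN urls ON visits.url = urls.id
""")

def to_unix(webkit_us):
    # Chrome stores visit times as microseconds since 1601-01-01;
    # 11644473600 seconds separate that epoch from the Unix epoch.
    return webkit_us // 1_000_000 - 11_644_473_600

records = "\n".join(
    json.dumps({"url": url, "title": title or "", "time": to_unix(ts)})
    for url, title, ts in rows)

subprocess.run(["sybil", "ingest", "-table", "browser_history"],
               input=records.encode(), check=True)
```

With one row per visit in place, visits per day, per-site frequency, and time-of-day patterns fall out of ordinary grouped queries.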
Aside from instrumenting our own web browsing, it's possible to instrument all (or some) of the actions in a browser on a web property we own. For example, if we own foo.com, it's trivial to add a tracking script that sends back the following (a collector sketch follows the list):
- arbitrary user actions, including all clicks and their related metadata
- page performance stats (load time, etc)
- time spent, including focused vs. idle time
- navigation between pages
- etc
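The tracking script itself is ordinary browser JavaScript (e.g. posting events with `navigator.sendBeacon`); on the receiving end, a collector can be as small as the following sketch, which accepts one JSON event per POST and forwards it to sybil. The endpoint, port, and table name are all hypothetical:

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class Collector(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON event sent by the tracking script.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event = json.loads(body)
        # Forward it to sybil as a single newline-delimited JSON record.
        subprocess.run(["sybil", "ingest", "-table", "user_actions"],
                       input=(json.dumps(event) + "\n").encode(), check=True)
        self.send_response(204)
        self.end_headers()

HTTPServer(("", 8080), Collector).serve_forever()
```

Spawning one ingest per event is only fine for a toy; a real collector would batch events and flush them to sybil periodically.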
Most web properties do this (via GA or others) and perhaps more, like collecting your demographics, the other websites you visit, and other advertiser-relevant information. It is unfortunate and privacy-invading, but it's the world we live in. A Chrome extension will hopefully block third-party trackers, but it's unlikely to block the website owner from tracking you (if they write their own custom scripts) without disabling JavaScript.
Another fun thing to log is process stats for long-term analysis; this is the historical equivalent of running something like top. Using pidstat (which scans the /proc/ dir), we can log the pid, args, command, memory size, CPU %, etc. This lets us run queries like the following (a logging sketch follows the list):
- how much memory do all my node server processes use over time?
- which process consumes the most CPU over time?
- which processes fight for CPU?
- is there any process that spins up overnight and does lots of work when I'm not looking?
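A minimal snapshot logger, reading /proc directly rather than shelling out to pidstat, might look like the sketch below. The table name and the choice to log cumulative CPU ticks (rather than an instantaneous CPU %) are assumptions; run it from cron or a loop to build up history:

```python
import json
import os
import subprocess
import time

def snapshot():
    """One record per live process: pid, args, RSS in kB, cumulative CPU ticks."""
    now = int(time.time())
    records = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                # Split off "pid (comm)" so fields index cleanly even when
                # the command name itself contains spaces or parentheses.
                fields = f.read().rsplit(")", 1)[1].split()
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                args = f.read().replace(b"\0", b" ").decode().strip()
            with open(f"/proc/{pid}/status") as f:
                rss_kb = next((int(line.split()[1]) for line in f
                               if line.startswith("VmRSS:")), 0)
        except OSError:
            continue  # process exited mid-scan
        # utime and stime are stat fields 14 and 15; after dropping
        # "pid (comm)" they sit at indexes 11 and 12.
        cpu_ticks = int(fields[11]) + int(fields[12])
        records.append({"pid": int(pid), "args": args, "rss_kb": rss_kb,
                        "cpu_ticks": cpu_ticks, "time": now})
    return records

data = "\n".join(json.dumps(r) for r in snapshot())
subprocess.run(["sybil", "ingest", "-table", "procstats"],
               input=data.encode(), check=True)
```

Since cpu_ticks is cumulative, CPU usage over an interval is the delta between consecutive snapshots for the same pid.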
Logging all this info and running dynamic queries over it gives us the ability to zoom into our CPU and memory graphs on a per-process basis. It's amazing what you can see when you start digging through these graphs.
Sybil is not limited to web datasets: it's useful for modeling and understanding any scenario that involves multi-dimensional data, for example advertising impressions and funnels, content popularity, ops monitoring, site performance, bug tracking, IoT, hardware sensors, and more. As long as your data can be modeled as events in time, it can be stored and analyzed with sybil.