Open
Description
Show advanced workflow with collection data types (arrays, maps, structs) and the lambda functions (reduce/transform/filter)
Examples from @blueedgenick (thank you!)
- A dating site : full 3-stage sequence to identify "activated matches": User A messages user B, User B responds to user A, User A responds to user B
This produces the bare minimum in the output (the id-pairs which meet the 'activated' criteria) but you can copy the reduce block from the having clause into the projection as well if you want to see what it's doing. This version should even be resilient to out-of-order data due to the sort=true flag passed to the as_map function.
select
array_sort(array [send_id, recv_id]) as sorted_send_recv_pair
from message_stream
group by array_sort(array [send_id, recv_id])
having reduce(
entries(
as_map(
collect_list(cast(rowtime as string)),
collect_list(send_id)
),
true
),
struct(state := 'none', sender := -1),
(s, e) = > case
when s->state = 'none'
then struct(state := 'initial', sender := e->v)
when s->state = 'initial' and s->sender != e->v
then struct(state := 'replied', sender := e->v)
when s->state = 'replied' and s->sender != e->v
then struct(state := 'activated', sender := e->v)
else s
end
)->state = 'activated' emit changes;
- To get the pageid with the top number of views per 60s window.:
create table foo as
select 'dummy',
reduce(
histogram(cast(USER_ID as string)),
struct(user := 'no result', views := cast(0 as bigint)),
(s, k, v) => case
when v > s->views
then struct(user := k, views := v)
else s
end) as top_user
from pageviews
window tumbling (size 60 seconds)
group by 'dummy'
emit changes;