Add small compute examples illustrating new WGSL primitives for AI

@beaufortfrancois requested that some small examples be published here showing how to use the new WGSL primitives aimed at AI/ML workloads: `shader-f16`, DP4A, and soon, subgroups. Could we consider this?

Not sure what would be the most compelling - perhaps something with some visual output, and running a microbenchmark against the fallback WGSL code, assuming the feature is actually supported?