When I change l1_cache_latency in the gpgpusim.config file, the total number of simulated instructions (gpu_tot_sim_insn) changes.
I am using the simulator from the dev branch, compiled with CUDA 9.1 and G++ 5.4.0 on Ubuntu 16.04, and running the cudaTensorCoreGemm application from the NVIDIA SDK 9.1.
Is this the expected result?
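
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the counts from two runs could be checked. It assumes each run's output was saved as text and contains lines of the form `gpu_tot_sim_insn = <n>`. The helper name and the log excerpts are hypothetical placeholders, not real simulator output.

```python
import re

def last_tot_sim_insn(log_text):
    """Return the last gpu_tot_sim_insn value found in the log text, or None."""
    # Assumed line format: "gpu_tot_sim_insn = <integer>"
    matches = re.findall(r"gpu_tot_sim_insn\s*=\s*(\d+)", log_text)
    return int(matches[-1]) if matches else None

# Placeholder excerpts with made-up values; replace with the saved output
# of a baseline run and a run with a different L1 latency setting.
baseline_log = "gpu_sim_insn = 1000\ngpu_tot_sim_insn = 1000\n"
modified_log = "gpu_sim_insn = 1200\ngpu_tot_sim_insn = 1200\n"

base = last_tot_sim_insn(baseline_log)
mod = last_tot_sim_insn(modified_log)
print("baseline:", base, "modified:", mod, "difference:", mod - base)
```

The last match is used because the total is printed cumulatively after each kernel, so the final value covers the whole run.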