The total training loss consists of (1) the cross-entropy loss and (2) the total cost, which is the sum of the costs of all layers: $\text{cost}_{\text{total}} = \sum_{l} \text{cost}_{l}$. How to balance