neilotoole/errgroup is a drop-in alternative to Go's wonderful
sync/errgroup but
limited to N goroutines. This is useful for interaction with rate-limited
APIs, databases, and the like.
Note: The sync/errgroup package now has a Group.SetLimit method, which eliminates the need for neilotoole/errgroup. This package will no longer be maintained. Use sync/errgroup instead.
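For example, the stdlib equivalent now looks roughly like this (a minimal sketch using golang.org/x/sync/errgroup; the limit of 8 is arbitrary):

g, ctx := errgroup.WithContext(ctx)
g.SetLimit(8) // at most 8 task goroutines run concurrently
g.Go(func() error {
    // do something with ctx
    return nil
})
err := g.Wait()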
In effect, neilotoole/errgroup is sync/errgroup but with a worker pool
of N goroutines. The exported API is identical but for an additional
function WithContextN, which allows the caller
to specify the maximum number of goroutines (numG) and the capacity
of the queue channel (qSize) used to hold work before it is picked
up by a worker goroutine. The zero Group and the Group returned
by WithContext have numG and qSize equal to runtime.NumCPU.
The exported API of this package mirrors the sync/errgroup package.
The only change needed is the import path of the package, from:
import (
  "golang.org/x/sync/errgroup"
)

to

import (
  "github.com/neilotoole/errgroup"
)

Then use in the normal manner. See the godoc for more.

g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
    // do something
    return nil
})
err := g.Wait()

Many users will have no need to tweak the numG and qSize params. However, benchmarking
may suggest particular values for your workload. For that you'll need WithContextN:
numG, qSize := 8, 4
g, ctx := errgroup.WithContextN(ctx, numG, qSize)

The motivation for creating neilotoole/errgroup was to provide rate-limiting while
maintaining the lovely sync/errgroup semantics. It was assumed that some performance
would be sacrificed relative to sync/errgroup. However, benchmarking
suggests that this implementation can be more effective than sync/errgroup
when tuned for a specific workload.
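To make that concrete, here is a rough sketch of a rate-limited fan-out using WithContextN. The fetchItem function and the urls slice are hypothetical, and the snippet assumes "context" and this package are imported:

numG, qSize := 8, 4 // at most 8 workers; queue holds 4 pending tasks
g, ctx := errgroup.WithContextN(context.Background(), numG, qSize)

results := make([]string, len(urls))
for i, u := range urls {
    i, u := i, u // capture loop variables (needed before Go 1.22)
    g.Go(func() error {
        res, err := fetchItem(ctx, u) // hypothetical rate-limited call
        if err != nil {
            return err
        }
        results[i] = res
        return nil
    })
}
if err := g.Wait(); err != nil {
    // handle the first error returned by any task
}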
Below is a selection of benchmark results. How to read this: a workload is X tasks of Y complexity. The workload is executed for:
- sync/errgroup, listed as sync_errgroup
- a non-parallel implementation (sequential)
- various {numG, qSize} configurations of neilotoole/errgroup, listed as errgroupn_{numG}_{qSize}
BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_default_16_16-16         	   25574	     46867 ns/op	     688 B/op	      12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_4_4-16                   	   24908	     48926 ns/op	     592 B/op	      12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_16_4-16                  	   24895	     48313 ns/op	     592 B/op	      12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/errgroupn_32_4-16                  	   24853	     48284 ns/op	     592 B/op	      12 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/sync_errgroup-16                   	   18784	     65826 ns/op	    1858 B/op	      55 allocs/op
BenchmarkGroup_Short/complexity_5/tasks_50/sequential-16                      	   10000	    111483 ns/op	       0 B/op	       0 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_default_16_16-16        	    3745	    325993 ns/op	    1168 B/op	      27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_4_4-16                  	    5186	    227034 ns/op	    1072 B/op	      27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_16_4-16                 	    3970	    312816 ns/op	    1076 B/op	      27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/errgroupn_32_4-16                 	    3715	    320757 ns/op	    1073 B/op	      27 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/sync_errgroup-16                  	    2739	    432093 ns/op	    1862 B/op	      55 allocs/op
BenchmarkGroup_Short/complexity_20/tasks_50/sequential-16                     	    2306	    520947 ns/op	       0 B/op	       0 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_default_16_16-16       	     354	   3602666 ns/op	    1822 B/op	      47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_4_4-16                 	     420	   2468605 ns/op	    1712 B/op	      47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_16_4-16                	     334	   3581349 ns/op	    1716 B/op	      47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/errgroupn_32_4-16                	     310	   3890316 ns/op	    1712 B/op	      47 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/sync_errgroup-16                 	     253	   4740462 ns/op	    8303 B/op	     255 allocs/op
BenchmarkGroup_Short/complexity_40/tasks_250/sequential-16                    	     200	   5924693 ns/op	       0 B/op	       0 allocs/op
The overall impression is that neilotoole/errgroup can provide higher
throughput than sync/errgroup for these (CPU-intensive) workloads,
sometimes significantly so. As always, these benchmark results should
not be taken as gospel: your results may vary.
Why require an explicit qSize limit?
If the number of calls to Group.Go results in qCh becoming
full, the Go method will block until worker goroutines relieve qCh.
This behavior is in contrast to sync/errgroup's Go method, which doesn't block.
While neilotoole/errgroup aims to be as close to a behavioral "drop-in"
alternative to sync/errgroup as possible, this blocking behavior
is a conscious deviation.
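To illustrate the difference, consider this sketch (the numG, qSize, and sleep values are arbitrary; assumes "context" and "time" are imported):

numG, qSize := 1, 1
g, _ := errgroup.WithContextN(context.Background(), numG, qSize)

for i := 0; i < 4; i++ {
    // With one worker and a queue capacity of one, later iterations
    // block here until a worker drains qCh. sync/errgroup's Go
    // (without SetLimit) would return immediately instead.
    g.Go(func() error {
        time.Sleep(100 * time.Millisecond)
        return nil
    })
}
_ = g.Wait()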
Noting that the capacity of qCh is controlled by qSize, it's probable that an
alternative implementation could be built that uses a (growable) slice
as a buffer for functions passed to Go when qCh is full.
Consideration of this potential design led to this issue
regarding unlimited capacity channels, or perhaps better characterized
in this particular case as "growable capacity channels". If such a
feature existed in the language, it's possible that this implementation might
have taken advantage of it, at least in the first-pass release (benchmarking notwithstanding).
However, benchmarking seems to suggest that a relatively
small qSize has performance benefits for some workloads, so it's possible
that the explicit qSize requirement is a better design choice regardless.
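For the curious, that unbuilt alternative might have looked something like the sketch below: a mutex-guarded slice that Go would append to when qCh is full, and that workers would drain. The type and method names are hypothetical and not part of this package.

// growableQueue is a hypothetical overflow buffer (requires "sync").
type growableQueue struct {
    mu      sync.Mutex
    pending []func() error // grows without bound when qCh is full
}

func (q *growableQueue) enqueue(f func() error) {
    q.mu.Lock()
    defer q.mu.Unlock()
    q.pending = append(q.pending, f)
}

func (q *growableQueue) dequeue() (func() error, bool) {
    q.mu.Lock()
    defer q.mu.Unlock()
    if len(q.pending) == 0 {
        return nil, false
    }
    f := q.pending[0]
    q.pending = q.pending[1:]
    return f, true
}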