
Concurrency is not Parallelism

Mike Christensen
October 27th, 2020 · 4 min read

In my previous post I discussed how the simple rules of the Boids algorithm, applied to independent actors aware only of their local environments, produce beautiful, realistic flocking behaviour. While writing it I was reminded of another of my favourite examples of emergent behaviour: Conway’s Game of Life. For fun, I decided to implement it in Go - and found that it demonstrates well that old Go proverb:

Concurrency is not parallelism.

Conway’s Game of Life

In 1970, John Conway invented his “Game of Life” (GoL): a simple cellular automaton, from which emerges a curious life-like behaviour, almost as if one was looking through a microscope into a Petri dish.

Game of Life

The GoL universe is a matrix of cells, each of which is either alive (depicted black) or dead (depicted white). Each cell transitions between these states according to the following rules:

  1. Any live cell with two or three live neighbours survives.
  2. Any dead cell with three live neighbours becomes a live cell.
  3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.

A neighbour refers to any of the eight immediately adjacent cells (horizontally, vertically, or diagonally).

Implementing GoL in Go

When approaching a new problem, always implement a simple solution first. As Donald Knuth famously said:

“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.” - Donald Knuth

With this in mind, let’s implement a super simple GoL algorithm in Go!

Modelling the World Grid

First we need a model of the world:

// World describes the world grid.
type World struct {
    Width, Height int
    // Grid describes the set of cells in the world and their alive/dead state.
    // Note that the first row in the array corresponds to the bottom of the screen,
    // and the first element of a row corresponds to the left side of the screen.
    Grid [][]bool
}

We can instantiate a World like so:

// NewWorld returns a World of the given dimensions with a randomly initialised grid.
func NewWorld(width, height int) *World {
    world := &World{
        Width:  width,
        Height: height,
    }
    world.Grid = make([][]bool, height)
    for y := range world.Grid {
        world.Grid[y] = make([]bool, width)
        for x := range world.Grid[y] {
            world.Grid[y][x] = rand.Float64() < 0.2
        }
    }
    return world
}

Counting Neighbours

In order to implement Conway’s rules, we will need to be able to count the number of alive neighbours to a given cell:

// WrapCoords accepts an (x, y) coordinate and returns a coordinate which is
// within the bounds of the World, using a toroidal wrapping.
func (w *World) WrapCoords(x, y int) (int, int) {
    if x < 0 {
        x = w.Width - 1
    }
    if x > w.Width-1 {
        x = 0
    }
    if y < 0 {
        y = w.Height - 1
    }
    if y > w.Height-1 {
        y = 0
    }
    return x, y
}

// CountLiveNeighbours returns the number of live neighbours
// around the given cell position.
func (w *World) CountLiveNeighbours(x, y int) int {
    count := 0
    for j := -1; j <= 1; j++ {
        for i := -1; i <= 1; i++ {
            if i == 0 && j == 0 {
                continue
            }
            _x, _y := w.WrapCoords(x+i, y+j)
            if w.Grid[_y][_x] {
                count++
            }
        }
    }
    return count
}

This simple implementation wraps the world grid such that flying off the right hand side brings you back to the left, and flying off the top brings you to the bottom - i.e. a torus:

Torus

Conway’s Rules

Now that we can count neighbours, we can implement Conway’s rules:

// Update applies the rules to the world.
func (w *World) Update() {
    // Create a deep copy of the world grid
    nextGrid := make([][]bool, len(w.Grid))
    for y := range w.Grid {
        nextGrid[y] = make([]bool, len(w.Grid[y]))
        copy(nextGrid[y], w.Grid[y])
    }

    // Apply Conway's rules
    for y := range w.Grid {
        for x := range w.Grid[y] {
            n := w.CountLiveNeighbours(x, y)
            if w.Grid[y][x] {
                // Any live cell with fewer than two or more than three live neighbours dies
                if n < 2 || n > 3 {
                    nextGrid[y][x] = false
                }
            } else {
                // Any dead cell with exactly three live neighbours becomes a live cell
                if n == 3 {
                    nextGrid[y][x] = true
                }
            }
        }
    }

    // Update the world grid
    w.Grid = nextGrid
}

Note how we create a copy of the world grid so that we don’t read the updated values of neighbouring cells.

Rendering to the Screen

Finally, it would be great to render the game to the screen. For this I used the excellent pixel library from @faiface. We can create a window and game loop in the same way as described in my previous post:

const (
    // WindowWidth is the width of the window in pixels
    WindowWidth = 1200
    // WindowHeight is the height of the window in pixels
    WindowHeight = 800
    // WorldWidth is the width of the world in # cells.
    WorldWidth = 1200
    // WorldHeight is the height of the world in # cells.
    WorldHeight = 800
)

func createWindow(title string, width, height float64) (*pixelgl.Window, error) {
    window, err := pixelgl.NewWindow(pixelgl.WindowConfig{
        Title:  title,
        Bounds: pixel.R(0, 0, width, height),
        VSync:  true,
    })
    if err != nil {
        return nil, errors.Wrap(err, "new window failed")
    }
    window.SetSmooth(true)
    return window, nil
}

// Run runs the game.
func Run() {
    window, err := createWindow("GoLife", WindowWidth, WindowHeight)
    if err != nil {
        panic(errors.Wrap(err, "create window failed"))
    }
    frames := 0
    second := time.Tick(time.Second)
    world := life.NewWorld(WorldWidth, WorldHeight)
    drawer := draw.NewDrawer()
    for !window.Closed() {
        drawer.DrawFrame(window, world)
        world.Update()
        window.Update()
        frames++
        select {
        case <-second:
            window.SetTitle(fmt.Sprintf("%s | FPS: %d", "GoLife", frames))
            frames = 0
        default:
        }
    }
}

func init() {
    rand.Seed(time.Now().Unix())
}

Then, we can implement DrawFrame to draw the world grid on the window. We can achieve this by iterating over the world grid and rendering black rectangles at the coordinates of alive cells, applying a scaling factor to convert the grid coordinates to window coordinates:

// DrawFrame draws the world on the window.
func (d *Drawer) DrawFrame(window *pixelgl.Window, world *life.World) error {
    window.Clear(colornames.Aliceblue)
    d.Batch.Clear()

    imd := imdraw.New(nil)
    imd.Color = pixel.RGB(0, 0, 0)
    // Scale X by columns and Y by rows to map grid coordinates to window coordinates
    scale := pixel.V(
        window.Bounds().Max.X/float64(len(world.Grid[0])),
        window.Bounds().Max.Y/float64(len(world.Grid)),
    )
    for y := range world.Grid {
        for x := range world.Grid[y] {
            if !world.Grid[y][x] {
                continue
            }
            imd.Push(pixel.V(float64(x)*scale.X, float64(y)*scale.Y))
            imd.Push(pixel.V(float64(x+1)*scale.X, float64(y+1)*scale.Y))
            imd.Rectangle(0)
        }
    }
    imd.Draw(d.Batch)

    d.Batch.Draw(window)
    return nil
}

Here’s the result with a 1200x800 world grid:

Game of Life

Faster!

Great, so our basic implementation works - but can we make it faster?

Benchmarking

First up, time for a speed test: let’s add some benchmarks. For the purposes of this post, we’ll ignore the rendering code and benchmark the core GoL update logic. If you are unfamiliar with writing benchmarks in Go, I recommend this excellent post from Dave Cheney.

func benchmarkUpdate(sideLength int, b *testing.B) {
    world := life.NewWorld(sideLength, sideLength)
    for n := 0; n < b.N; n++ {
        world.Update()
    }
}

func BenchmarkUpdate30(b *testing.B) { benchmarkUpdate(30, b) }
func BenchmarkUpdate35(b *testing.B) { benchmarkUpdate(35, b) }
func BenchmarkUpdate40(b *testing.B) { benchmarkUpdate(40, b) }
func BenchmarkUpdate45(b *testing.B) { benchmarkUpdate(45, b) }

Here we are measuring the time taken to update different grid sizes. Running on my machine (with a 2m -benchtime to improve statistical significance):

❯ go test -timeout=1h -benchtime=2m -bench=. ./pkg/life/life_test.go
goos: darwin
goarch: amd64
BenchmarkUpdate30-4    2718184    53135 ns/op
BenchmarkUpdate35-4    2132811    69771 ns/op
BenchmarkUpdate40-4    1558998    101462 ns/op
BenchmarkUpdate45-4    1000000    121130 ns/op
BenchmarkUpdate50-4    319368     141794 ns/op
BenchmarkUpdate55-4    226026     162587 ns/op
BenchmarkUpdate60-4    190359     203564 ns/op
BenchmarkUpdate65-4    136521     248646 ns/op
PASS
ok      command-line-arguments  788.950s

As expected, we can see that runtime grows linearly with the number of cells in the grid - that is, quadratically with the grid’s side length:

Initial Bench

In order to compare the program performance as we optimise the code, let’s obtain a benchmark on a large grid of 10000x10000 cells:

# func BenchmarkUpdate10000(b *testing.B) { benchmarkUpdate(10000, b) }

❯ go test -benchtime=1m30s -bench=. ./pkg/life/life_test.go
BenchmarkUpdate10000-4    19    5031633976 ns/op

So, about 5 seconds on my machine to update the whole grid.

Partitioning the Grid

Can we speed up the program by partitioning the grid and computing each section of the grid concurrently (in their own goroutines)?

Grid Partitions

We can achieve this by executing the outer update loop over a specific range of rows:

// updatePartition applies the rules to a vertically-partitioned portion of the world.
func (w *World) updatePartition(wg *sync.WaitGroup, yStart, yEnd int, nextGrid [][]bool) {
    defer wg.Done()
    for y := range w.Grid[yStart:yEnd] {
        y = y + yStart
        for x := range w.Grid[y] {
            n := w.CountLiveNeighbours(x, y)
            if w.Grid[y][x] {
                // Any live cell with fewer than two or more than three live neighbours dies
                if n < 2 || n > 3 {
                    nextGrid[y][x] = false
                }
            } else {
                // Any dead cell with exactly three live neighbours becomes a live cell
                if n == 3 {
                    nextGrid[y][x] = true
                }
            }
        }
    }
}

Then we can call updatePartition concurrently for the desired number of partitions (note that thanks to the integer division in the boundary arithmetic, the partition count need not divide the world height exactly - the boundaries still cover every row, with some partitions one row larger than others):

// Update applies the rules to the world.
func (w *World) Update(partitions int) {
    // Create a deep copy of the world grid
    nextGrid := make([][]bool, len(w.Grid))
    for y := range w.Grid {
        nextGrid[y] = make([]bool, len(w.Grid[y]))
        copy(nextGrid[y], w.Grid[y])
    }

    // Apply Conway's rules
    var wg sync.WaitGroup
    for p := 0; p < partitions; p++ {
        wg.Add(1)
        go w.updatePartition(&wg, p*w.Height/partitions, (p+1)*w.Height/partitions, nextGrid)
    }
    wg.Wait()

    // Update the world grid
    w.Grid = nextGrid
}

As an aside, you may notice that our goroutines may concurrently read from the same cell. This can occur at the boundaries between partitions where a neighbouring cell is found in the adjacent partition. This is in fact not a race condition, since no writes occur on w.Grid; they only occur on the deep copy, nextGrid. Note that if we wanted to remove the synchronisation (achieved with the WaitGroup) that is currently required with each call to Update - that is, to allow goroutines to race ahead to the next generation - then we would instead need to use channels to communicate the state of the partition boundaries:

Don’t communicate by sharing memory, share memory by communicating.

Here are the results of running the 10000x10000 benchmark with 2 partitions on my machine:

❯ go test -benchtime=1m30s -bench=. ./pkg/life/life_test.go
BenchmarkUpdate10000-4    26    4110649931 ns/op

That’s down from ~5s to ~4s, a 20% improvement. We can see further improvement with 4 partitions:

❯ go test -benchtime=1m30s -bench=. ./pkg/life/life_test.go
BenchmarkUpdate10000-4    33    3073360188 ns/op

However, increasing the number of partitions beyond this doesn’t improve the runtime of our program!

❯ go test -benchtime=1m30s -bench=. ./pkg/life/life_test.go
BenchmarkUpdate10000-4    33    3051534928 ns/op
❯ go test -benchtime=1m30s -bench=. ./pkg/life/life_test.go
BenchmarkUpdate10000-4    27    3930115474 ns/op

Concurrency is Not Parallelism

Why doesn’t adding more than 4 goroutines lead to a speed up? There’s a clue in the benchmark output:

BenchmarkUpdate10000-4

That -4 suffix indicates the value of GOMAXPROCS during the run - that is, the maximum number of OS threads that can execute goroutines simultaneously. My machine has 4 virtual cores, and by default (since Go 1.5) GOMAXPROCS matches this value.

When we split the grid into 4 partitions, the Go runtime was able to schedule the goroutines onto separate OS threads, executing in parallel, one per vCPU. With more partitions, execution of each goroutine continues to be concurrent, but not in fact parallel.

Concurrency vs. Parallelism

What if N > GOMAXPROCS?

You may have heard goroutines described as “lightweight threads” before. One of the benefits of this lightweight footprint is supposed to be that you can spawn a lot of them - but if spawning more than GOMAXPROCS goroutines doesn’t speed up our program, what is the point? The answer lies with the Go scheduler and the type of work that our goroutines are performing.

Note that the update algorithm for Conway’s GoL is CPU-bound: there are no I/O or otherwise blocking calls, and the algorithm makes maximal use of its CPU time. Imagine instead that our goroutines were executing code that made blocking calls. Then the scheduler could let other goroutines use the CPU while the caller is blocked.

Blocking Goroutine

So how do you decide whether to add concurrency to your program? Profile your code - Go’s built-in pprof tooling (via go test -cpuprofile and go tool pprof) makes this straightforward - to understand where adding concurrency will yield benefits and where it will not.

If you want to play around with this yourself, you can check out the source code. If you’ve got comments, questions or feedback you can hit me up on Twitter ✌️

© 2020–2022 Christensen Codes