@@ -147,8 +147,7 @@ void main() {
 
 There's a correction factor here (multiplying the value by 255), but that's because
 my board data had only 0s and 1s in the bytes for indicating an alive or dead
-cell. (And on the GPU this kind of data transformation is *very* fast, which is
-why I didn't first correct that in Javascript).
+cell, and the shader expects that texture value to have a range between 0 and 255 (the range of a byte).
 
 The shader also takes two `uniform` parameters for what colors to draw with, so
 I will show you where those get set:
@@ -187,7 +186,7 @@ So I don't know that much about how Webassembly execution is implemented into
 browsers, but I know enough about actual CPU architectures to make some reasonable
 inferences.
 
-With that in mind, as I started thinking about more significant algorithm changes
+As I started thinking about more significant algorithm changes
 I could make, I suspected that the largest contributor to time taken in my old
 algorithm was probably from it taking longer to access the Webassembly linear memory
 than local or global variables. At an implementation level this could be because
@@ -195,7 +194,7 @@ of how the wasm memory relates to the CPU cache, but I didn't really care about
 - I was just curious to see if reducing the number of memory operations my algorithm
 took could provide me more of a speed-up.
 
-I knew I was already pretty inefficient in this areay - my previous algorithm had made ***10 memory calls per cell***
+I knew I was already pretty inefficient in this area - my previous algorithm had made ***10 memory calls per cell***
 (8 to check neighbor states, 1 to check the current cell's state, and 1 to store
 the updated cell's state), so I suspected I could get that much lower.
 
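
Reviewer note: the "10 memory calls per cell" pattern described in the last hunk can be sketched roughly as below. This is only an illustration in JavaScript, not the post's actual WebAssembly code. The `Uint8Array` buffers stand in for wasm linear memory, and `stepNaive`, the wrap-around edges, and the access counter are all hypothetical names and assumptions added for the example.

```javascript
// Hypothetical sketch: one Game of Life generation over a flat byte board.
// `src` and `dst` are Uint8Arrays (stand-ins for wasm linear memory) holding
// 0 (dead) or 1 (alive) per cell. Edges wrap around, which is an assumption.
function stepNaive(src, dst, width, height) {
  let accesses = 0; // counts every read from or write to the board buffers
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let neighbors = 0;
      // 8 loads: one per neighboring cell
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          if (dx === 0 && dy === 0) continue;
          const ny = (y + dy + height) % height;
          const nx = (x + dx + width) % width;
          neighbors += src[ny * width + nx];
          accesses++;
        }
      }
      // 1 load: the current cell's own state
      const alive = src[y * width + x];
      accesses++;
      // Standard Conway rules: birth on 3 neighbors, survival on 2 or 3
      const next = neighbors === 3 || (alive === 1 && neighbors === 2) ? 1 : 0;
      // 1 store: the updated cell's state
      dst[y * width + x] = next;
      accesses++;
    }
  }
  return accesses;
}
```

Under these assumptions a 5x5 board costs 250 buffer accesses per generation (25 cells times 10), which is the per-cell overhead the post sets out to reduce.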