I think you’re not quite there yet. I may be wrong, but think about what’s happening step by step, and remember it’s all happening really quickly.
First, without any envelope signal:
- Envelope signal is 0
- Incoming audio passes through zero-crossing
- S&H triggers
- Envelope output is sampled and held at 0
Then with a non-zero envelope signal. Let’s say the attack phase goes from 0 to 10, just to make it easy to follow:
- Envelope signal is 1
- Incoming audio passes through zero-crossing
- S&H triggers
- Envelope output is sampled and held at 1
- Envelope signal is 2
- Incoming audio passes through zero-crossing
- S&H triggers
- Envelope output is sampled and held at 2
Etc.
There will never be any choppiness because the value is held at the last known value!
Of course there are many, many more sample-and-hold steps than this, so the held values follow the incoming envelope signal at a high enough resolution that you will never hear anything other than the envelope working as it should.
Does that make it clearer?
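If it helps, here is a rough Python sketch of the steps above. It is only a digital stand-in for the patch, not how any particular module works: the function name and the simple sign-change test for a zero crossing are my own assumptions.

```python
def zero_crossing_sample_and_hold(audio, envelope):
    """Return the held envelope values, one per input sample.

    The held value only updates when the audio changes sign (a zero
    crossing); in between, the last sampled envelope value is kept.
    """
    held = 0.0          # nothing sampled yet, so the VCA stays closed
    prev = 0.0          # previous audio sample, used to spot sign changes
    out = []
    for a, e in zip(audio, envelope):
        # Assumed comparator: trigger when the sign differs from the last sample.
        if (a >= 0.0) != (prev >= 0.0):
            held = e    # S&H triggers: sample the envelope and hold it
        out.append(held)
        prev = a
    return out


# Tiny hand-made example: the audio crosses zero at index 3 and index 5.
audio = [0.0, 1.0, 0.5, -0.5, -1.0, 0.5]
envelope = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
held = zero_crossing_sample_and_hold(audio, envelope)
# held is [0.0, 0.0, 0.0, 3.0, 3.0, 5.0]
```

The held values never drop back to zero between triggers, which is why there is no choppiness: each trigger just moves the held level to wherever the envelope has got to.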
As an aside, you actually only need the S&H to work right at the beginning of the envelope, so that the moment the envelope opens the VCA and the volume goes from 0 to something, the signal isn’t at a point where it has to jump. After this initial transition it’s not necessary; it’s just easier to leave it running than to try and work out a way to only activate it at the beginning of an envelope - ugh!
Also remember that with a slow attack this is much less of a problem, because the first sound that is allowed through is quiet anyway. This technique is only necessary when you are using a very short attack and want to avoid opening the envelope while the audio signal is high, which is what causes clicks. All the S&H does is make sure the envelope opens at the same time as the audio passes through a zero crossing.
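To put a rough number on the click, here is a small Python sketch that compares the size of the first output step with and without waiting for a zero crossing. It assumes an idealised instant attack (VCA gain goes straight from 0 to 1) and a 100 Hz sine at a 48 kHz sample rate; the function name and those values are made up for illustration.

```python
import math


def opening_jump(audio, gate_on, wait_for_zero_crossing):
    """Size of the output step at the moment the VCA opens.

    With an instant attack the output goes from 0 to the current audio
    value, so the step is simply the absolute audio level at that moment.
    """
    prev = audio[0]
    for i, a in enumerate(audio):
        crossed = (a >= 0.0) != (prev >= 0.0)   # sign change since last sample
        if i >= gate_on and (crossed or not wait_for_zero_crossing):
            return abs(a)
        prev = a
    return 0.0  # the VCA never opened within this buffer


# Assumed test signal: 100 Hz sine, 48 kHz sample rate, 0.02 s long.
sine = [math.sin(2.0 * math.pi * 100.0 * i / 48000.0) for i in range(960)]

# The gate arrives at sample 120, right at the peak of the sine.
immediate = opening_jump(sine, gate_on=120, wait_for_zero_crossing=False)
synced = opening_jump(sine, gate_on=120, wait_for_zero_crossing=True)
```

Here `immediate` comes out at roughly full scale, which is the click, while `synced` is only the size of one sample step past the crossing. In this sampled model it is small rather than exactly zero, because the VCA opens on the first sample after the crossing.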