Opening the envelope
2025-01-22 electronics MSK 012
Quick: which module is most important in defining the sound of your modular synthesizer?
According to my highly scientific method of just making up answers without doing a survey, most beginners say it's the oscillator that's most important. It sort of makes sense that the module where the sound originates (in a typical subtractive patch) would be critical to the feel of that sound when it hits the listener's ears. And we see that pattern in forum threads posted by newbies looking for advice: they put a lot of care into choosing their first oscillator, less into other modules.
More experienced synthesizer users (again, in my imagination of what they might say if I had budget to survey them) point to the filter instead. They know that the oscillator's job is just to make a harmonic-rich waveform, and that there isn't really so much difference in that from one oscillator to the next. Even where there is a difference in the raw oscillator output, so much of the timbre is determined by the shaping done in the filter that what you hear at the output is largely defined by the sound of the filter. It's easier to tell the difference between two patches with similar but distinct filters than between two patches with similar but distinct oscillators and the same filter. Imaginary survey respondents with really good taste would simply name whichever module they bought from North Coast Synthesis as most important to the patch; but since my filter modules tend to be my most popular products anyway, it amounts to much the same thing.
In this article I'd like to talk about a different module that is also very important to a synthesizer's sound: the envelope generator.
Where envelopes come from
Suppose you've got a thing - a solid object of some sort, maybe a metal bar. It's not doing anything at the moment, just sort of sitting there, in a low-energy state. And then you do something to it - like maybe you hit it with a stick. What happens?
Because it's a solid object, the thing has some structure to it; it's made of atoms with forces between them tending to hold them in place. The atoms have inertia and that also holds them in place. You hit the metal bar with a stick, pushing some of the atoms out of place, but now there are forces pushing those atoms back toward their normal positions. The atoms start moving back into place.
But because they have inertia, now that they're in motion the atoms in the thing will tend to keep moving. So they not only move back to where they started, they keep moving past that point, against the original stick-hit in the opposite direction. And then the other forces will act in the other direction, pulling the thing back toward its normal state again. It goes back and forth, from one side to the other, maybe many times; but not forever, because all this motion is subject to friction. There's some energy lost on every cycle, and the cycles get smaller and smaller, until your thing is sitting quietly again.
The great thing about this story is that you can substitute pretty much whatever you want for "thing" and "do something." It can be almost any kind of thing, and you can do something in almost any way, and the same general outline of events will unfold. It's a basic fact of physics that when an object is disturbed, it tends to vibrate for a while, losing energy, until it settles down again. And because this is such a common situation, our senses (especially hearing) are adapted to detect and analyse it. We know what vibrating objects sound like and we're good at hearing fine details in the sounds of vibrating objects.
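The story above can be written down as a damped sine wave: a vibration at some frequency, multiplied by an amplitude that shrinks as friction removes energy. Here is a minimal Python sketch of that idea. The function names, the exponential form of the decay, and the default frequency and decay time are illustrative assumptions, not measurements of any real object.

```python
import math

def decay_envelope(t, decay_s=0.5, amplitude=1.0):
    """Size of the vibration t seconds after the hit (assumed exponential decay)."""
    return amplitude * math.exp(-t / decay_s)

def damped_sine(t, freq_hz=440.0, decay_s=0.5, amplitude=1.0):
    """Displacement of the idealized 'thing' t seconds after the hit."""
    return decay_envelope(t, decay_s, amplitude) * math.sin(2.0 * math.pi * freq_hz * t)

# Sample the first two seconds in 1 ms steps.
# The back-and-forth swings never exceed the shrinking envelope.
times = [n / 1000.0 for n in range(2000)]
wave = [damped_sine(t) for t in times]
```

Changing `freq_hz` and `decay_s` changes which "thing" is being imitated, but the overall shape of the result stays the same.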
Then it's natural to expect that musical instruments, in particular, would make sounds that are similar to the sounds of other physical objects. Acoustic instruments tend to actually be physical objects that make a noise when you disturb them somehow - drums that vibrate when you hit them with sticks, strings that vibrate when you pluck them or scrape a bow across them, flutes and horns containing air columns (okay, not actually solid, but) that vibrate when you blow air through or across them, and so on. Electronic instruments - like our synthesizer patches - may or may not actually work that way, but if not they're doing something else that produces a similar kind of sound.
Now let's look at a waveform. This is a few notes of George Winston on acoustic piano. (From "Night, part 2: Midnight" on the album December - a passage chosen because it's a few isolated notes without much background.)
Some of the notes are a lot louder than others, they overlap, and there are amplitude variations that come from reverberation in the piano. As sound waves follow different paths from the strings to the microphones, there can be constructive and destructive interference that makes the amplitude vary on the waveform plot. But you can still see the same basic pattern as in the earlier plot for the vibration of a generic "thing." Each note starts with a sharp peak when the hammer hits the string, and then it fades away as the vibrating string loses energy.
Zooming in on one note shows more detail. Here's the first note from the clip above, magnified in both time (horizontal) and amplitude (vertical).
In the zoomed-in view you can see that the note doesn't actually start instantaneously. There's a build-up from silence (actually, the tail of the preceding note, which is still reverberating a little) to the peak. There are several causes for that: the hammer takes a non-zero amount of time to actually hit the string; the multiple paths for sound waves from string to microphones mean that the start of the note is "heard" at multiple instants over a period of time; and there may also be some electronic all-pass filtering introduced in the recording process to smear out the peak for various sonic and technical reasons. The whole time from zero amplitude to peak is much less than the duration of other parts of the note that come later, but it is not quite zero.
Then after the peak, at first the amplitude dies out quickly. It follows from the mathematics of differential equations that the waveform you get by hitting something as complicated as a piano string is probably not really going to be just a plain damped sine wave such as I illustrated earlier; it's more likely to be well-approximated as several of those added together. On a tuned, harmonic instrument like a piano, the sine wave components will likely all be at integer multiples of a single frequency, or nearly integer multiples of a single frequency. (The details of that are better left to another article.) And it's quite typical that the higher-frequency components will die out faster. So in the waveform, we see an initial fast decline in amplitude as the higher-energy, higher-frequency vibrations die out, and then a much longer period in which the lower-frequency vibrations continue.
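The "several damped sine waves added together" description can be sketched in a few lines of Python. The list of partials below is entirely hypothetical (made-up amplitudes and decay times, not taken from the piano recording); the only property that matters is that higher harmonics are given shorter decay times.

```python
import math

# Hypothetical partials: (harmonic number, starting amplitude, decay time in seconds).
# Higher harmonics get shorter decay times, as described in the text.
PARTIALS = [(1, 1.0, 2.0), (2, 0.8, 0.6), (3, 0.6, 0.2), (4, 0.5, 0.08)]

def note_sample(t, fundamental_hz=220.0):
    """Sum of damped sine waves at integer multiples of the fundamental."""
    return sum(
        amp * math.exp(-t / tau) * math.sin(2.0 * math.pi * n * fundamental_hz * t)
        for n, amp, tau in PARTIALS
    )

def amplitude_bound(t):
    """Upper bound on the waveform's size at time t: the sum of the partial envelopes."""
    return sum(amp * math.exp(-t / tau) for _, amp, tau in PARTIALS)

early_drop = amplitude_bound(0.0) - amplitude_bound(0.1)  # loss over the first 100 ms
late_drop = amplitude_bound(1.0) - amplitude_bound(1.1)   # loss over 100 ms, one second in
```

With these made-up numbers, `early_drop` comes out many times larger than `late_drop`: a fast initial decline followed by a much slower tail, which is the same general shape visible in the zoomed-in note.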
High-quality pianos are specifically designed to be capable of long sustain. The strings once set in motion will tend to keep vibrating a long time unless stopped. So the volume does not actually drop much after we pass the initial peak. But musicians also want to be able to bring notes to an end; so the piano mechanism also includes a system of felt dampers that press against the strings after the player lifts their finger from the key - a system which can be disabled, completely or selectively, by operating the pedals, to produce longer-sustain effects.
I'm not sure whether in this case it is really caused by the dampers or by the reverberation, but you can see something like the effect of the dampers toward the end of the note illustrated above: after sustaining without apparent loss for a while, the amplitude drops off, bringing the note to an end.
And it doesn't have to be a piano! Zooming in on the notes produced by almost any other musical instrument reveals a similar pattern, even if the parameters are different. The generic description of the stages of a single note fits reasonably well to a wide range of different instruments.
If we're going to build an electronic circuit that we want to sound like a musical instrument, then we need it to be able to do something like this too. In a synthesizer patch it's typical to handle spectrum and amplitude separately and then combine them. We typically use an oscillator and filter to create a spectrum at a basically fixed amplitude, on all the time, and then use a VCA (voltage controlled amplifier) to turn the signal on and off, and adjust its volume, giving the notes realistic shapes.
The circuit that gives the VCA its control voltage is the envelope generator, and a very common type is the ADSR envelope generator. Other control systems create a gate pulse for each note, turning on at the start of the note and off at the end. The ADSR generator shapes that into a control voltage with four stages denoted by the letters A, D, S, R:
- Attack (A) - amplitude goes from zero to maximum in a short time; the initial disturbance that sets the object into motion.
- Decay (D) - amplitude drops from maximum to some intermediate level, as the object loses energy from its more short-lived vibrational modes.
- Sustain (S) - amplitude remains fixed at that intermediate level for as long as the gate pulse remains active.
- Release (R) - amplitude drops down to zero again.
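The four stages can be sketched as a function of time. This version uses straight-line segments purely for simplicity, which is an assumption made for illustration; hardware envelope generators usually produce curved segments. The function and parameter names are hypothetical, and the output is normalized so that 1.0 is the peak.

```python
def adsr_level(t, gate_s, attack_s, decay_s, sustain, release_s):
    """Envelope level (0..1) at time t, for a gate that is high from 0 to gate_s seconds.

    attack_s, decay_s and release_s are stage times in seconds (release_s must be > 0);
    sustain is the intermediate level, as a fraction of the peak.
    """
    def level_while_gate_high(u):
        if u < attack_s:                      # Attack: rise from zero to the peak
            return u / attack_s
        if u < attack_s + decay_s:            # Decay: fall from the peak to the sustain level
            return 1.0 - (1.0 - sustain) * (u - attack_s) / decay_s
        return sustain                        # Sustain: hold while the gate stays high

    if t < 0.0:
        return 0.0
    if t < gate_s:
        return level_while_gate_high(t)
    # Release: fall toward zero from wherever the envelope was when the gate ended.
    start = level_while_gate_high(gate_s)
    fraction_done = (t - gate_s) / release_s
    return max(0.0, start * (1.0 - fraction_done))
```

For example, with `gate_s=0.5, attack_s=0.01, decay_s=0.1, sustain=0.6, release_s=0.2`, the level is at the peak 10 ms in, sits at 0.6 through the middle of the note, and reaches zero 200 ms after the gate ends.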
The ADSR model is something of an approximation. The amplitude envelopes of real acoustic instruments are not necessarily quite so simple. In my example from the recording, the amplitude goes up and down several times over the course of a single note as a result of different physical effects like reverberation. Exactly which physical effects cause instruments to have something roughly like an ADSR envelope will be different for different instruments, and one simple model may not adequately capture all of those effects.
On the other hand, acoustic instruments usually cannot sustain indefinitely, nor maintain a perfectly flat amplitude during their sustain. Some instruments can't really sustain at all: they just have an attack and a release and the timing of those is what it is. So the ADSR model both fails to capture some effects we hear in acoustic instruments, and is capable of some things that are impossible for them. It's not a perfect match. But if you build a synthesizer that produces ADSR envelopes with the parameters adjustable - the timing of the A, D, and R stages and the "intermediate level" of the S stage - then it can both simulate a lot of different acoustic instruments, and make some worthwhile new sounds of its own.
Because many of the differences between instruments can be characterized as different parameters for the ADSR envelopes, you can change a patch to sound like different instruments just by changing the ADSR parameters. That's why I highlighted it at the start as a module critical to the sound of the synthesizer: if you're playing a simple subtractive patch, you can make it sound a little different by swapping out the oscillator, or make a bigger difference by swapping out the filter... but if you leave those things the same and just turn the knobs on the envelope generator, all of a sudden you're playing a different instrument sound.
ADSRs are among the most popular envelope generators for synthesizers in general and modular synthesizers in particular. Having at least one or two is pretty much a necessity for any rack intended to produce anything like conventional instrumental music. Even in a monosynth patch (designed to produce only one note at a time) it's often useful to have a second ADSR envelope for the filter cutoff, making it easy to simulate the "higher frequencies die off faster" effect which is typical of acoustic instruments because it's typical of physical objects in general. The MSK 012 Transistor ADSR is my own ADSR product, and I'll talk about how it works in more detail below. But first, let's look at some more advanced issues that often come up when people use ADSR generators in Eurorack.
How much speed do you need?
As I mentioned, the differences among instrument sounds often come down to the details of the envelope, and (because it happens at the highest-energy point in the sound, when all the frequencies are in play) the attack phase is a critical part of the envelope. We hear fine distinctions in the speed and shape of the attack phase, possibly more so than in similar parameters of other stages.
Many synth users think that what they most want from an envelope attack is for it to be "fast." There are regular threads on synth fora in which people ask for the "fastest" envelope. I'm not sure why that is, because it seems clear that almost any envelope generator is capable of much more speed than you'd ever really need. This is not an interesting measurement on which to compare envelopes.
You don't really want an extremely fast attack.
The fact is, extremely fast envelope attacks usually sound bad. Conversely, if you adjust your envelope generator so that it sounds good, then you won't be adjusting it to its fastest possible setting. Depending on the sound you're looking for, you might still want it at a pretty fast setting, but it's unlikely that you'll really want to use the very fastest setting. Then it doesn't much matter how fast the very fastest setting actually is, because you won't be using that anyway.
What I really find baffling is how when users try to articulate (pun intended) what the problem is that they hope to solve with a faster envelope, the problem they describe is usually an unwanted clicking sound in the audio output. That's exactly the problem caused by having the envelope attack set too fast! I've written before about the mathematical basis for how fast attacks create wideband clicking noises. The simplest way to remove or reduce the click is by slowing down the attack. So where does the idea come from, that in order to solve this, the envelope should be even faster?
Here's an example clip of a software synth with an extremely fast attack. In the image, the audio is in the upper track and the envelope shape shown in the lower; despite those being labelled left and right, the actual playable clip has audio in both channels.
Now here's another clip, identical except that the attack is slowed down by a factor of ten. This is about the fastest I was able to make it without getting an annoying clicking effect.
Even slowed down, this attack is still so fast that it's hard to discern in the unzoomed image, so here are two more images showing the first attack in each clip, with the time axis greatly expanded to show the difference between the two.
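The link between attack speed and wideband energy can also be checked numerically. The sketch below is not the Csound orchestra used for the clips; it is a separate, simplified experiment with assumed parameters (frame length, tone frequency, linear attack ramps). It builds one tone with a 2-sample attack and one with a 20-sample attack, then uses a plain DFT to add up the energy at frequencies far above the tone.

```python
import cmath
import math

N = 512         # samples in the analysis frame
TONE_BIN = 16   # tone frequency, chosen so a whole number of cycles fits the frame
START = 128     # sample index where the note begins
FADE = 256      # slow fade-out shared by both versions, so only the attack differs

def envelope(n, attack_samples):
    """Linear attack from START, then full level, then a slow linear fade to the frame end."""
    if n < START:
        return 0.0
    attack_gain = min(1.0, (n - START) / attack_samples)
    fade_gain = min(1.0, (N - n) / FADE)
    return attack_gain * fade_gain

def high_band_energy(attack_samples, first_bin=128):
    """Energy in DFT bins from first_bin up to Nyquist, well above the tone itself."""
    x = [envelope(n, attack_samples) * math.cos(2.0 * math.pi * TONE_BIN * n / N)
         for n in range(N)]
    total = 0.0
    for k in range(first_bin, N // 2 + 1):
        xk = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        total += abs(xk) ** 2
    return total

fast = high_band_energy(attack_samples=2)
slow = high_band_energy(attack_samples=20)   # attack slowed down by a factor of ten
```

Under these assumptions `fast` comes out much larger than `slow`: the underlying tone is the same, but the faster attack spreads far more energy into frequencies where the tone itself has none, which is what gets heard as a click.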
Another interesting difference between these two clips is that in the "slower attack" one, there's some audible stereo separation. I set up the software synth (which is a simple Csound orchestra) to randomize the phases of the harmonics on each note, and to do so separately in the left and right channels. On the "slower attack" clip, that results in perceptible stereo movement between notes. The phase randomization is happening just the same in the "faster attack" clip, but to me at least, it doesn't produce an audible effect in that clip. I think the wideband interference from the ultra-fast attack is screwing up my ears' phase perception so that the stereo effect is lost in the first clip.
But even if synth users don't really want fast envelopes, they clearly want something. They may simply be confused about what to call the thing they want. I wish I knew what that thing is, but I think there's a clue in the fact that people asking for "fast" envelopes usually don't only use that word. They also say they want "snappy" envelopes, saying "snappy" as if it were the same thing as "fast." I haven't been able to get a good understanding of what "snappy" is supposed to mean either - not even finding examples of music containing snappy envelopes, that I might be able to pick apart the way I picked apart the piano notes above. But it's halfway reasonable to say, okay, there is some kind of phenomenon related to envelopes, that listeners perceive as the envelopes being "fast" or "snappy," and we'd like to isolate that phenomenon and build circuits that can do it, even though it's not really so simple as just the envelope going from zero to maximum in a short physical time.
I think it's reasonable to guess that maybe part of what people hear as the speed of the attack is really the shape of the attack, and specifically, the time spent near the top. If the attack curve is shaped so that it's concave downward, that is, the slope is steeper near the start and levels off toward the end, then it means it spends more time near the top. Maybe if the envelope spends more time near the top of the attack, listeners will hear it as more speed, even if the time it takes to get there is actually the same. Maybe that's what people mean who talk about an envelope being "snappy": it snaps into position at the top, and then stays there for the necessary fraction of a second that the ears perceive it as having snapped into that position.
There are different ways to achieve increased time at and near the top, including changing the mathematical function that generates the attack shape; having the envelope "hold" for a while after the attack, before starting the decay (creating what is sometimes called an AHDSR envelope); and changing the way the VCA responds to the envelope generator voltage, so that a range of voltages in the later part of the attack are all compressed into effectively maximum amplitude.
Here are a couple more clips from the software synth. For these two, I've used the same shape and speed of the attack curve - both are simulating a charged capacitor with parameters such that it reaches half the theoretical peak voltage in 10ms - but in the first clip, I'm allowing the capacitor to charge for 30ms before switching to the decay phase, and in the second, I'm allowing it to charge for 90ms. The result is a flat top on the envelope in the second clip. All other parameters are the same between the two. I'm not sure whether one of these really sounds faster, or "snappier"; but see what you think.
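The arithmetic behind those two charging times is short enough to work through. This sketch assumes the simulated capacitor charges from zero toward a fixed target through a resistance, so the voltage follows 1 - exp(-t/tau); the names are hypothetical, and the only number taken from the text is the 10ms time to reach half of the theoretical peak.

```python
import math

HALF_TIME_MS = 10.0                     # from the text: half the theoretical peak in 10 ms
TAU_MS = HALF_TIME_MS / math.log(2.0)   # equivalent RC time constant, about 14.4 ms

def charge_fraction(t_ms):
    """Fraction of the theoretical peak reached after charging for t_ms milliseconds."""
    return 1.0 - math.exp(-t_ms / TAU_MS)

after_30ms = charge_fraction(30.0)   # three half-times: 1 - 1/8, or 87.5%
after_90ms = charge_fraction(90.0)   # nine half-times: 1 - 1/512, about 99.8%
```

After 30ms the curve is still visibly climbing when the decay begins, while after 90ms it has spent most of its time within a fraction of a percent of the target, which is where the flat top comes from.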
Analog and digital envelope functions
As I said in the press release when I announced the MSK 012, part of the fun of designing ADSR envelope generators is that - unlike such modules as mixers and oscillators, which have well-understood "standard" circuits everybody uses - there's a lot of scope for creativity in ADSRs. One important distinction is whether to use a microcontroller, or try to do it with analog circuitry.
Envelope generators take gates or triggers as input, and those are digital signals. Envelopes also, typically, have discrete states that they go through (the A, D, S, and R of the name). They need to make logic decisions about transitioning between those states. So in an important sense, envelope generators are always at least partially digital. Nobody should be trying to sell you an "analog" envelope on the claim that it's not "digital." But we can still draw a line between using a microcontroller to calculate the desired output voltages over time, and using a more traditionally analog approach, which will normally come down to charging and discharging a capacitor.
In the MSK 012, I used some of the simplest digital logic circuits possible - RTL inverters and diode logic - to turn on and off the current sources and sinks that charge and discharge a capacitor. Then the capacitor voltage is buffered to generate the output. The result is that all the curves look like a capacitor charging or discharging: concave downward when it's going up, and concave upward when it's going down. This kind of curve is sometimes called "exponential" but that's a tricky word because in the case of curves going up, the "exponential" charging of a capacitor is more or less opposite to the kind of curve described by the phrase "exponential growth" in other contexts. I try to think of it as "like a capacitor" instead of getting into an argument over what, exactly, is the difference between exponential and logarithmic increase.
You may note that in that screenshot, although the attack curve is concave downward, it's not very much so; it doesn't flatten out before hitting the peak and going into the decay phase. This is something of a compromise between a linear attack and a more heavily concave one.
Charging and discharging a capacitor, which might be called the analog approach, naturally tends toward producing the kinds of shapes shown in the MSK 012 screenshot: concave downward when going up, concave upward when going down. That's just what a capacitor charged through a resistance does, and although analog circuits can be designed to do other things, since the charging-capacitor curves tend to sound good they are usually what analog-focused designers end up implementing. With a digital microcontroller, on the other hand, it's possible to implement almost any arbitrarily chosen shape to the curves.
But what's the voltage range?
Something else you can see in that screenshot, if you read the small print, is a point that some Eurorack users find very confusing: the peak voltage is about +8V, according to the Doepfer standard. But most modules that take a control voltage, including for instance my own MSK 015 Quad VCA, which is certainly designed to work with my envelope generator, use a control voltage range of 0..5V. Isn't this a big problem? Why is Eurorack so inconsistent on this point? Why do we need a whole lot of voltage-conversion modules to convert between +5V, +8V, and +10V voltage ranges, and given that such conversion modules are obviously necessary, why are so few of them on the market?
The fact is:
Voltage range is not really a thing.
I get a lot of pushback every time I say that, and of course I'm stating it in excessively broad terms for effect, but it's not going to stop being true. Often the pushback comes in the form of someone acknowledging an example I gave of why voltage range isn't a thing, but then claiming that it only applies to that one example and voltage range is still a thing everywhere else - for instance, after reading my article all about signal levels, saying, yes, okay, that's true for audio signals, but voltage range is obviously a big thing for control voltages, right? No.
The "signal levels" article gives some insight on what's really going on here, which is also clear from looking at the waveforms earlier in this article: audio signals are spiky. A recording of "program material" (i.e., music) is going to have a certain amount of signal almost all the time, at a level which varies somewhat but maybe not a lot, and a typical voltage range for that, but then it's also going to have brief spikes where it goes significantly out of the typical range. The attacks on the note envelopes, for instance. During those spikes the voltage goes significantly higher, but only briefly.
That's why the pros don't usually measure analog signal levels in terms of the absolute maximum voltages achieved at the peaks, but rather in terms of the typical deviation from zero - the RMS voltage, which is a more precise way of describing that "typical deviation." It is predictable that the program material will sometimes spike out of the typical range, and it's fairly consistent how often that happens and by how much, and measurable things like RMS voltage and dynamic range allow describing those effects in detail where necessary. Minimum and maximum voltage are not the best ways to do it.
Once you start to get voltages out of the typical range, there is a steadily increasing likelihood of clipping and distortion. If necessary, and only if necessary, you can start creating additional specifications for how much headroom you need, or how often a signal goes out of the typical range. Experienced engineers watching a VU meter will have a sense of how much and for how long it's okay for the signal to go into the red zone. The optimal amount of that is not going to be zero, but neither will it be unlimited. There is certainly such a thing as a signal being too hot or not hot enough, and we can measure that in terms of volts, but the point is there will not be a hard-edged "voltage range" that is useful in setting up an analog signal path. Instead, you do it by shooting for the right RMS level.
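To make the difference between peak and RMS concrete, here is a small Python sketch. The two test signals are invented for illustration: a steady sine wave, and the same sine wave with a decaying, note-like envelope applied. The ratio of peak to RMS is commonly called the crest factor.

```python
import math

def rms(samples):
    """Root-mean-square level: the 'typical deviation from zero'."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def peak(samples):
    """Largest absolute value reached anywhere in the signal."""
    return max(abs(s) for s in samples)

def crest_factor(samples):
    """How far the peaks stick out above the typical level."""
    return peak(samples) / rms(samples)

# Hypothetical test signals: ten cycles of a steady sine, and the same sine
# with a decaying envelope so that it starts loud and fades, like a single note.
steady = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
spiky = [math.exp(-n / 100.0) * s for n, s in enumerate(steady)]
```

The steady sine has a crest factor of about 1.414 (the square root of two), while the decaying version has a much larger one: its peak says little about its typical level, which is why a single maximum-voltage number describes it poorly.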
This is all talking about analog signals, signals that really are voltages on wires. In the digital realm, where signals are sequences of numbers in a specified range, the boundaries of that number range may tend to be sharper, with clipping happening suddenly at a fixed and predictable point. Avoiding clipping is sometimes a big thing in the all-digital DAW environment, and maybe people coming from there are especially inclined to look for it to be a big thing in analog. But even with digital modules, Eurorack uses analog interconnections between them.
And if we're building an envelope generator that is designed to help create typically spiky program material, and a VCA that applies the instructions from the envelope generator within a patch, the communication between the envelope and VCA is going to reflect that same kind of spiky profile. It's reasonable to say we expect the control voltage for the "normal" signal - that is to say, the sustain phase - to usually be 0..5V. And then the peak of the envelope? It's going to spike higher. Like maybe 8V.
And what if your VCA is only really designed to be linear over 0..5V, and it starts to clip on a control voltage input of 8V? What if you can't really get much more gain with a higher control voltage than you'd get with 5V? Then most likely, the clipping will result in flat-topping of the ADSR envelope, as discussed in the section above on "snappiness" and AHDSR. Which may well be a desirable effect. Similarly, if clipping happens on the audio input instead of the envelope input, but just at the peak of the attack, then what you get is a burst of extra harmonics lasting only for the transient period at the start of the note, dying out as the sound moves into the sustain phase. Which is typical of real instrument sounds.
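The flat-topping effect is easy to see with a toy model. The sketch below assumes a VCA whose gain follows the control voltage linearly up to 5V and then clips hard; real VCAs usually saturate more gradually, and the envelope samples are invented for illustration.

```python
def vca_gain(cv_volts, full_scale_volts=5.0):
    """Gain (0..1) of an idealized VCA that clips hard above full_scale_volts."""
    clipped = min(max(cv_volts, 0.0), full_scale_volts)
    return clipped / full_scale_volts

# Hypothetical envelope samples in volts: attack up to an 8 V peak,
# then decay to a 4 V sustain level.
envelope_volts = [0.0, 2.0, 4.0, 6.0, 8.0, 7.0, 6.0, 5.0, 4.0, 4.0, 4.0]
gains = [vca_gain(v) for v in envelope_volts]
# gains: 0.0, 0.4, 0.8, 1.0, 1.0, 1.0, 1.0, 1.0, 0.8, 0.8, 0.8
```

The single-sample 8V peak in the envelope becomes five samples at maximum gain: the resulting amplitude envelope has a flat top even though the control voltage does not.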
I think that a big part of the reason "8V envelopes, 5V VCAs" has persisted so long in Eurorack is that at least with the analog circuitry for which the format was originally designed, it's likely to create desirable sonic effects. And it's not really as stark a disagreement as people may assume, because of the difference between what the numbers represent: the 8V "voltage range" of the envelope is talking about the tip of the peak, whereas the 5V "voltage range" of the VCA is talking about the typical or normal level. It is perfectly okay, and it is how analog usually works, for the tip of the peak and the usual level to be at different voltages. There is no conflict to resolve here. It is not a big problem the community needs to fix by settling on a "standard"; it is a non-problem.
How the MSK 012 works
Let's look at some details of the circuit in the MSK 012. As I said, an envelope generator is always at least partially digital because it manipulates digital signals; but my design is about as much an "analog" approach as it can be. The basic principle is to shape the incoming gate pulse into an ADSR pattern by charging and discharging a capacitor to the appropriate voltages. As the name - "Transistor ADSR" - implies, the MSK 012's digital circuitry is based on discrete transistors.
We nowadays tend to think of digital signals as pure voltage levels. Maybe 0V is logic zero and +3.3V is logic one; that's a fairly common convention. In earlier times, the high voltage for logic one often tended to be higher; 5V was common, and early CMOS chips often used +15V. There were different logic "families," such as 7400 TTL and later 74LS00 TTL, each with their own speed and power-consumption characteristics and their own definitions of the voltages and impedances for logic signals.
In the pre-IC days when logic had to be built from discrete components, RTL was a common family. That is "Resistor-Transistor Logic," and I discussed it in some detail in an earlier article on logic before ICs.
A logic "one" in RTL is a connection to the positive power supply through a relatively high impedance, like maybe a few tens of kiloohms. A logic "zero" is a connection to 0V through a fairly low impedance. Specifying the impedance like this is important because RTL gates are designed to basically short outputs to ground; if the outputs had low impedance in the high state (like modern CMOS outputs), doing that would be a problem.
The basic logic gate in RTL is an inverter:
The base of the transistor is the input and the collector is the output. Connect the base to a positive voltage through a decent-sized resistor, and the transistor will turn on, connecting its collector to ground with low impedance. Some current also flows harmlessly through R1 and the transistor. The overall effect is that a "one" on the input creates a "zero" on the output. On the other hand, if the input (base) is brought near 0V with low impedance, then the transistor will pass basically no current, exposing the power supply through the resistor to the output connection. That's the other side of the inverter function: a logic zero on the input gives a logic one on the output. As discussed in the "logic before" article, it's easy to vary this circuit by adding other transistors and resistors to create other kinds of logic gates.
If you put two inverters nose-to-tail, you get a latch or flip-flop circuit (archaically: a "bistable multivibrator"). The first inverter's output can be high, and that's the input for the second inverter so its output is low, which is the input for the first reinforcing its output as high, and the circuit will remain stably in that state. Or, it can also be stable in the opposite state with the first inverter low and the second one high. Make some other modifications to allow external signals to force it to switch between these two states, and you have a simple digital memory.
I put such a latch at the input of the MSK 012 Transistor ADSR to condition the input gate signal into a well-behaved logic level for the rest of the module. Eurorack users are likely to feed in all kinds of weird things, such as signals that change only slowly instead of sharply rising at the start of a pulse, or signals that go negative, or to weird voltages. I wanted the envelope to trigger predictably and reliably, and this circuit is meant to ensure that.
The core of the input circuit is just a latch made of two RTL inverters. The other components are elaborations on the basic latch circuit.
First, R1, the 1MΩ resistor, keeps the feedback from Q2 back to Q1 relatively weak. If there is nothing else going on, the circuit will keep its state; but a strong enough signal on the input can override the feedback from Q2 to Q1, forcing Q1 to follow the input state. R8 sets the relative strength of the input signal: because R8 is only a tenth the value of R1, it takes much less voltage swing on the module input to change the state of the latch, compared to the voltage swing on the output labelled "B."
This overall circuit topology, of a two-inverter latch with some but not conclusive feedback making it tend to hold its state, is called a "Schmitt trigger" (Otto H. Schmitt, 1913-1998). It's something like a comparator but with two thresholds: if the input is low and rises past the higher threshold, then the output goes high, but after the output goes high, the input has to retrace its steps and fall back below the lower threshold before the output will go low again. That way, small and slow fluctuations in the input will not cause a lot of switching back and forth. Many thermostats display the same effect, which is generally known as "hysteresis." The resistor R12, in combination with R1 and R8, helps set exactly where the two input thresholds will be: approximately at 2V to turn on and 1V to turn off. Well-behaved Eurorack gate signals go from 0V to 5V and spend almost no time in between, but these thresholds help define more precisely what will happen if the input is not a well-behaved gate signal.
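The two-threshold behaviour is simple to model in software. This is a behavioural sketch only, not a simulation of the transistor circuit; the function name is hypothetical, and the default thresholds are the approximate 2V and 1V figures given above.

```python
def schmitt(inputs_volts, on_volts=2.0, off_volts=1.0):
    """Return the output state (True = high) after each input sample.

    The output goes high only when the input rises above on_volts, and goes
    low again only when the input falls below off_volts; in between, it holds.
    """
    state = False
    outputs = []
    for v in inputs_volts:
        if not state and v > on_volts:
            state = True
        elif state and v < off_volts:
            state = False
        outputs.append(state)
    return outputs

# A slow, wobbly input: 1.5 V is "low" on the way up but still "high" on the way down.
print(schmitt([0.0, 1.5, 2.1, 1.5, 1.2, 0.9, 1.5]))
# prints [False, False, True, True, True, False, False]
```

The same 1.5V input gives a different output depending on the history, which is the hysteresis that keeps a slow or noisy gate from retriggering the envelope.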
The diode D6 is for protection in case someone plugs in a significant negative voltage; it prevents any significant voltage appearing across the base-emitter junction of Q1. Such voltages can be damaging to transistors, especially high-gain transistors like these 2N5088s, which tend to have low breakdown voltages. I saw a recent forum discussion in which someone pointed out that a well-known designer of open hardware routinely left such protections out of her gate inputs, apparently on the theory that an input resistor like R8 will limit the current enough to prevent catastrophic damage anyway, and any minor damage that would still happen would only affect the transistor in ways that won't show in this kind of switching application. It would only be a problem if, for instance, someone tried to desolder that transistor and reuse it in a different kind of circuit. That's fine, but under the circumstances I wanted to err on the safe side.
At the output side, another diode D3 along with the resistor and capacitor R9 and C1 form a pulse generator. When the main output B goes high, it does so very quickly because it's being reinforced by the high gain around the two-transistor loop. C1 and R9 can be seen as a passive high-pass filter that allows through just the sharp spike, not the DC level of the output; and then D3 turns that into a single-sided spike, blocking the negative spike when B goes low again. The time constant of the filter is about 18 microseconds and the width of the pulse will be about that order of magnitude. This pulse is used to set the attack flip-flop, discussed later.
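To get a feel for the shape of that spike, here is an idealized sketch in Python. It assumes B makes a perfectly sharp step and D3 is a perfect diode with no forward drop, neither of which is true of the real circuit; the function name spike_voltage and the normalized 1V step are just for illustration. Only the 18 microsecond time constant comes from the text above.

```python
import math

TAU = 18e-6  # seconds; the approximate filter time constant quoted in the text


def spike_voltage(t, step_volts, tau=TAU):
    """Idealized voltage after the diode, t seconds after B steps by step_volts.

    Assumes an ideal step into a first-order RC high-pass filter,
    followed by an ideal diode that passes only positive voltage.
    """
    raw = step_volts * math.exp(-t / tau)  # high-pass response to a step
    return max(raw, 0.0)                   # negative spikes are blocked


# Rising edge (normalized to a 1V step): the spike decays within a few tau.
for microseconds in (0, 18, 36, 90):
    print(microseconds, spike_voltage(microseconds * 1e-6, 1.0))

# Falling edge: the negative spike never makes it past the diode.
print(spike_voltage(0.0, -1.0))
```

After one time constant the spike has dropped to about 37 percent of its starting height, and after five it is under 1 percent, which is why the pulse width ends up on the order of the time constant.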
A very similar Schmitt trigger circuit monitors the module's output, for the purpose of recognizing the end of the attack.
Here the RTL inverters have been flipped upside-down, and use PNP transistors instead of the usual NPNs. So the logic levels here are defined to be "positive power supply with low impedance" or "zero volts, with relatively high impedance." Doing that made it easier to get the desired voltage thresholds on the input. You may note that the only output taken from this circuit is through a pulse generator identical to the last one; the output Schmitt trigger's only important function is to recognize the single event of the output hitting its peak at slightly above +8V.
There's a third latch in the MSK 012, again made of two RTL inverters nose to tail, to keep track of the basically one bit of information that the envelope generator needs to remember beyond the current state of its input: whether the attack phase is currently in progress.
There are basically three states for the envelope, and these circuits above handle remembering them and transitioning between them. First, when the envelope is "at rest," not triggered yet, the B (cleaned-up input) and Q (attack flip-flop) signals are both 0. When the input goes high, B goes high, which generates a pulse that sets Q high as well, so they are both 1. That's the second state.
As long as Q is high, the generator is in the attack phase and the voltage will rise toward the positive power supply. When it hits the peak a little above +8V, the output Schmitt trigger generates a pulse (labelled "A+" on the schematic), which causes Q to drop. So the generator goes into the third state: B still at 1, but Q at 0. This covers the decay and sustain phases. There is no precise end to the decay: in the decay/sustain phase, the output voltage just drops toward the sustain voltage and then stays there. Similarly, when the input goes low and B goes low (returning to the starting state), the output voltage drops toward zero and the release has no very definite end, it just trails off.
- Release/untriggered: B=0, Q=0, output at or falling toward 0V.
- Attack: B=1, Q=1, output rising toward peak around 8V.
- Decay/Sustain: B=1, Q=0, output falling toward sustain voltage.
You may note that there's an undefined state possible here: what if B drops back to zero while Q is active (that is, during the attack)? In fact that is handled by the next sub-circuit, in which the effect of B low overrides any effect of Q. If B drops during the attack, then Q may remain high (creating the extra, undefined state B=0, Q=1), but the effect on the output voltage will be the same as B=0, Q=0; and when B goes high again, that would have the effect of setting Q to 1 anyway, and then we're back on track.
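The state logic described above is small enough to write out as a toy model in Python. This is only a sketch of the behaviour of the B and Q signals, not of the transistors that implement them, and the class and method names (EnvelopeLogic, gate_rise, and so on) are invented for the example.

```python
class EnvelopeLogic:
    """Toy model of the B (cleaned-up gate) and Q (attack flip-flop) bits."""

    def __init__(self):
        self.b = 0
        self.q = 0

    def gate_rise(self):
        self.b = 1
        self.q = 1  # the input pulse generator sets the attack flip-flop

    def peak_reached(self):
        self.q = 0  # the "A+" pulse from the output Schmitt trigger clears Q

    def gate_fall(self):
        self.b = 0  # Q is left alone, so B=0, Q=1 can occur

    def phase(self):
        if not self.b:
            return "release/untriggered"  # B low overrides Q
        return "attack" if self.q else "decay/sustain"


logic = EnvelopeLogic()
logic.gate_rise()
print(logic.b, logic.q, logic.phase())   # gate on, attack under way
logic.gate_fall()
print(logic.b, logic.q, logic.phase())   # gate released mid-attack
logic.gate_rise()
logic.peak_reached()
print(logic.b, logic.q, logic.phase())   # attack finished, gate still held
```

The middle step is the undefined state: Q is still 1, but because phase() checks B first, the result is the same as the ordinary release state.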
With the logic circuits putting well-behaved digital levels on B and Q, it's necessary to charge and discharge the capacitor through the appropriate resistances (which will be variable, via the front-panel knobs) to make the output voltage track the proper envelope shape, according to the state described above. That's done by a section I call the "ADSR driver," which looks complicated but is basically just a bunch of diode switches that connect and disconnect resistors from the line labelled "C," which goes to the envelope capacitor.
Let's go through the states of B and Q. With B and Q both low (release/untriggered), the cathode of D1 is basically shorted to ground. Current through R2, R3, and R4 passes through that diode. The anode of D1 is not much above ground, so D2, D4, and D5 end up reverse-biased, disconnecting this part of the circuit from the higher positive voltages expected to be seen elsewhere. The emitter of Q3 is expected to be driven relatively high either by the sustain voltage coming through D7, or the "Q" signal through D11 and D12 if we should happen to be in the undefined state, and either way, D8 will be reverse-biased, taking Q3 and everything to the left of it out of the picture too. All that is left is that the capacitor can discharge through D13 and R15 - it will complete the release if there is one in progress, and then remain in the untriggered state.
A gate pulse comes in. B and Q both go high. With B high, D13 becomes reverse-biased, so the capacitor can no longer drain through R15. With Q high, D11 and D12 definitely hold the base and therefore emitter of Q3 high, keeping it out of the picture. But now that B is high, D1 is reverse-biased also, so the current through R2, R3, and R4 is no longer diverted into ground. The capacitor can charge through these components and D2; and it will probably charge quite fast, heading toward the positive rail.
When the capacitor voltage hits about +8V (technically, a voltage that will lead to an output of +8V, because the capacitor is somewhat offset from the actual output voltage) the output Schmitt trigger activates, sending a pulse that makes Q go low. Now we are in the decay/sustain phase.
With Q low, D11 and D12 become reverse-biased and the voltage on the base of Q3 will be determined by the output of the sustain level potentiometer, R10. Q3 is set up as an "emitter follower"; it draws as much or as little current as necessary, to bring its emitter to the sustain-level voltage (apart from an offset compensated by D7). The capacitor, which remember is at quite a high voltage at the start of the decay phase, is allowed to discharge through R11 and D8. Meanwhile, the current through the attack circuit passes through D4 and D5 into Q3 as well. The capacitor voltage falls. But once it hits the sustain level, Q3 can no longer lower it further, and there won't be enough voltage across D4 and D5 to prevent the attack current from coming into play. The voltage falls only as far as the sustain level, and then no further.
When the gate pulse goes low, the logic circuits return B and Q both to zero, and the ADSR driver goes back to "release/untriggered," draining the capacitor through R15 at the release rate.
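Putting the logic and the driver together, the overall envelope shape can be approximated with a simple numerical model. The Python sketch below treats each phase as a plain RC charge or discharge toward a target voltage and ignores all the diode drops and the offset between the capacitor and the output. The time constants, the 4V sustain level, and the +12V rail are assumed values chosen for illustration, not measurements or component values from the MSK 012; only the roughly 8V peak comes from the description above.

```python
import math


def simulate_adsr(gate, dt=1e-4, attack_tau=0.005, decay_tau=0.05,
                  release_tau=0.1, sustain=4.0, rail=12.0, peak=8.0):
    """Return an idealized envelope voltage for each gate sample (1 or 0).

    All time constants (seconds) and voltages are illustrative assumptions.
    """
    v, q, prev_gate = 0.0, 0, 0
    out = []
    for g in gate:
        if g and not prev_gate:
            q = 1                                # rising gate sets Q
        if not g:
            target, tau = 0.0, release_tau       # B low overrides Q
        elif q:
            target, tau = rail, attack_tau       # attack: charge toward rail
        else:
            target, tau = sustain, decay_tau     # decay: settle at sustain
        # Exact update for an RC circuit over one time step of length dt.
        v += (target - v) * (1.0 - math.exp(-dt / tau))
        if g and q and v >= peak:
            q = 0                                # peak detected, attack ends
        prev_gate = g
        out.append(v)
    return out


gate = [1] * 3000 + [0] * 3000   # 0.3 s of gate, then 0.3 s of release
env = simulate_adsr(gate)
print(max(env), env[2999], env[-1])
```

Because the attack aims at the rail but is cut off at the peak, the rising segment is the steep early part of an exponential, whereas the decay and release segments run all the way out along their curves and so have no definite end.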
The use of double diodes (series pairs) at some points in this circuit may seem mysterious. Without going through each case in detail, what is basically going on here is that a double diode has twice the forward voltage drop of a single diode. Often the voltages at one point in the circuit are generally a diode voltage drop higher or lower than at some other point because there's a single diode or a transistor emitter-base junction in between, and using a second diode in another place becomes necessary to compensate for the offset and keep everything lined up.
The voltage on the envelope capacitor is quite sensitive to any current flowing in or out, so we can't afford to just expose that capacitor to the outside world through an output jack. Even a standard Eurorack 100kΩ input impedance would disturb the shape a lot, and of course, users like to plug in nonstandard things. So there's a separate "output buffer" stage which provides a reasonably harmless high impedance to the capacitor while echoing its voltage (subject to a diode-drop offset) to the output jack. This section of the schematic also shows the envelope capacitor itself, which is really three with a switch to bring different combinations of them into the circuit and set the overall range of the timing; and the LED for visual indication of the envelope status.
You may amuse yourself by figuring out what protects the LED from reverse voltage if somebody plugs -12V into the output. The diode D15 protects against the opposite case, where someone plugs +12V into the output and effectively tries to charge the envelope capacitor externally. In that case it's important that it shouldn't charge through the reverse breakdown of Q4, especially given that (unlike on the module input) there is only 1kΩ to limit the current.
In summary, that's how my basically-analog ADSR generator works. A microcontroller-based one would be a lot different in both its features and its design principles.