New audio subsystem

Started by iWasAdam, July 16, 2018, 12:16:32

Previous topic - Next topic

iWasAdam

ok. there are no pictures or anything to show.  8)
I'm working on a new audio subsystem and thought I'd share a few thoughts and open it up for anyone to comment with thoughts, suggestions etc.

If we take a general look at audio, it usually comes down to the following:
1. load a sound
2. play the sound
3. if we are lucky, the sound may vary in volume, pitch and pan
4. caveat: pan can (usually) only apply to mono sounds
5. sounds can be looped - but only the whole sound - not loop points, etc

So here are my thoughts:
1. have a system where the audio is exactly the same across platforms (Monkey2 has serious issues with MacOS playback)
2. sounds (mono and stereo) can be properly panned
3. loops can be programmed
4. different playback options - reverse for example
5. FX busses such as filter, echo, delay, reverb
6. different sound generation systems: granular, wavetable, vector
7. envelope and other parameters (LFOs)
8. possible sound design from blocks - drag/drop to create new sound systems with user-controlled routings?

That's my list. any thoughts?

col

#1
Hiya,

Just a thought...

When I've played with sound effects and music samples I've always thought that having some kind of callback mechanism for when a sample, or more specifically a section or timing of a beat, is played could be useful.

For example, instead of code triggering a sample to play, the reverse would happen: a long sample is played with, say, a bass beat, and when the beat is hit a callback can be triggered, allowing 'that beat moment' to be known in code.
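Something like this is what I have in mind - just a rough Monkey2-flavoured sketch, where BeatTrackedSample, beatPositions and OnBeat are all made-up names rather than anything that exists:

' Hypothetical beat-callback idea: the mixer tells us whenever playback
' crosses one of the known beat offsets inside the long sample.
Class BeatTrackedSample
    Field beatPositions:Int[]   ' sample offsets of each beat in the long sample
    Field _nextBeat:Int

    ' Called by the mixer after playback has advanced to playPos (in samples).
    Method Update( playPos:Int )
        While _nextBeat<beatPositions.Length And beatPositions[_nextBeat]<=playPos
            OnBeat( _nextBeat )
            _nextBeat+=1
        Wend
    End

    ' Override this (or swap in a function pointer) to react to 'that beat moment'.
    Method OnBeat( beatIndex:Int ) Virtual
    End
End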
https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

Derron

Callbacks could be added everywhere - but maybe this could also be done in the "TSound"-object-extensions then.

Talking about stuff like auto-volume-adjustment for fake positional audio (think of a "GetVolume()" callback feeding a volume depending on the sound object parent's position on the screen relative to the player's screen position).
Similar stuff as for volume could be written about "panning" - so panning position/settings could become dependent on certain callback results/adjustments.
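As a rough sketch of that idea (Monkey2-flavoured, all names invented, and assuming std.math's Abs/Max/Clamp):

' Hypothetical positional-audio helper: volume and pan are derived from where
' the emitter sits relative to the listener instead of being set by hand.
Class PositionalVoice
    Field emitterX:Double          ' x position of the sound object's parent
    Field listenerX:Double         ' player / camera x position
    Field maxDistance:Double=800

    ' Volume falls off linearly with distance (clamped to 0..1).
    Method GetVolume:Double()
        Local d:=Abs( emitterX-listenerX )
        Return Max( 0.0,1.0-d/maxDistance )
    End

    ' Pan goes from -1 (hard left) to +1 (hard right).
    Method GetPan:Double()
        Return Clamp( (emitterX-listenerX)/maxDistance,-1.0,1.0 )
    End
End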


Regarding sound routes: as soon as you have a "player" and some kind of "API" for the sound objects (requesting data buffers, informing about play, pause, stop ...) you only need to provide basic capabilities - which avoids workload on your shoulders.


bye
Ron

iWasAdam

Mmm, currently I'm working on just getting the base audio working.
I've got a basic OpenAL version, but there are some timing issues, so I'm going to give an SDL_Audio one a try as well.

The essence of both is the same:
a stereo ring buffer being filled with sound data.
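The basic shape of that is roughly the following - a minimal sketch only, where the class name, the Double sample format and the frame-based sizing are placeholder choices rather than the actual implementation:

' Minimal interleaved stereo ring buffer: the mixer writes L/R pairs in,
' the audio callback reads them back out in the same order.
Class StereoRingBuffer
    Field data:Double[]
    Field readPos:Int
    Field writePos:Int

    Method New( frames:Int )
        data=New Double[frames*2]   ' 2 samples per frame (left + right)
    End

    Method WriteFrame( left:Double,right:Double )
        data[writePos]=left
        data[writePos+1]=right
        writePos=(writePos+2) Mod data.Length
    End

    Method ReadFrame:Double[]()
        Local frame:=New Double[2]
        frame[0]=data[readPos]
        frame[1]=data[readPos+1]
        readPos=(readPos+2) Mod data.Length
        Return frame
    End
End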

I like the concept of the audio calling 'you' back and also the auto-position stuff...

iWasAdam

I was thinking of something along these lines:


[routing diagram: red lines are the left channel, light green are the right]

In this example the generator would be a mono sound.
The left channel then goes into a ToStereo, where the right channel is just a copy of the left channel.
And finally to the output.
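In code that ToStereo step is basically just this (a sketch, assuming separate left/right buffers rather than whatever layout the real thing uses):

' ToStereo for a mono source: the right channel is simply a copy of the left.
Function ToStereo( left:Double[],right:Double[] )
    For i:=0 Until left.Length
        right[i]=left[i]
    Next
End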

iWasAdam

I think I'm sort of getting my head around this one.

In essence you will select a source, and then add modifiers (they will appear left to right). These will be processed in order until they come to the output.

The order of the modifiers will 'sculpt' the output sound differently, so having 'grunge' before 'delay' would give different effects.

Each source/modifier may also have outside inputs. You can feed these inputs from outside stuff like envelope, LFO and fader controls.
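A rough sketch of how such a chain could be walked (names invented, and Stack is just std's container - the real classes will differ):

' Each modifier transforms the stereo buffer in place; the voice simply walks
' the chain left to right, which is why the order 'sculpts' the result.
Class Modifier
    Method Process( left:Double[],right:Double[] ) Virtual
    End
End

Class Voice
    Field chain:=New Stack<Modifier>

    Method Render( left:Double[],right:Double[] )
        For m:=Eachin chain
            m.Process( left,right )   ' e.g. Grunge before Delay sounds different to Delay before Grunge
        Next
    End
End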


This sort of means that each voice would be a programmable playback device (a sort of Uber synth).


I have already extended MX2 to support a much higher range of output frequencies that map to 10 octaves: from really low (almost subsonic) to FAAAST.

First modifiers are:
BitCrush - crushes the sound so it can be more 8-bit sounding
Grunge - sort of messes up the sound, a bit like a fuzzbox
Distort - adds a nice distortion - great when used with Grunge
StereoDelay - takes a stereo source and delays one channel, or takes a mono source and creates a new stereo channel with delay.
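For what it's worth, a BitCrush-style modifier can be tiny - something like this illustrative sketch (using the Modifier base from the earlier sketch, not the actual module):

' Quantises each sample to a reduced number of levels, which is what gives
' the '8-bit' sound. bits=16 leaves the signal almost alone, bits=4 is very crunchy.
Class BitCrush Extends Modifier
    Field bits:Int=8

    Method Process( left:Double[],right:Double[] ) Override
        Local levels:=Pow( 2,bits )
        For i:=0 Until left.Length
            left[i]=Round( left[i]*levels )/levels
            right[i]=Round( right[i]*levels )/levels
        Next
    End
End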


I'm not sure how many to allow in a 'chain' - probably 3 or 4 would be a good amount?



Derron

Why should you need to "allow" a certain number of elements in a chain?

Type TSoundProcessor
	Field input:TSoundProcessor    'previous element in the chain (Null for a pure generator)
	Field output:TSoundProcessor   'next element in the chain (Null until connected to the output)
End Type

...

Chain as much as you want. The final output element has an input socket of type TSoundProcessor too - and all "SoundElements" (data, waves, ...) have an "output" linking to a "TSoundProcessor". Then you create a stub TSoundProcessor which just acts as a connector between "SoundElement" and "SoundOutput".


bye
Ron

iWasAdam

yep that's one way to do it :)


iWasAdam

#8
Slowly moving stuff around and getting things operational.

A chain can now be of any length, but I'm sticking to a max of 6 at the moment.

One major change with making things modular is the sound generation itself. This was going to be sort of:
Generator, then the FX chain.

I suddenly thought that the sound sources are just FX as well, so now they could appear anywhere (or not at all) in the chain. It also means that multiple generators can be used. I'll need to test this... hmmmm

Yep, it works!!! Just got a stereo sample playing forward AND backward at the same time (this is not 2 separate voices, just a single voice with 2 generators in the chain)!
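A generator in that scheme is then just another chain element that adds into the buffer instead of filtering it, which is why a forward and a reverse reader can sit in the same chain. A sketch of the reverse one (names invented, and it assumes a non-empty mono sample):

' 'Generator' that mixes a sample into the buffer backwards. Because it is
' just another chain element, it can sit next to a forward-playing one and
' both get summed into the same voice.
Class ReversePlayer Extends Modifier
    Field sample:Double[]   ' mono source data
    Field pos:Int

    Method Process( left:Double[],right:Double[] ) Override
        For i:=0 Until left.Length
            Local s:=sample[sample.Length-1-pos]   ' read from the end towards the start
            left[i]+=s
            right[i]+=s
            pos=(pos+1) Mod sample.Length          ' wrap back when we reach the start
        Next
    End
End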

Derron

Yes, I originally wrote that the "TSoundProcessor/TSoundEffect" could be used as an input too. So you have something which somehow generates/manipulates/plays "wave data" - and this gets plugged into an "output". The output is then something which adds compatibility with existing playback providers (e.g. the sound engine in Monkey2).

The basic idea is then:
- have something which can playback a buffer (-> output)
- have something which somehow fills a buffer (-> the sound processors / effects)

So it is up to the "processors" to emit events/call callbacks if they start all over, generated something, got hooked to something ... they might even add callbacks to the predecessor sound effects (to get informed if a "parent" has done something of interest).

The output then emits events/calls callbacks too (buffer got refilled, processor/effect as input got attached, ...).

As inheritance can lead to cyclic dependencies, you either need to have an "interface" (API...) or use some kind of "TSoundOutputBase" which defines all the callbacks already. This way each TSoundProcessor/Effect could import the TSoundOutputBase class without needing to know about the actual implementation (as TSoundOutput needs to know about TSoundProcessor, this would otherwise lead to a cyclic dependency).
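In other words, something like this (a sketch - the class names are just illustrative):

' Lives in a 'base' file both sides can import, so processors never need to
' know about the concrete output implementation.
Class SoundOutputBase
    Method OnBufferRefilled() Virtual
    End
    Method OnProcessorAttached() Virtual
    End
End

' Processors only ever talk to the base type...
Class SoundProcessor
    Field owner:SoundOutputBase
End

' ...while the real output extends the base and is allowed to know the processors.
Class SoundOutput Extends SoundOutputBase
    Field input:SoundProcessor
End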

At the end of the day the SoundOutput (or the classes extending it) would be able to know/react to SoundEffect stuff - while SoundEffect stuff can react to SoundOutput things. It is not really needed, but maybe one can come up with some useful examples. Another way is the classical hierarchic information flow:
output informs attached sound effect ("input") ...
attached sound effect informs attached sound effect ("input socket")
... do this until no further input socket is filled (mostly the sound generator / file loader/streamer)

Depends on how you want to lay out object connections and knowledge about each other.



BUT: what does all of this have to do with a "new audio subsystem"? You are talking about "TSound"-object generation. That means it _could_ just be built on top of existing code and extend the "Sound container" while adding some clever "need to refill buffer" stuff. The "callbacks" we discussed a bit here are what needs to be added to existing sound systems so that extending TSound objects could handle stuff properly (e.g. streaming data from a music file).


bye
Ron

iWasAdam

#10
It's NOT TSound, it's built from the ground up!

Base design is:
Voice is a single stereo playback unit. It contains all the code and logic to create sound (from either an input sample or a generated sine, etc), and all the FX code. A generator is just treated as an FX, so there are no generators, just FX.

Synth deals with the buffers, voice mixing (currently it is just one voice), actual audio output, etc.

so in use it is:
(in New)
mySynth = New Synth
update = 120 times a second

load sound0 from disk, or create a waveform (sound0), whatever

(in update120)
mySynth.UpdateAudio()


(when you want the sound)
mySynth.Pitch( _pitch )
mySynth.NoteOn()

None of this will really do anything yet, as you would need to have a basic voice architecture defined (how the voice is created from a chain) - I should add a basic default chain here... PlaySample > Pan
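Filling in the gaps with made-up names, a complete use might end up looking roughly like this - a guess at the eventual API (Synth, Voice, PlaySample and Pan are placeholders), not the actual one:

' Hypothetical usage: build a voice from a default PlaySample > Pan chain,
' tick the synth at 120Hz, then trigger notes when needed.
Global mySynth:Synth

Function Setup()
    mySynth=New Synth
    Local clap:=mySynth.CreateVoice()
    clap.AddModifier( New PlaySample( "drumclap.wav" ) )
    clap.AddModifier( New Pan( 0.25 ) )   ' slightly right of centre
End

Function Update120()   ' call this 120 times a second
    mySynth.UpdateAudio()
End

Function PlayClap( _pitch:Double )
    mySynth.Pitch( _pitch )
    mySynth.NoteOn()
End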


The key here is that it is actually operational and solves the main issue with MacOS and Monkey2's borked audio system.

iWasAdam

That's the basic stuff.

The more interesting stuff will come with LFOs and envelopes...

Derron

So it is not about the "audio playback" (which is what I, as a "normal" non-audio-enthusiast developer, understand as the "audio (sub)system") but about the "audio object" (waveform or "data") and however it can get created.

Think I misunderstood what you are doing - but, hmm, "it is NOT TSound" - so it is ... what are you rewriting? How "audio content" is created (streamed files, dynamically created tunes, ...)? How "audio content" is played (position-dependent volume, panning, loops which inform the "creator" too)? How "audio content" is output (OpenAL, ALSA, PulseAudio, LibJack, DX, ...)?


bye
Ron

iWasAdam

OK.
Sound is loaded as a mono or stereo 16-bit waveform.

Say, drumclap.wav.

Usually the sound is packaged and sent to a channel. The channel then plays the sound through the connected device (usually OpenAL).
Again, this means the device creates an internal sound and plays it.

But you can do it other ways:
a ring buffer
In this case you create an OpenAL sound that continuously loops. You then give it a buffer and feed the buffer with sound data that you are in control of.

If you need pitch control, you will need to figure out how to do it and feed it to the buffer.
If you need panning, then you will have to figure out how to do it and feed that to the buffer.

NOTE
BlitzMax has a very nice sound system that did exactly the above, with a ring buffer being fed correct data. It was stable and could be extended, but could get a bit buggy at times. But you could pan stereo samples, etc.

OpenAL won't let you pan stereo samples, only mono samples. And there is something borked in Monkey2 audio that causes mono samples to lose about the first 512 bytes of data in any sound being played.


So the new audio subsystem is an OpenAL ring buffer being fed custom data. So to get ANY sound output you need to write the correct data to the buffer and make sure the buffer is kept fed with data, otherwise you will get 'bad' audio: cracks, pops, dropouts etc.
If you need to alter the volume, pitch, pan, etc., you need to actually write the code yourself to feed the buffer.

Here's the code for dealing with a mono sample:
Local _temp:Double = Double( sound.GetSampleMono16( sndPos Mod _sndLength ) ) / 65536

You then need to write _temp to the buffer (scaled by the voice volume):
buffer[ position ] += _temp * volume

And finally output the buffer to the ring buffer.

And... you've got to do this constantly - you can't have a null buffer or you will get a crash!
And buffers must have enough data to be happy, but as little data as possible to prevent latency (latency is when you trigger a sound and there is a slight delay before it appears).
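Put together, the per-voice feeding code ends up looking something like this - a sketch of the idea only, where GetSampleMono16 and the interleaved buffer come from the fragments above and everything else (the Sample type, the simple linear pan) is assumed:

' Mix one mono voice into an interleaved stereo buffer, applying volume and
' a simple linear pan (pan=-1 hard left, 0 centre, +1 hard right).
Function MixVoice( buffer:Double[],sound:Sample,sndPos:Int,volume:Double,pan:Double )
    Local leftGain:=volume*(1.0-Max( 0.0,pan ))
    Local rightGain:=volume*(1.0+Min( 0.0,pan ))
    For i:=0 Until buffer.Length Step 2
        Local s:=Double( sound.GetSampleMono16( sndPos Mod sound.Length ) )/65536
        buffer[i]+=s*leftGain       ' left sample
        buffer[i+1]+=s*rightGain    ' right sample
        sndPos+=1
    Next
End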




iWasAdam

Here's a demo of MacOS OpenAL BadSound:
https://soundcloud.com/mavryck-james/bad-sound

This is a mono clap. It sounds fine, but the beginning has been lost...!

Here is the correct audio:
https://soundcloud.com/mavryck-james/good-sound

You can 'hear' that the second sound has a more punchy beginning (it plays the entire sample).
The first one is softer as the beginning is missing!

One thing to note:
- the first one (the soft bad one) is the default Monkey2 sound system on MacOS (Windows and Linux don't have this issue)
- the second one is the new audio subsystem!