edom.web.id

Haskell sound synthesis developer documentation

Architecture

The principal idea that glues the library together is treating a value of type G m a as an infinite effectful stream of as, where m is usually IO. An inhabitant of that type is a generator of samples of type a; calling such a generator computes the next sample.

After treating IO a as a stream, the rest of the library naturally follows.

Most of the code is about the G IO (G IO a) type. We think of a (G IO a -> G IO b) -> G IO b as a resource-allocating sink.

This is what I get if I start from abstraction:

data Stream a = forall s. MkStream s (s -> s) (s -> a)

This is what I get if I start from speed:

newtype G m a = MkG { _unG :: m a }

There is also this representation for even more speed:

newtype Render a = MkRender { _unRender :: Int -> Ptr a -> IO () }

We can also use CPS:

newtype CPS a = MkCPS { _unCPS :: (a -> IO ()) -> IO () }
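To make the generator representation concrete, here is a minimal sketch of a G IO value that is driven by mutable state. The helpers counter and takeG are hypothetical names made up for this illustration; they are not part of the library.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Same shape as the G type above: a generator is an action
-- that yields the next sample each time it is run.
newtype G m a = MkG { _unG :: m a }

-- Hypothetical helper: a generator that counts up from a start value.
counter :: Int -> IO (G IO Int)
counter start = do
    ref <- newIORef start
    return $ MkG $ do
        n <- readIORef ref
        writeIORef ref (n + 1)
        return n

-- Hypothetical helper: run a generator n times and collect the samples.
takeG :: Monad m => Int -> G m a -> m [a]
takeG n g
    | n <= 0 = return []
    | otherwise = do
        x <- _unG g
        xs <- takeG (n - 1) g
        return (x : xs)
```

Running takeG twice on the same generator continues where the first run stopped, which is what "calling the generator computes the next sample" means in practice.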

alsa-seq is BSD3 http://hackage.haskell.org/package/alsa-seq http://hackage.haskell.org/package/midisurface

We don’t need to read MIDI because there is alsa-seq. We don’t need to write MIDI because there is vkeybd.

Big parts:

Functional layered architecture?

layer 1: CPS? Render? Arrow? Monad? Timefun?

layer 0: user should not use this directly
    tied to GHC; not portable
    but layer 0 can be replaced with external C code via FFI

TODO remove reexports
    do not reexport from Sound

DONE fillWithSine

remove dependency on vector

debug
    dump output raw file; load and plot with octave
    http://stackoverflow.com/questions/8264083/reading-binary-file-with-octave
    https://www.gnu.org/software/octave/doc/interpreter/Binary-I_002fO.html
    fopen and fread

    path = '/tmp/a.raw';
    fid = fopen(path, 'rb');
    x = fread(fid, Inf, 'double');
    fclose(fid);
    y = x(1:2:end) + i .* x(2:2:end);
    plot(y)
    plot(y,'.') % scatter plot

plotting? cairo? fltk? gtk? opengl?

Sound.Ptr writeFile: use hPutBuf

normal pdf with mean m and stdev s is exp (- (x-m)^2 / s^2)
b-bandwidth of normal pdf with stdev s is 2 * x where:
    exp (-x^2/s^2) = b
    x^2/s^2 = - log b
    x^2 = - s^2 log b
    x = s * sqrt (- log b)
for 16-bit audio, bandwidth is approximately 3 * s

STFT?

spread UNPACK pragmas in Sound.ALSA.Sequencer.Event

midi event editor: http://sourceforge.net/projects/midiquickfix/

midi looper: midi manipulation functions; filterTrack; filterEvent; etc.

TODO disuse attoparsec, cereal, binary

wx doesn’t work with ghci; ghci dynamic linking not yet implemented
    ghc linker bug fixed in 7.8
    http://marc.info/?l=haskell-cafe&m=138069502706021&w=2

gtk initialization is nonreentrant

on ubuntu, gtk is easiest

gtk (gtk2hs) works with ghci 7.6.3

hs-fltk or fltkhs?

FIXME
    Euterpea license is BSD3
    just use euterpea
        but it uses arrows

is midi clock mandatory?

DONE:
    Sound.Sequence: disuse StateT; just use lambda and IORef/Ptr

There are already many programs for specific purposes out there. To add value, they must be integrated? Must communicate with each other? http://www.giadamusic.com/ GPL; ugh

the soundbank is more important than the sequencer?

Programs

aseqdump is useful to check what an ALSA MIDI port receives.

monophonic sine synthesizer using ALSA sequencer.

We want to keep the code under BSD license but Haskell's MIDI package is licensed under GPL so we cannot use it here.

Alternative: GUI in Haskell, synthesis in C++ using STK, or even in C.

Arithmetic on the outputs of two generators is simple because we have defined a Num instance for G IO a.
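A sketch of what such a Num instance could look like when it is written on the G newtype. The helper zipG is a hypothetical name, and the library's actual instance may differ in its constraints.

```haskell
import Control.Applicative (liftA2)

newtype G m a = MkG { _unG :: m a }

-- Hypothetical helper: combine the next samples of two generators.
zipG :: Applicative m => (a -> b -> c) -> G m a -> G m b -> G m c
zipG f (MkG x) (MkG y) = MkG (liftA2 f x y)

-- Arithmetic is applied sample by sample.
instance (Applicative m, Num a) => Num (G m a) where
    (+) = zipG (+)
    (-) = zipG (-)
    (*) = zipG (*)
    negate (MkG x) = MkG (fmap negate x)
    abs (MkG x) = MkG (fmap abs x)
    signum (MkG x) = MkG (fmap signum x)
    -- A literal becomes a generator that always yields that constant.
    fromInteger n = MkG (pure (fromInteger n))
```

With this instance, an expression such as 2 * g + h is itself a generator: each call pulls one sample from g and one from h.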

Sometimes an inhabitant of a -> G IO () is a sink. Can we use the pipes library?

Design decisions due to GHC

G IO a is just a newtype for IO a in order to avoid orphan instances.

A with-style function has to be INLINEd; otherwise GHC can’t unbox the argument of the continuation argument of that with-style function.

A function returning a lambda inside an IO (a function whose return type is like G IO (e -> G IO e)) has to be INLINEd; otherwise GHC will make a worker and thus it cannot unbox the argument of the lambda. There are two choices: G IO (a -> G IO b) and G IO a -> G IO b. GHC optimizes the latter better.
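For illustration, this is the shape of a with-style function carrying an INLINE pragma. The function withSine is a hypothetical example and not taken from the library; the sketch only shows where the pragma goes and does not demonstrate the unboxing effect itself.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

newtype G m a = MkG { _unG :: m a }

-- Hypothetical with-style function: allocate the phase state of a sine
-- oscillator, then hand the resulting generator to the continuation k.
-- The INLINE pragma asks GHC to inline the definition at call sites,
-- where the continuation is known.
withSine :: Double -> Double -> (G IO Double -> IO r) -> IO r
withSine rate freq k = do
    let step = 2 * pi * freq / rate
    ref <- newIORef 0
    k $ MkG $ do
        phase <- readIORef ref
        writeIORef ref (phase + step)
        return (sin phase)
{-# INLINE withSine #-}
```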

GHC does not have nested CPR analysis. GHC does not unbox the deep members of a nested tuple.

Features

Compute filter coefficients.

As long as it supports MIDI, it can run in a different process. This fits the Unix philosophy: write a program that does one thing well.

libsox uses LGPL and the Haskell bindings soxlib uses BSD. Use sox for writing files. Or use libav? libsndfile?

Can just use Nord sample library? Proprietary? License? Format? Scrambled? Encrypted? http://www.nordkeyboards.com/sound-libraries

Buffering?

We often want to generate samples into a buffer or write a buffer into a sink, so we make these functions:

fill :: IOUArray Int e -> IO e -> IO ()
forEach :: IOUArray Int e -> (e -> IO ()) -> IO ()
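One possible implementation of these two functions is sketched below. It assumes an MArray constraint on the element type, which the signatures above omit, and the demo function is a hypothetical usage example.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Data.Array.IO (IOUArray, MArray, getBounds, newArray, readArray, writeArray)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Run the generator once per slot and store each result in the buffer.
fill :: MArray IOUArray e IO => IOUArray Int e -> IO e -> IO ()
fill buf gen = do
    (lo, hi) <- getBounds buf
    mapM_ (\ i -> gen >>= writeArray buf i) [lo .. hi]

-- Feed every element of the buffer to the sink, in index order.
forEach :: MArray IOUArray e IO => IOUArray Int e -> (e -> IO ()) -> IO ()
forEach buf sink = do
    (lo, hi) <- getBounds buf
    mapM_ (\ i -> readArray buf i >>= sink) [lo .. hi]

-- Hypothetical usage: fill a 4-sample buffer with 1, 2, 3, 4 and sum it.
demo :: IO Double
demo = do
    buf <- newArray (0, 3) 0 :: IO (IOUArray Int Double)
    next <- newIORef 0
    fill buf (modifyIORef next (+ 1) >> readIORef next)
    total <- newIORef 0
    forEach buf (\ x -> modifyIORef total (+ x))
    readIORef total
```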

Relevant modules are Data.Array.IO, Data.Array.Storable, Data.Array.Unsafe.

The code is on GitHub. To clone it:

# HTTPS
git clone https://github.com/edom/sound.git

# SSH
git clone git@github.com:edom/sound.git

STK.

MIDI use JACK or PortAudio. Can be driven by VMPK or jack-keyboard.

Survey everything first.

http://genesisdaw.org/post/basic-synth-midi-keyboard.html

fluidsynth, soundfont

Requirement

Monophonic sine instrument that can be controlled from VMPK/jack-keyboard.

http://jackaudio.org/files/docs/html/group__MIDIAPI.html

How do Rosegarden and Ardour differ?

virtual MIDI button?

virtual MIDI knob?

FLTK-MIDI bridge?

If we are going to code procedurally, why bother using Haskell in the first place? Why not just use C++ and STK? Can the type system still help?

software MIDI controller? on hackage: alsa-gui

http://alsa.opensrc.org/AlsaMidiOverview vkeybd + fluidsynth

http://hackage.haskell.org/packages/#cat:Sound

Music production

Features

Record audio?

Compose from a combination of MIDI, synthesis, and recording?

Apply effects, equalizers?

Mastering?

The most commonly used features of a mixer:

http://jackaudio.org/applications/

Control uses MIDI.

After MIDI control and synthesis, suddenly music creation possibilities open up. Score representation uses MusicXML.

Piano roll to edit notes. We have LMMS.

The biggest issue of FOSS is usability (user experience). Personas: composer, sound engineer, who else?

vkeybd + hydrogen + qjackctl patchbay

JACK better mixer? Combine effect rack and mixer?

Beat editor may be unnecessary. We can control Hydrogen with MIDI.

Background

There is the synthesizer package in Haskell, but its license is GPL.

Libraries using GPL:

I could have used those libraries otherwise.

Libraries using LGPL:

I feel that decent performance can be obtained without having to resort to rewrite rules.

I want to make it as easy as possible for people to use my work so:

The Sound.Stream module should be reusable. It should depend on base only.

Write examples first. Base examples on actual use cases.

Avoid context-carrying values. Prefer r -> a to R a. Prefer Rate -> L a -> b to Rated (L a) -> b.

Be practical. Avoid overzealous use of the type system.

Use IO a with IOError instead of IO (Either String a).

A stream (an infinite list) can be encoded as follows:

data G s a
    = MkG
    {
        _gs :: !s           -- starting state
        , _ge :: !(s -> s)  -- state stepping function
        , _go :: !(s -> a)  -- output mapper
    }

That is the generator encoding of the stream. GHC seems able to optimize this well. It is hypothesized that, for each s, there is an isomorphism between L a and G s a.
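The following sketch shows how such a generator can be consumed. The names gtoList, gsum and evens are hypothetical helpers for this illustration, not library functions.

```haskell
{-# LANGUAGE BangPatterns #-}

data G s a
    = MkG
    {
        _gs :: !s           -- starting state
        , _ge :: !(s -> s)  -- state stepping function
        , _go :: !(s -> a)  -- output mapper
    }

-- Hypothetical helper: unfold a generator into a lazy list of outputs.
gtoList :: G s a -> [a]
gtoList (MkG s e o) = map o (iterate e s)

-- Hypothetical helper: sum the first n outputs in a loop that carries
-- only the counter, the state, and the accumulator.
gsum :: Num a => Int -> G s a -> a
gsum n_ (MkG s_ e o) = loop n_ s_ 0
    where
        loop !n !s !acc
            | n <= 0 = acc
            | otherwise = loop (n - 1) (e s) (acc + o s)

-- Example generator: the state counts 0, 1, 2, ...; the output is twice the state.
evens :: G Int Int
evens = MkG 0 (+ 1) (* 2)
```

The loop in gsum never allocates a stream cell, which is the kind of tight loop that the generator encoding is meant to enable.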

This is a new hope. The L a type turns out to be dead-end.

However, tuples may present a problem.

Decisions

Cereal is chosen over binary because the former supports big-endian 64-bit floats.

Milestones

Clean-up

Delete C, Rate, Rated, Slice.

Tracker

A tracker is a pattern-driven sound splicing tool.

Prerequisites:

Example text timeline:

1 bar = 16 ticks
x = play
- = silence
| doesn't count; just helps readability

hihat   x-x-x-x-x-x-x-x-|
snare   ----x-------x---|
bass    x-------x-------|

hihat = [0,2..]
snare = [4,12..]
bass = [0,8..]
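As a sketch of how the text timeline relates to the tick lists, the following hypothetical functions (barTicks and loopTicks are names made up here) turn a one-bar pattern into the ticks at which the sample plays and then repeat the bar forever.

```haskell
-- Ticks within one bar at which the pattern plays.
-- The '|' character is ignored, as in the legend above.
barTicks :: String -> [Int]
barTicks pat = [t | (t, c) <- zip [0 ..] (filter (/= '|') pat), c == 'x']

-- Repeat a non-empty one-bar tick list forever,
-- given the number of ticks per bar.
loopTicks :: Int -> [Int] -> [Int]
loopTicks ticksPerBar ts = [bar * ticksPerBar + t | bar <- [0 ..], t <- ts]
```

For the snare line, barTicks gives the ticks 4 and 12 of one bar, and looping with 16 ticks per bar gives the same sequence as [4,12..].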

Running

Make sure that you already have a Cabal sandbox.

cabal repl --ghc-options="-fobject-code -O"

That loads optimized compiled code in GHCi. (More details at Stack Overflow.)

File output

Use these functions to write mono audio data into an AU file:

srlwritefileau :: SRL Double -> FilePath -> IO ()
lwritefileau :: Precision r p -> Count -> L Double -> FilePath -> IO ()

The library only supports writing the 1-channel 64-bit float AU format, the simplest metadata-carrying standard audio format that Audacity can handle. This means we can view the signal in Audacity and use its features such as spectrum analysis. Moreover, SoX can convert this format into any other audio format. This saves us from uninteresting work.
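To show how little metadata the format needs, here is a sketch that encodes a mono 64-bit float AU file with the bytestring package. The functions encodeAu and writeAu are hypothetical and are not the library's srlwritefileau or lwritefileau; the header consists of six big-endian 32-bit words followed by big-endian samples.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical encoder: sample rate in Hz and a finite list of samples.
encodeAu :: Int -> [Double] -> BL.ByteString
encodeAu rate samples = B.toLazyByteString (mappend header body)
    where
        header = mconcat (map B.word32BE
            [ 0x2e736e64        -- magic number, the ASCII bytes ".snd"
            , 24                -- offset of the sample data in bytes
            , 0xffffffff        -- data size; this value means "unknown"
            , 7                 -- encoding 7: 64-bit IEEE floating point
            , fromIntegral rate -- samples per second
            , 1                 -- number of channels
            ])
        body = mconcat (map B.doubleBE samples)

-- Hypothetical writer built on the encoder.
writeAu :: Int -> [Double] -> FilePath -> IO ()
writeAu rate samples path = BL.writeFile path (encodeAu rate samples)
```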

Representing signals

A signal has several representations.

type     description
F T a    function from time (second) to displacement
L a      stream of sample values
V a      unboxed vector
Tab a    wavetable (a vector whose size is a power of two, containing one cycle of a wave)
RL a     Rated (List a)
SRL a    Slice Int (RL a)

Define a continuous signal like this:

x :: F T Double
x = fun (\ t -> sin (w * t))
    where
        w = 2 * pi * 256

Frequency modulation wavetable synthesis

rlfm :: Carrier (Tab Double) -> Modulator (RL Double) -> RL Double

These do-nothing type synonyms are meant to help documentation:

type Carrier a = a
type Modulator a = a

To make a wavetable:

tsin :: Lgsize -> Tab Double

tsin n creates a table with \(2^n\) samples. Usually \(n=12\) is good enough.
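A sketch of a sine wavetable under simplifying assumptions: Lgsize is taken to be Int and Tab to be an unboxed array, which may not match the library's actual types. The lookup function tlookup is hypothetical; it shows one plausible reason for the power-of-two size, namely that the index can be wrapped with a bit mask.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.Bits (shiftL, (.&.))

-- Assumed stand-ins for the library's types.
type Lgsize = Int
type Tab a = UArray Int a

-- One cycle of a sine wave in a table of 2^n samples.
tsin :: Lgsize -> Tab Double
tsin n = listArray (0, size - 1)
    [sin (2 * pi * fromIntegral k / fromIntegral size) | k <- [0 .. size - 1]]
    where
        size = 1 `shiftL` n :: Int

-- Hypothetical lookup: the upper bound of the table is size - 1,
-- which serves as a mask because the size is a power of two.
tlookup :: Tab Double -> Int -> Double
tlookup tab i = tab ! (i .&. mask)
    where
        (_, mask) = bounds tab
```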

The types of rlfm and tsin are actually more general than those.

Sampling

Sampling transforms a continuous representation into a discrete representation.

lsample :: Rate -> F T a -> L a
rlsample :: Rate -> F T a -> RL a
vsample :: Rate -> Int -> F T a -> V a
tsample :: Rate -> Lgsize -> F T a -> Tab a
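A minimal sketch of lsample, assuming that F is a newtype around a function, that T is Double (seconds), and that Rate is an Int number of samples per second. These definitions and the helper ltakeList are assumptions for illustration; the library's types are more general.

```haskell
-- Assumed stand-ins for the library's types.
type T = Double
type Rate = Int
newtype F t a = MkF (t -> a)
data L a = MkL !a (L a)

fun :: (t -> a) -> F t a
fun = MkF

-- Sample the function at t = 0, 1/rate, 2/rate, ...
-- The time is computed from the sample index to avoid accumulating error.
lsample :: Rate -> F T a -> L a
lsample rate (MkF f) = loop (0 :: Int)
    where
        period = recip (fromIntegral rate) :: T
        loop n = MkL (f (fromIntegral n * period)) (loop (n + 1))

-- Hypothetical helper: the first n samples as a list.
ltakeList :: Int -> L a -> [a]
ltakeList n (MkL x xs)
    | n <= 0 = []
    | otherwise = x : ltakeList (n - 1) xs
```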

Why list is slow, weak head-normal stream is fast, and generator is even faster

I guess that list is slow because it has two constructors and it is not weak head-normal. GHC seems to only reliably unbox single-constructor weak head-normal data types. A weak head-normal stream type is therefore promising:

data L a = MkL !a (L a)

Most of the time, streams can be treated as if they were lists. These stream functions are usually named by prepending l to the corresponding function in Data.List.

lcons       :: a -> L a -> L a
literate    :: (a -> a) -> a -> L a
lzip1       :: (a -> b) -> L a -> L b
lzip2       :: (a -> b -> c) -> L a -> L b -> L c
lzip3       :: (a -> b -> c -> d) -> L a -> L b -> L c -> L d
lmap        = lzip1
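Straightforward definitions that fit these signatures are sketched below; the library's own definitions may differ, for example in inlining or strictness. The helper ltakeList is a hypothetical addition for inspecting a stream.

```haskell
-- Stream with a strict head and a lazy tail.
data L a = MkL !a (L a)

lcons :: a -> L a -> L a
lcons = MkL

-- x, f x, f (f x), ...
literate :: (a -> a) -> a -> L a
literate f x = MkL x (literate f (f x))

lzip1 :: (a -> b) -> L a -> L b
lzip1 f (MkL x xs) = MkL (f x) (lzip1 f xs)

lzip2 :: (a -> b -> c) -> L a -> L b -> L c
lzip2 f (MkL x xs) (MkL y ys) = MkL (f x y) (lzip2 f xs ys)

lmap :: (a -> b) -> L a -> L b
lmap = lzip1

-- Hypothetical helper: the first n elements as a list.
ltakeList :: Int -> L a -> [a]
ltakeList n (MkL x xs)
    | n <= 0 = []
    | otherwise = x : ltakeList (n - 1) xs
```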

Generators allow tight loops without constructors (the values are unboxed on entry to the loop).

Stream arithmetic

I overload arithmetic operators by defining a Num instance for L a.

instance (Num a) => Num (L a) where
    (+) = lzip2 (+)
    ...

The Num instance enables expressions like 2 * x and x + y, and any other arithmetic expression involving streams, where both x and y have the type (Num a) => L a.
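A complete version of such an instance might look like the sketch below. The methods other than (+) are filled in by analogy and are assumptions, as are the helpers lrepeat, literate and ltakeList.

```haskell
data L a = MkL !a (L a)

lrepeat :: a -> L a
lrepeat x = MkL x (lrepeat x)

literate :: (a -> a) -> a -> L a
literate f x = MkL x (literate f (f x))

lzip1 :: (a -> b) -> L a -> L b
lzip1 f (MkL x xs) = MkL (f x) (lzip1 f xs)

lzip2 :: (a -> b -> c) -> L a -> L b -> L c
lzip2 f (MkL x xs) (MkL y ys) = MkL (f x y) (lzip2 f xs ys)

-- Arithmetic is element-wise; a literal is a constant stream.
instance (Num a) => Num (L a) where
    (+) = lzip2 (+)
    (-) = lzip2 (-)
    (*) = lzip2 (*)
    negate = lzip1 negate
    abs = lzip1 abs
    signum = lzip1 signum
    fromInteger = lrepeat . fromInteger

-- Hypothetical helper: the first n elements as a list.
ltakeList :: Int -> L a -> [a]
ltakeList n (MkL x xs)
    | n <= 0 = []
    | otherwise = x : ltakeList (n - 1) xs
```

Because fromInteger builds a constant stream, the literal 2 in 2 * x is itself a stream, so no separate scalar multiplication is needed.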

To avoid name clashes with Prelude exports, all stream function names are prefixed with l (lowercase L). I wish Haskell took types into account when resolving a symbol, as Idris does; that would alleviate this problem greatly.

Continuation

To define a stream whose first n elements are taken from x and the rest are taken from y, we can use ltakeappend:

-- Requires the BangPatterns extension.
ltakeappend :: Int -> L a -> L a -> L a
ltakeappend n_ x_ y = loop n_ x_
    where
        loop !n (MkL xh xt)
            | n > 0 = MkL xh (loop (pred n) xt)
            | otherwise = y

We can generalize ltakeappend to the continuation-passing style lctake. The continuation c_ decides what to do with the rest of the input stream.

-- Requires the BangPatterns extension.
lctake :: Int -> L a -> (L a -> L a) -> L a
lctake n_ x_ c_ = loop n_ x_
    where
        loop !n x@(MkL xh xt)
            | n > 0 = MkL xh (loop (pred n) xt)
            | otherwise = c_ x

With lctake, we can define ltakeappend like this:

ltakeappend n x y = lctake n x (const y)

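ltakeappend allows building piecewise streams. The following stream takes 10 elements from x, then 20 elements from y, then 30 elements from z, and is the constant stream 0 from then on. (lctake itself cannot be nested this way because its third argument must be a function L a -> L a; the equivalent is lctake 10 x (const (lctake 20 y (const (lctake 30 z (const 0))))).)

s = ltakeappend 10 x (ltakeappend 20 y (ltakeappend 30 z 0))

Since f x y = (f x) y (currying/schönfinkeling) and f (g x) = f . g $ x, we can do this:

s = ltakeappend 10 x (ltakeappend 20 y (ltakeappend 30 z 0))
  = ltakeappend 10 x ((ltakeappend 20 y) ((ltakeappend 30 z) 0))
  = ltakeappend 10 x ((ltakeappend 20 y . ltakeappend 30 z) 0)
  = (ltakeappend 10 x) ((ltakeappend 20 y . ltakeappend 30 z) 0)
  = (ltakeappend 10 x . (ltakeappend 20 y . ltakeappend 30 z)) 0
  = (ltakeappend 10 x . ltakeappend 20 y . ltakeappend 30 z) 0

s =
    ltakeappend 10 x
    . ltakeappend 20 y
    . ltakeappend 30 z
    $ 0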
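A runnable sketch of the piecewise construction with shorter segments. It uses ltakeappend for each segment, since ltakeappend n x has the type L a -> L a and therefore composes with (.). The helpers lrepeat and ltakeList and the value piecewise are hypothetical names for this illustration.

```haskell
{-# LANGUAGE BangPatterns #-}

data L a = MkL !a (L a)

lrepeat :: a -> L a
lrepeat x = MkL x (lrepeat x)

-- Emit n elements of the stream, then let the continuation decide
-- what to do with the rest of it.
lctake :: Int -> L a -> (L a -> L a) -> L a
lctake n_ x_ c_ = loop n_ x_
    where
        loop !n x@(MkL xh xt)
            | n > 0 = MkL xh (loop (pred n) xt)
            | otherwise = c_ x

ltakeappend :: Int -> L a -> L a -> L a
ltakeappend n x y = lctake n x (const y)

-- Two 1s, then three 2s, then 0 forever.
piecewise :: L Int
piecewise =
    ltakeappend 2 (lrepeat 1)
    . ltakeappend 3 (lrepeat 2)
    $ lrepeat 0

-- Hypothetical helper: the first n elements as a list.
ltakeList :: Int -> L a -> [a]
ltakeList n (MkL x xs)
    | n <= 0 = []
    | otherwise = x : ltakeList (n - 1) xs
```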

This can be more tidily done with Cont.

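newtype Cont r a = MkCont { runCont_ :: (a -> r) -> r }
-- Functor and Applicative are superclasses of Monad since GHC 7.10.
instance Functor (Cont r) where
    fmap f m = MkCont (\ c -> runCont_ m (c . f))
instance Applicative (Cont r) where
    pure !x = MkCont (\ c -> c x)
    (<*>) mf mx = MkCont (\ c -> runCont_ mf (\ f -> runCont_ mx (c . f)))
instance Monad (Cont r) where
    (>>=) m k = MkCont (\ c -> runCont_ m (\ !a -> runCont_ (k a) c))
-- Emit the first n elements of the stream mapped through f,
-- then pass the rest of the stream to the continuation.
clmap :: Int -> (a -> b) -> L a -> Cont (L b) (L a)
clmap n_ f_ x_ = MkCont $ \ c ->
    let
        loop !n x@(MkL h t)
            | n > 0 = MkL (f_ h) (loop (pred n) t)
            | otherwise = c x
    in
        loop n_ x_
-- clpass_ is not defined in these notes; it is presumably clmap with id.
-- runCont_ s id yields the resulting stream.
s = do
    clpass_ 10 x
    clpass_ 20 y
    clpass_ 30 z
    return 0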
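The following self-contained sketch runs the Cont version end to end with shorter segments. The definition of clpass_ here is an assumption (clmap with id, discarding the rest of the stream), and lrepeat, ltakeList and piecewise are hypothetical helpers.

```haskell
{-# LANGUAGE BangPatterns #-}

data L a = MkL !a (L a)

newtype Cont r a = MkCont { runCont_ :: (a -> r) -> r }

instance Functor (Cont r) where
    fmap f m = MkCont (\ c -> runCont_ m (c . f))

instance Applicative (Cont r) where
    pure x = MkCont (\ c -> c x)
    mf <*> mx = MkCont (\ c -> runCont_ mf (\ f -> runCont_ mx (c . f)))

instance Monad (Cont r) where
    m >>= k = MkCont (\ c -> runCont_ m (\ a -> runCont_ (k a) c))

-- Emit the first n elements mapped through f, then continue with the rest.
clmap :: Int -> (a -> b) -> L a -> Cont (L b) (L a)
clmap n_ f_ x_ = MkCont $ \ c ->
    let
        loop !n x@(MkL h t)
            | n > 0 = MkL (f_ h) (loop (pred n) t)
            | otherwise = c x
    in
        loop n_ x_

-- Assumed definition: pass n elements through unchanged, drop the rest.
clpass_ :: Int -> L a -> Cont (L a) ()
clpass_ n x = clmap n id x >> return ()

lrepeat :: a -> L a
lrepeat x = MkL x (lrepeat x)

-- Two 1s, then three 2s, then 0 forever.
piecewise :: L Int
piecewise = runCont_ steps id
    where
        steps = do
            clpass_ 2 (lrepeat 1)
            clpass_ 3 (lrepeat 2)
            return (lrepeat 0)

-- Hypothetical helper: the first n elements as a list.
ltakeList :: Int -> L a -> [a]
ltakeList n (MkL x xs)
    | n <= 0 = []
    | otherwise = x : ltakeList (n - 1) xs
```

Running the computation with id as the final continuation turns the last returned stream into the tail of the result.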