.. link:
.. description: is a fast implementation for SOLA, a sample time stretching algorithm.
.. tags: project
.. date: 2015/02/01 19:10:08
.. title: timestretch
.. subtitle: a fast sample time stretching implementation
.. slug: ../arts/software/timestretch/index

.. contents::
  :depth: 1
  :class: ezjail-toc

Overview
========

When reducing or increasing the playback speed of an audio recording, its pitch usually changes as well, leading to an audible "Mickey Mouse" effect. The `SOLA algorithm <http://en.wikipedia.org/wiki/Audio_time-scale/pitch_modification#SOLA>`_ provides a way to change the speed without altering the pitch.

----

Details
=======

The basic idea behind the method lies in the inherent redundancy of the periodic waveforms in spoken words or music. A vowel consists of overlaid, repeating waves, and the human ear does not mind whether any single one of those repetitions is there or not:

.. image:: timestretch_1.png


If we want to play a sample faster, we can try to find the period length corresponding to the base frequency of the recording at that position and overlap some periods, cross-fading one snippet into the next. Or, to put it more visually:

.. image:: timestretch_2.png


Obviously the recording is now shorter than before, meaning it can be played in less time. In order to reduce the playing time by a fixed ratio, we can alter the length of the overlapping windows (shown in bright blue).
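
The cross-fade itself can be sketched in a few lines of C. This is only a minimal illustration, assuming a linear fade over mono ``float`` samples; the function name ``crossfade_linear`` and its parameters are made up for this example and are not taken from the *timestretch* source:

.. code:: c

   #include <stddef.h>

   /* Illustrative helper, not part of timestretch.
    * Linearly cross-fade `overlap` samples: the output starts as pure
    * `tail` (the end of the previous snippet) and moves towards `head`
    * (the start of the next snippet). */
   static void crossfade_linear(const float *tail, const float *head,
                                float *out, size_t overlap)
   {
       for (size_t i = 0; i < overlap; ++i) {
           /* weight of `head` ramps from 0 up to just below 1 */
           float w = (float)i / (float)overlap;
           out[i] = (1.0f - w) * tail[i] + w * head[i];
       }
   }

Two snippets of ``n`` samples each, joined this way, yield ``2 * n - overlap`` samples, which is where the shortening comes from.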

To find the best offset at which to overlay our samples, we simply brute-force it: for each candidate offset we compute the `mean squared error <http://en.wikipedia.org/wiki/Mean_squared_error>`_ between the two snippets and select the offset where the error is minimal, a process closely related to what is commonly called autocorrelation. In our implementation this error is further biased towards the center of our window, so that the algorithm is not forced into selecting suboptimal (and audibly bad) positions after a while when the inherent period length and the overlap length differ only slightly.
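
The search described above might look roughly like the following sketch. It is not the code from the repository: the name ``find_best_offset``, the ``bias`` parameter and the multiplicative form of the center bias are all assumptions chosen for illustration.

.. code:: c

   #include <stddef.h>
   #include <float.h>

   /* Illustrative helper, not part of timestretch.
    * Return the offset in [0, max_offset] at which `candidate` best
    * matches `reference` over `window` samples. `candidate` must hold
    * at least max_offset + window samples.
    * `bias` (>= 0) is a hypothetical knob: it inflates the error of
    * offsets that lie far from the center of the search range. */
   static size_t find_best_offset(const float *reference,
                                  const float *candidate,
                                  size_t window, size_t max_offset,
                                  float bias)
   {
       size_t best = 0;
       float best_err = FLT_MAX;
       float center = (float)max_offset / 2.0f;
       float scale = center > 0.0f ? center : 1.0f;

       for (size_t off = 0; off <= max_offset; ++off) {
           float err = 0.0f;
           for (size_t i = 0; i < window; ++i) {
               float d = reference[i] - candidate[off + i];
               err += d * d;
           }
           err /= (float)window;              /* mean squared error */

           float dist = (float)off - center;  /* distance from center */
           if (dist < 0.0f)
               dist = -dist;
           err *= 1.0f + bias * (dist / scale);

           if (err < best_err) {
               best_err = err;
               best = off;
           }
       }
       return best;
   }

With ``bias`` set to zero the first of several equally good offsets wins; a positive ``bias`` prefers the one closest to the center of the search range.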

----

Build instructions
==================

*timestretch* is available from my git repository. Use ``git clone git://erdgeist.org/timestretch`` to check it out. A `timestretch gitweb </gitweb/timestretch/>`_ is also available.

Currently there is only one source file. It contains the setup routine ``calc_convert_values``, which takes the sample rate (used to calculate a proper window length from heuristic values given in ms) and a tempo, a floating point value giving the rate at which to slow down or speed up the recording. The routine fills global variables that you might want to put in a context struct for your project. On return, ``g_input_length`` is the minimum number of samples expected for later processing, and ``g_output_length`` is the exact number of samples produced by each run.
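
If you do move the globals into a context struct, the result could look something like the sketch below. Everything in it is hypothetical: ``struct ts_context``, ``ts_context_init`` and the ``window_ms`` parameter are invented here, the formulas are placeholders rather than the heuristics used by ``calc_convert_values``, and a tempo above 1 is assumed to mean faster playback.

.. code:: c

   #include <stddef.h>

   /* Hypothetical replacement for the globals; not part of timestretch. */
   struct ts_context {
       size_t input_length;   /* minimum samples expected per run,
                                 cf. g_input_length */
       size_t output_length;  /* exact samples produced per run,
                                 cf. g_output_length */
   };

   /* Placeholder setup: converts a window length in ms to samples and
    * scales the input length by the tempo. Returns 0 on success and
    * -1 on invalid arguments. */
   static int ts_context_init(struct ts_context *ctx, unsigned sample_rate,
                              double tempo, unsigned window_ms)
   {
       if (ctx == NULL || sample_rate == 0 || tempo <= 0.0 || window_ms == 0)
           return -1;

       ctx->output_length = (size_t)sample_rate * window_ms / 1000;
       /* assumption: tempo > 1 consumes more input per output sample */
       ctx->input_length = (size_t)((double)ctx->output_length * tempo + 0.5);
       return 0;
   }

Passing such a struct around instead of relying on globals lets one program stretch several streams with different tempos at the same time.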

The actual work is done in the ``process_frame`` function, which takes pointers to the input and output buffers, a pointer to some scratch space (set up in ``calc_convert_values`` in this implementation) and a ``frame_flag`` indicating whether this is the first frame (where there is nothing to cross-fade with), a normal frame, or the last frame (where no samples are kept for later cross-fading, so the caller can resume non-timestretched playback).
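
The three frame kinds can be pictured with a small sketch. The enum and the helper below are illustrative only; the actual values accepted for ``frame_flag`` are defined in the *timestretch* source and may well differ.

.. code:: c

   #include <stddef.h>

   /* Hypothetical names, not taken from timestretch. */
   enum ts_frame_flag {
       TS_FRAME_FIRST,   /* nothing to cross-fade with yet */
       TS_FRAME_NORMAL,  /* cross-fade with samples kept from the last run */
       TS_FRAME_LAST     /* keep no samples for later cross-fading */
   };

   /* Pick the flag for frame `index` out of `total` frames.
    * A stretch consisting of a single frame is reported as FIRST here;
    * how the real code treats that case is not covered by this sketch. */
   static enum ts_frame_flag ts_flag_for(size_t index, size_t total)
   {
       if (index == 0)
           return TS_FRAME_FIRST;
       if (index + 1 == total)
           return TS_FRAME_LAST;
       return TS_FRAME_NORMAL;
   }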

----

Author
======

*timestretch* was written by `Dirk Engling <mailto:erdgeist@erdgeist.org>`_, who likes to hear from happy customers.

----

License
=======

*timestretch* is considered `beer ware </beerware.html>`_.