The Covid-19 Listening Project is a collaboration with audio researcher and programmer Dr Enzo De Sena (University of Surrey), in consultation with Gemma Bruno (Telethon Institute of Genetics and Medicine, Italy) and Niki Loverdu (KU Leuven, Belgium). It aims to reveal and communicate characteristics of the Covid-19 genome sequence – and its mutations and strains – through data sonification. The project has been featured on Italian TV's prime-time current affairs programme DiMartedi hosted by Barbara Gallavotti (broadcast on May 19 2020, to ≈3 million viewers), and in Metro London on November 13 2020 (≈1.3 million circulation). The sonification process, which makes the spike protein, the mutations and the several strains clearly audible, is detailed below.
The initial process assigned mutation types and their positions in the genome to, respectively, chromatic notes and metric placement. This is detailed in the report below. This system was employed in the creation of a choral work, Chorus of Changes. Here, over 500 genome sequences are translated into two octaves of a B minor scale. The translations are selected by mapping the most common mutation types ('note deltas') onto the most common diatonic scale degrees in a sample of Western Art Music (see Huron 2008) [6] using the DataLoop Crypto device. This results in familiar melodic motifs for the most commonly retained mutations and in pandiatonic blurring for the more novel mutations. At the tempo selected, the result is a surprisingly engaging piece of music lasting over 42 minutes, in which the language of mutation is translated into the language of motivic transformation – a deeper sonification beyond arbitrary chromatic or 'safe' scale choices. The piece is performable by choir and organ, but is here rendered with MIDI instrumentation in Ableton Live using UAD and Native Instruments plug-ins.
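As a minimal sketch of the diatonic side of this mapping (the actual mapping via the DataLoop Crypto device is not shown here; the snapping rule below is an illustrative assumption), a chromatic MIDI note can be folded onto two octaves of B natural minor:

```matlab
% Sketch: snap a chromatic MIDI note onto two octaves of B (natural) minor,
% a simplified stand-in for the scale mapping used in Chorus of Changes.
bMinor = 47 + [0 2 3 5 7 8 10 12 14 15 17 19 20 22 24];  % B2 .. B4
midiIn = 62;                          % e.g. the D4 produced by a C -> U mutation
[~, idx] = min(abs(bMinor - midiIn)); % nearest scale degree
snapped = bMinor(idx);                % here 62 (D4), already diatonic in B minor
```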
Data gathering
The project uses the database of the National Center for Biotechnology Information (NCBI), which is updated daily with new COVID-19 sequences from research centres across the world. The website is available here: NCBI Covid-19 page
The data is downloaded and parsed using the covid-genome-matlab-parser.
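As a minimal illustration of this step (the full logic lives in covid-genome-matlab-parser; the directory and file layout below are assumptions, and the Bioinformatics Toolbox is required):

```matlab
% Sketch: read previously downloaded NCBI sequences from FASTA files
% (hypothetical 'genomes' directory; real parsing is done by
% covid-genome-matlab-parser).
files   = dir('genomes/*.fasta');
genomes = cell(numel(files), 1);
for k = 1:numel(files)
    rec        = fastaread(fullfile('genomes', files(k).name));
    genomes{k} = rec.Sequence;   % nucleotide string, e.g. 'GAUACC...'
end
```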
How the mutations are obtained from the NCBI dataset
The first step is to obtain the mutations from the NCBI dataset:
- The genomic comparison is run on the basis of the nucleotide sequences (in GACU symbols).
- In order to reduce the amount of data (currently over 700 genomes), only the first sequence deposited each day is considered; this currently reduces the number of genomes to 74.
- The comparison of genome pairs is carried out using the Needleman-Wunsch global alignment algorithm [1], which makes it possible to identify the genetic mutations.
- The mutations are considered starting from the beginning of the first protein (NSP1), i.e. from nucleotide 266 of NCBI accession MT019529. Mismatches in the last 100 bases are also ignored (for reference, the RNA genome of COVID-19 is about 30,000 bases long). A minimal sketch of this step follows.
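The sketch below illustrates the alignment and mutation-extraction step under stated assumptions (Bioinformatics Toolbox available; it continues from the parsing sketch above, with `genomes` a cell array of nucleotide strings; the choice of reference and sample is illustrative):

```matlab
% Sketch: pairwise genome comparison via Needleman-Wunsch global alignment.
refSeq    = genomes{1};          % illustrative choice of reference and sample
sampleSeq = genomes{2};
[~, alignment] = nwalign(refSeq, sampleSeq, 'Alphabet', 'NT');

ORF_START = 266;                 % first base of NSP1 in accession MT019529
TAIL_SKIP = 100;                 % ignore mismatches in the last 100 bases
refPos    = 0;                   % position within the ungapped reference
mutations = {};                  % rows: {position, old basis, new basis}

for k = 1:size(alignment, 2)
    oldB = alignment(1, k);      % reference row ('-' marks an insertion)
    newB = alignment(3, k);      % sample row    ('-' marks a deletion)
    if oldB ~= '-'
        refPos = refPos + 1;
    end
    if oldB ~= newB && refPos >= ORF_START && refPos <= length(refSeq) - TAIL_SKIP
        mutations(end+1, :) = {refPos, oldB, newB}; %#ok<AGROW>
    end
end
```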
How the mutations are translated to music
The second step is to translate the mutations into music. The details of the procedure used to generate sound from the mutations are as follows:
- The nucleotide mutations are translated into MIDI notes using the table below, and a constant value is added to all of them. The – sign indicates an insertion (old basis column) or a deletion (new basis column); all other types of mutations are ignored. A sketch of this lookup follows the table.
| Old basis | New basis | MIDI note | Note |
| --- | --- | --- | --- |
| C | – | 47 | B2 |
| U | – | 48 | C3 |
| A | – | 49 | C♯3/D♭3 |
| G | – | 50 | D3 |
| G | A | 51 | D♯3/E♭3 |
| G | U | 52 | E3 |
| G | C | 53 | F3 |
| A | G | 54 | F♯3/G♭3 |
| A | U | 55 | G3 |
| A | C | 56 | G♯3/A♭3 |
| U | G | 57 | A3 |
| U | A | 58 | A♯3/B♭3 |
| U | C | 59 | B3 |
| C | G | 60 | C4 |
| C | A | 61 | C♯4/D♭4 |
| C | U | 62 | D4 |
| – | G | 63 | D♯4/E♭4 |
| – | A | 64 | E4 |
| – | U | 65 | F4 |
| – | C | 66 | F♯4/G♭4 |
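A compact way to implement this lookup is a key-value map; the `old>new` key format and the OFFSET value below are illustrative assumptions:

```matlab
% Sketch: mutation -> MIDI note lookup implementing the table above.
% Keys use an assumed 'old>new' format; '-' marks an insertion/deletion.
noteMap = containers.Map( ...
    {'C>-','U>-','A>-','G>-','G>A','G>U','G>C','A>G','A>U','A>C', ...
     'U>G','U>A','U>C','C>G','C>A','C>U','->G','->A','->U','->C'}, ...
    [47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66]);

OFFSET   = 0;                         % the constant added to all notes (value assumed)
midiNote = noteMap('C>U') + OFFSET;   % e.g. a C -> U substitution yields D4 (62)
```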
- The mutations are then organised into 8 groups according to the table below.
| Protein name | Group | Instrument | Angle |
| --- | --- | --- | --- |
| NSP1 | 1 | Cello | -45° |
| NSP2 | 1 | Cello | -45° |
| NSP3 | 2 | Cello | -30° |
| NSP4 | 3 | Cello | -15° |
| NSP5 | 3 | Cello | -15° |
| NSP6 | 3 | Cello | -15° |
| NSP7 | 3 | Cello | -15° |
| NSP8 | 3 | Cello | -15° |
| NSP9 | 4 | Cello | 0° |
| NSP10 | 4 | Cello | 0° |
| NSP12 | 4 | Cello | 0° |
| NSP13 | 5 | Cello | +15° |
| NSP14 | 5 | Cello | +15° |
| NSP15 | 5 | Cello | +15° |
| NSP16 | 5 | Cello | +15° |
| S | 6 | Double bass | +30° |
| ORF3a | 7 | Violin | +45° |
| E | 7 | Violin | +45° |
| M | 7 | Violin | +45° |
| ORF6 | 7 | Violin | +45° |
| ORF7a | 7 | Violin | +45° |
| ORF8 | 7 | Violin | +45° |
| N | 7 | Violin | +45° |
| ORF10 | 7 | Violin | +45° |
| Non-coding regions | 8 | Violin | +45° |
- In order to facilitate spatial discrimination, each group is rendered from a different direction, as indicated in the table above. The angular separation corresponds to the actual length of each protein. In other words, the result is similar to the experience you would have if the RNA were rolled out around you, with each mutation producing a sound from the corresponding direction. One can therefore identify the position of a mutation not only from its time delay but also from the perceived position of the sound in space: mutations near the beginning of the sequence appear to your left, while later ones appear to your right. The specific spatialisation technique used here is Perceptual Soundfield Reconstruction (PSR) [2,3,4]; a simplified stereo sketch follows this list.
- In order to increase presence, each source is rendered with reverberation using Scattering Delay Networks (SDN) [2,5]. The simulated room is rectangular, with dimensions 10 m × 10 m × 3 m; the listener is approximately at the centre of the space, and each sound source is 2 m away from the listener.
- The (informal) protein labels are taken from the New York Times article in [7].
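PSR itself is beyond a short fragment; the sketch below is an assumption-laden simplification (not the project's renderer) that only illustrates mapping a group's angle to a pan position:

```matlab
% Sketch: simplified spatial rendering. The project uses PSR [2,3,4] over a
% multichannel loudspeaker array; as an illustrative stand-in, this fragment
% places one group's angle on a constant-power stereo pan.
groupAngleDeg = [-45 -30 -15 0 15 30 45 45];      % angle per group (1..8), from the table
g  = 6;                                           % e.g. group 6 (spike protein, double bass)
p  = (groupAngleDeg(g) + 45) / 90;                % map [-45, +45] degrees to [0, 1]
x  = 0.1 * randn(44100, 1);                       % placeholder 1 s mono signal at 44.1 kHz
stereo = [cos(p * pi/2) * x, sin(p * pi/2) * x];  % constant-power panning law
```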
Authors
- Enzo De Sena
- Milton Mermikides
Department of Music and Media, University of Surrey, Guildford, UK
Acknowledgements
We would like to thank Gemma Bruno (Telethon Institute of Genetics and Medicine, Italy) and Niki Loverdu (KU Leuven, Belgium) for useful discussions on the topic. We would also like to thank our friends and colleagues for their useful feedback on the presentation.
The project also uses internally:
- Yauhen Yakimovich’s yamlmatlab
- Ken Schutte's MIDI toolbox
References
[1] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis. Cambridge University Press, 1998.
[2] H. Hacıhabiboğlu, E. De Sena, Z. Cvetković, J. D. Johnston, and J. O. Smith III, "Perceptual Spatial Audio Recording, Simulation, and Rendering," IEEE Signal Processing Magazine, vol. 34, no. 3, pp. 36-54, May 2017.
[3] E. De Sena, Z. Cvetković, H. Hacıhabiboğlu, M. Moonen, and T. van Waterschoot, "Localization Uncertainty in Time-Amplitude Stereophonic Reproduction," IEEE/ACM Trans. Audio, Speech and Language Process. (in press).
[4] E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković, "Analysis and Design of Multichannel Systems for Perceptual Sound Field Reconstruction," IEEE Trans. Audio, Speech and Language Process., vol. 21, no. 8, pp. 1653-1665, Aug. 2013.
[5] E. De Sena, H. Hacıhabiboğlu, Z. Cvetković, and J. O. Smith III, "Efficient Synthesis of Room Acoustics via Scattering Delay Networks," IEEE/ACM Trans. Audio, Speech and Language Process., vol. 23, no. 9, pp. 1478-1492, Sept. 2015.
[6] D. Huron, Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, 2008.
[7] https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html (accessed 8/3/2020).