Embodied Voices – Autonomous Sensory Meridian Response, Auditory Illusions, and Virtual Vocaloid Idols

I’ve been interested in Autonomous Sensory Meridian Response for some time, but it recently occurred to me that the term “ASMR” is not really common parlance. Even at a recent sound symposium I realised that not many people had heard of it, and even if the term rings a bell, some might consider it to be verging on pseudo-science. But I feel there is something about it that is more than just a fluke.

Autonomous Sensory Meridian Response, or ASMR, has mainly been a YouTube phenomenon in which people produce videos of “trigger” sounds that give some listeners a “tingly” feeling in the scalp or the back of the neck. Some people may be more familiar with this sensation from the hairdresser’s, where the sound of an electric razor approaching one side of your head gives you anticipatory “prickles” or “tingles” even before it has touched you. Other common trigger sounds include the crinkle of a paper bag, blowing into someone’s ears, ear cleaning, and hair being cut.

SOUNDsculptures – 3D Head Massage (No Talking)

SOUNDsculptures – 3D Hair Clipping (No Talking)

Deep Ocean of Sounds – 3D virtual laryngologist (No talking)
The level of professionalism in ASMR videos has risen sharply over the last year or so, with countless ASMR YouTubers investing in serious binaural microphones: specialised pairs of microphones spaced apart to match the distance between a human being’s two ears, so that the recording closely reproduces what a listener in that position would hear. Some of the best ASMR video producers are practically 3D Foley artists, and very good ones at that. There has been great demand for virtual sound role plays, and many of them have become very sophisticated and highly realistic. If you listen to them, you will swear that the distant hum of the plane was the hum of a plane in your own reality. This is much better than all those old and tired-sounding “auditory illusions” that you can find on the internet.
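Part of why spacing the microphones like ears works is that a sound coming from one side reaches the nearer ear slightly earlier than the farther one. As a rough sketch (not anything the ASMR producers themselves describe), Woodworth’s spherical-head approximation estimates this interaural time difference; the head radius of 8.75 cm and the speed of sound of 343 m/s below are assumed typical values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C
HEAD_RADIUS = 0.0875    # m; an assumed "average" head radius


def interaural_time_difference(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth's spherical-head approximation of the ITD, in seconds.

    azimuth_deg runs from 0 (straight ahead) to 90 (directly to one side);
    the formula is only meant for that range.
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        itd = interaural_time_difference(angle)
        # Convert to a delay in samples at the CD sample rate of 44.1 kHz
        print(f"{angle:2d} deg: {itd * 1e6:6.1f} microseconds, {itd * 44100:5.1f} samples")
```

At 90° this comes out at roughly 0.66 ms, or about 29 samples at 44.1 kHz: a tiny offset, but one the brain uses to place a sound to the left or right.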

DonnaASMR – Brushing the Microphone
I believe that there are sounds (being vibrations themselves) which go beyond being just an auditory experience and also produce physical sensations. For example, I enjoy standing next to a giant speaker playing very low bass in a club, because by that point it is more than just sound: the room around you is also affected by the vibrations. Similarly, although I have a low tolerance for spicy food, I do enjoy it when food is so cold, hot, or spicy that it goes beyond a matter of taste and becomes an actual physical experience (pain receptors responding in the tongue, sweating from the heat, numbness in the face and extremities from overstimulation).

I’d love to do more research into the phenomenon of ASMR, but one impediment to it being studied or taken seriously seems to be its association with pseudo-science or even the supernatural. In theory it bears some similarities to Electronic Voice Phenomenon, the notion that one can perceive voices or speech in electronic static and background noise. Not everyone perceives ASMR, so some assume that the individuals who do report it must already be predisposed to it, or easily influenced into believing the sensation exists.

Not too long ago I had a bizarre experience on Bus 9. I was having a conversation with my classmate about virtual reality and started to tell her about ASMR. After she got off the bus, an African man came up to me from the back of the bus and told me sternly, “I HEARD WHAT YOU SAID JUST NOW. BE CAREFUL WHAT YOU SAY”, before getting off himself, leaving me very puzzled. What I eventually made of it was that even my objective layman’s description of ASMR could sound a lot like a paranormal or spiritual/ghostly experience to someone inclined to read it that way, so the man probably disapproved of my “meddling” with these unknown things.

I am obsessed with certain sounds used in electronic music, like white noise, crackling, static, and pops, especially when accompanied by very low, moving drone basses. One of my favourite examples is Fennesz’s “Rivers of Sand” (to be listened to with headphones). Actually, many tracks on Fennesz’s Venice and Endless Summer albums have those sounds for me.

Fennesz – Rivers of Sand (From Venice, 2004)

Fennesz – Happy Audio (From Endless Summer, 2001)
I don’t need to listen to Diana Deutsch’s examples to understand how repetitive sounds produce certain auditory illusions after some time (not when there is already music that illustrates the effect and is even more musical and evocative). But the point is that I wonder if there is scope for someone to make music that uses known auditory illusions, especially in electronic and experimental music today. AHHHHH IF ONLY I HAD MORE TIME TO MAKE MUSIC THESE DAYS!

鏡音リン – 炉心融解 (Kagamine Rin – Meltdown)
Speaking of strange disembodied/embodied music, not long ago I also found a new video for Kagamine Rin’s “Meltdown”. Much of its footage comes from Project DIVA, a series of rhythm games, which I guess explains why there was never an official video: the “performance” of the song is the basis of the game itself.

Kagamine Rin/Len was the second release in Crypton Future Media’s Character Vocal Series (after Hatsune Miku), and it consists of two characters, Rin (female) and Len (male), with a slightly lower-toned female voice and a young male voice. In 2011 I managed to see Hatsune Miku and Kagamine Rin at Anifest back in Singapore. This is an annotated picture of the performance I went to:

“Kagamine Rin” at Anifest 2011, Singapore
To be honest, I am the sort of sap who is wont to overread things or overdo the prac crit on what’s basically just a bit of fun, but I think the main reason I’m still really attracted to tracks released by the virtual idol Kagamine Rin is that her name, 鏡音リン, is derived from Kagami (鏡, mirror), Ne (音, sound) and Rin (リン Rin, sometimes mis-transliterated as 鈴 Lin, bell). I mean, what better example could you have of a virtual idol whose name even defines her as a sound mirror of reality? Where else can you see a concert in which hordes of kids come to cheer at a projection of something entirely virtual, and where the live event footage then becomes yet another projection, this time of kids cheering at a projection? This would be a perfect example of third-order simulacra (in Baudrillard’s terms).

Similarly, Hatsune Miku’s name, 初音ミク, is supposed to mean “first sound from the future”: Hatsu (初, first), Ne (音, sound), and Miku (ミク, like 未来 mirai, future). As a name it is much less interesting to me, and she is also ridiculously high-pitched; there is a limit to how much high-pitched singing I can appreciate in one sitting. (Kagamine Rin and Kagamine Len are typically less high-pitched.)

Kagamine Rin is a great example of how we expect sound to render or embody matter: because she sings, because she exists in a program that you can use to make songs that “she” will “sing” for you, the character of Kagamine Rin has to exist. I really did not like Spike Jonze’s “Her” for being too literal and terribly flat (as if written by some dunce who didn’t even know how computers or natural language processing normally work), but I know it is very popular simply because it plays on the idea of a disembodied, digitally generated human voice becoming virtually embodied into what we imagine to be a seemingly material being, just through the production of an artificial human voice. Even today, the ways in which actual living media personalities and celebrities exist and perpetuate themselves could very well be entirely virtualised. In contemporary culture, little difference is perceived between the biological and the digital body; after all, you only experience media. Kagamine Rin’s face can even change to look like ACTUAL EMOTICONS which people use to express emotion (and yes, I’ve seen her face like this before: ≧∇≦). So really, who needs a real physical idol?

Kagamine Rin
From “炉心融解” (Meltdown):

時計の秒針や [Tokei no byoushin ya]
テレビの司会者や [Terebi (Television or TV) no shikaisha ya]
そこにいるけど見えない誰かの [Soko ni iru kedo mienai dareka no]
笑い声飽和して反響する [Waraigoe houwa shite hankyou suru]

Second hand of a clock
The TV host
Someone who is there but invisible
The laughter becomes saturated and now echoes

The Excitable Dog’s Paradise Sensory Spa


It might have been G who first told me about “Autonomous Sensory Meridian Response”, in which small sounds, speech patterns, and tactile encounters are thought to produce “tingles” or braingasms in some people. You can read about people and their experiences with it all over the internet, e.g. Huffington Post: Nicholas Tufnell – ASMR: Orgasms for Your Brain, or The Independent: ‘Maria spends 20 minutes folding towels’: Why millions are mesmerised by ASMR videos.

Yesterday I realised that there are lots and lots of videos on the internet of people roleplaying spa visits, haircuts, massages, and other things, along with sounds like tapping, crinkling, brushing, and running one’s nails over an uneven surface. Believe me, it’s a whole cottage industry of YouTube videos of people pretending to touch your ear and whispering sweet nothings to you very, very close to the camera. The roleplay is totally asexual. It’s really just what it is: someone pretending to give you a haircut, and nothing more. Watching all these videos, made specifically to showcase these small sounds, I’ve also realised that the reason I like RRCherryPie’s videos of miniature fake food is the tiny tapping and crinkling sounds he makes with the packaging. Right. Not at all weird. Not at all creepy. And don’t doubt the power of ASMR. It hasn’t been explained scientifically yet, but some people do report that they find the videos relaxing, and sometimes even get a tingling sensation in the scalp.

So a lot of my projects begin this way. I see something like GentleWhispering’s Paradise Lab or WhispersUnicorn’s Haircut and Shave roleplay, and think, “What a curious genre of YouTube videos this is.” And then I’m like, hold on a moment, couldn’t I make my own ASMR video too?…

And so, now I have.

Welcome to the new luxury series of
The Excitable Dog’s Paradise Sensory Spa
Made specially for your relaxation…

In this video, I calmly brush the Excitable Dog with a fan brush, then a nylon brush, and finally a large bristle brush. I got some of these brushes from Donna Ong, so I don’t know what they were previously used for, but I’m sure they were used to brush nice artistic things, so they are brushes of nice but mysterious origin. I wonder if the Excitable Dog likes being brushed all over with mysterious brushes. After brushing the Excitable Dog, I also brush “you”, part of the table, and eventually the other brushes. It’s a brushapallooza. Insert apology for having an excitable voice. Why am I even brushing my toy? Who is watching me brush my toy? Will anyone get funny feelings from watching me brush my $2 Daiso toy? Are there really lots of people out there obsessively googling for ASMR videos so they can get to bed? I don’t really know. Let me know or leave a comment…

I’ve made another video for G to learn some random Chinese words with. So here it is…

Sonic Visualiser and Signal Processing

Viewing spectrograms in audio editors like Audacity can be unreliable: the program may crash or hang because it (or the computer) can’t handle the processing required to analyse and display the spectrogram. And once you have the graph, what do you do with it? After the data is visualised, how can we get at the data and break it down?

Sonic Visualiser is apparently a program explicitly built for viewing and exploring audio data, aimed at semantic music analysis and annotation. I downloaded it recently and found that it was incredibly fast, full of useful annotation functions, and able to run “feature-extraction” plugins (e.g. beat trackers, pitch detectors).
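To get a feel for what a pitch-detection plugin does, here is a minimal sketch of one classic approach, autocorrelation: a periodic signal lines up best with a copy of itself delayed by exactly one period. This is only an illustration under simple assumptions (a clean, single-voice signal; the 50–1000 Hz search range is an arbitrary choice) and is not how Sonic Visualiser’s own plugins are implemented.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a mono signal by autocorrelation.

    Tries every lag between one period of fmax and one period of fmin and
    keeps the lag at which the signal best matches a delayed copy of itself.
    """
    n = len(samples)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    if min_lag > max_lag:
        raise ValueError("signal too short for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation: sums over fewer terms at longer lags,
        # which slightly favours the true period over its multiples.
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 220 * i / sr) for i in range(2048)]
    print(f"estimated pitch: {estimate_pitch(tone, sr):.1f} Hz")
```

On a clean 220 Hz sine sampled at 8 kHz this lands within a few hertz of 220 (the lag is a whole number of samples, so the answer is quantised); real voices and instruments need extra care, such as normalisation and octave-error checks.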

I guess I am following this line of thought because I am interested in how we can analyse sound data meaningfully. Detecting beats, pitch, vowel sounds, and other audio features has fascinated me ever since I saw a documentary about deaf children in 1970s France, who were apparently taught to speak using a computer game that trained them to distinguish between different vowel sounds. Although the children in this programme were profoundly deaf (often from birth) and could not hear anything at all, they were physically trained to produce the right vibrations and sounds with their vocal cords, aided by a motivational computer game that moved a character up/down/left/right according to the sound produced. The vowel sound for “A” would move it up, the vowel sound for “E” would move it to the right, the vowel sound for “U” would move it down, and so on. So it would be a sort of dream to figure out how to create such a program on my own.
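I don’t know how the game in the documentary actually worked, but a common way to tell vowels apart is by their first two formants (F1 and F2), the resonant peaks of the vocal tract. Here is a hypothetical sketch of the game’s decision step, assuming the formants have already been measured (that part, usually done with linear predictive coding, is not shown). The reference values are rough illustrative averages, not measurements, and the “I → left” mapping is my own assumption to fill in the “and so on”.

```python
import math

# Rough, illustrative (F1, F2) formant centres in Hz for an adult speaker.
# Real values vary a lot by speaker, language and accent; treat these as assumptions.
VOWEL_FORMANTS = {
    "A": (730, 1090),   # open vowel, roughly as in "father"
    "E": (450, 1900),   # mid front vowel, roughly as in French "é"
    "I": (270, 2290),   # close front vowel, roughly as in "beet"
    "U": (300, 870),    # close back vowel, roughly as in "boot"
}

# A, E and U follow the description above; "I" -> "left" is a hypothetical fill-in.
VOWEL_DIRECTIONS = {"A": "up", "E": "right", "U": "down", "I": "left"}


def classify_vowel(f1, f2):
    """Return the reference vowel whose (F1, F2) pair is nearest to the measurement."""
    return min(VOWEL_FORMANTS, key=lambda v: math.dist((f1, f2), VOWEL_FORMANTS[v]))


def move_character(f1, f2):
    """Map a measured formant pair to the direction the on-screen character moves."""
    return VOWEL_DIRECTIONS[classify_vowel(f1, f2)]


if __name__ == "__main__":
    for f1, f2 in [(720, 1150), (300, 900), (460, 1850)]:
        print(f"F1={f1} Hz, F2={f2} Hz -> {move_character(f1, f2)}")
```

Nearest-centroid matching is about the simplest classifier possible; `math.dist` needs Python 3.8 or later.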

I downloaded Sonic Visualiser and loaded an audio file generated by MetaSynth, which was meant to have a spectrogram resembling the cover of the Voyager Golden Record. (Read more about my attempts at converting images to sound.)

Picture 3
dBV: “The scale displacement is proportional to the log of the absolute voltage.”

Picture 4

dBV^2: “The scale displacement is proportional to the log of the square of the bin value.”

Picture 5

Linear: “The scale displacement is proportional to the voltage level of the resulting audio output.”
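Taking the three descriptions above at face value, they can be written as simple mapping functions from a bin value v. This is a sketch of the arithmetic only, not Sonic Visualiser’s code; the tiny floor value is an assumption to avoid taking the log of zero. One thing it makes visible: log(v²) is always exactly 2 × log|v|, so the two log scales are proportional to each other, and any difference on screen presumably comes from how the program maps each range onto the display.

```python
import math

EPS = 1e-12  # assumed floor so that silent bins don't hit log10(0)


def linear(v):
    """Scale displacement proportional to the bin value itself."""
    return abs(v)


def log_abs(v):
    """Scale displacement proportional to log10(|v|) (the 'dBV' description)."""
    return math.log10(max(abs(v), EPS))


def log_square(v):
    """Scale displacement proportional to log10(v**2) (the 'dBV^2' description)."""
    return math.log10(max(v * v, EPS * EPS))


if __name__ == "__main__":
    for v in (1.0, 0.5, 0.1, 0.01):
        print(f"v={v:<5} linear={linear(v):.3f} "
              f"log|v|={log_abs(v):+.3f} log(v^2)={log_square(v):+.3f}")
    # In decibels, 20*log10|v| equals 10*log10(v^2): amplitude dB and power dB agree.
```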

DECIBELS: The dB is a logarithmic unit used to express the ratio of a physical quantity to a specified reference level. The ratio may be of power, sound pressure, voltage, intensity, or several other quantities.

“In professional audio, a popular unit is the dBu (see below for all the units). The “u” stands for “unloaded”, and was probably chosen to be similar to lowercase “v”, as dBv was the older name for the same thing. It was changed to avoid confusion with dBV. This unit (dBu) is an RMS measurement of voltage which uses as its reference 0.775 VRMS. Chosen for historical reasons, it is the voltage level which delivers 1 mW of power in a 600 ohm resistor, which used to be the standard reference impedance in telephone audio circuits.”
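The numbers in that quote can be checked with a little arithmetic: power in a resistor is P = V²/R, and a level in dB relative to a reference voltage is 20 × log10(V / Vref). A small sketch, with dBV taken as referenced to 1 V RMS:

```python
import math

DBU_REF_VOLTS = 0.775  # 0 dBu, per the quote above
DBV_REF_VOLTS = 1.0    # 0 dBV: 1 V RMS


def power_watts(v_rms: float, resistance_ohms: float) -> float:
    """Power dissipated in a resistor, P = V^2 / R."""
    return v_rms ** 2 / resistance_ohms


def volts_to_dbu(v_rms: float) -> float:
    """RMS voltage -> level in dBu (20 * log10 of the voltage ratio)."""
    return 20 * math.log10(v_rms / DBU_REF_VOLTS)


def volts_to_dbv(v_rms: float) -> float:
    """RMS voltage -> level in dBV."""
    return 20 * math.log10(v_rms / DBV_REF_VOLTS)


def dbu_to_volts(level_dbu: float) -> float:
    """Level in dBu -> RMS voltage."""
    return DBU_REF_VOLTS * 10 ** (level_dbu / 20)


if __name__ == "__main__":
    print(f"0.775 V into 600 ohms: {power_watts(0.775, 600) * 1000:.3f} mW")
    print(f"1 V RMS = {volts_to_dbv(1.0):.2f} dBV = {volts_to_dbu(1.0):.2f} dBu")
    print(f"+4 dBu = {dbu_to_volts(4):.3f} V RMS")
```

For instance, 0.775 V across 600 Ω gives about 1.0 mW, 1 V RMS (0 dBV) works out to roughly +2.2 dBu, and the common professional line level of +4 dBu is about 1.23 V RMS.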

My Observations/Ponderings:

  • Are those what we call “noise artifacts” from the spectral analysis process? Why are there more colours on certain “scales”? Why are there fuzzy bits of sound scattered across the graph from what sometimes sounds like just a single tone?
  • How should we choose a frequency scale? Which frequency scales bring out the most striking visual images? Is this like the RGB channels of an image? In RGB images we can say that the red channel tends to contain the “human skin tones”, the green tends to contain the high detail, and the blue tends to have the noise. Is there a similar thing in sound analysis, where viewing sound on different scales produces graphs that emphasise particular details in a similar manner?
  • How is colour assigned? There are many different colour palettes available, but how do these programs do it? Are there also different scales involved in applying colour to the sound spectrogram?
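On the colour question, one common approach (an assumption on my part, not a description of what Sonic Visualiser or Audacity actually do) is to clip each bin’s level to a chosen dynamic range in decibels, normalise it to 0–1, and look the result up in a palette. The five-colour palette and the −80 dB floor below are arbitrary choices for illustration.

```python
# Hypothetical palette: quiet bins dark, loud bins bright (RGB tuples).
PALETTE = [
    (0, 0, 0),        # black
    (64, 0, 128),     # purple
    (200, 0, 0),      # red
    (255, 160, 0),    # orange
    (255, 255, 200),  # pale yellow
]


def db_to_colour(db, floor_db=-80.0, ceiling_db=0.0, palette=PALETTE):
    """Map a level in dB to an RGB colour by interpolating along a palette."""
    # 1. Clip to the dynamic range we want to display
    db = max(floor_db, min(ceiling_db, db))
    # 2. Normalise to 0.0 (floor) .. 1.0 (ceiling)
    t = (db - floor_db) / (ceiling_db - floor_db)
    # 3. Find the two neighbouring palette entries and blend between them
    pos = t * (len(palette) - 1)
    i = min(int(pos), len(palette) - 2)
    frac = pos - i
    low, high = palette[i], palette[i + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(low, high))
```

Lowering `floor_db` lets quieter energy show up as colour, which is one way the same file can look cleaner or fuzzier depending on the display settings.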

Picture 6

For example, this is a portion of the analysis of Dopplereffekt’s “The Scientist”. The track itself is fairly minimal, very crisp and clean, mostly just DUK CHK DUK CHK DUK CHK DUK CHK. Still, this spectrogram is pretty colourful, and I can’t yet look at the spectrogram and imagine the sound from it. Perhaps it will take more reading to understand why it looks this way, or how to optimise and format the spectrogram output.

In other news, I am currently trying to understand digital signal processing by watching the Signals and Systems lecture series released on MIT OpenCourseWare. I got to Lecture 2 and was promptly stumped by the first equation. It was the first time I had seen the symbol phi. Actually, how the hell do you even type phi? There seems to be no direct way to type phi on a Mac keyboard: you have to copy and paste it in, or find it in the symbols panel (Greek) and add it from there. ϕ

The lecture begins by talking about a continuous-time sinusoidal signal, but in the real world a lot of common digital signal processing apparently involves discrete-time signals, e.g. music on CDs and MP3s.
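The standard textbook form of a continuous-time sinusoid, and one place where phi typically shows up, is x(t) = A cos(ω₀t + ϕ), where A is the amplitude, ω₀ = 2πf the angular frequency, and ϕ the phase. Sampling it every 1/fs seconds turns it into a discrete sequence x[n] = A cos(ω₀n/fs + ϕ). A minimal sketch of that step (the function name is my own):

```python
import math


def sample_sinusoid(amplitude, freq_hz, phase_rad, sample_rate, n_samples):
    """Sample x(t) = A*cos(2*pi*f*t + phi) at t = n / sample_rate.

    Returns the discrete sequence x[n] for n = 0 .. n_samples - 1.
    """
    omega0 = 2 * math.pi * freq_hz  # angular frequency in rad/s
    return [amplitude * math.cos(omega0 * n / sample_rate + phase_rad)
            for n in range(n_samples)]


if __name__ == "__main__":
    # A 1 kHz tone sampled at the CD rate of 44,100 samples per second
    x = sample_sinusoid(1.0, 1000.0, math.pi / 2, 44100, 5)
    print([round(s, 4) for s in x])
```

With a phase of π/2 the first sample is (almost exactly) 0 rather than A, which is all the phase term does: it shifts where in the cycle the wave starts.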

Signals can be either continuous-time (e.g. analog) or discrete-time (e.g. digital). Digital audio is sampled, and the sampling rate determines how many discrete samples we record per second. The resolution (bit depth) determines how finely each sample’s value can be recorded: an 8-bit code has 256 possible values, while a 16-bit code has 65,536.
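The bit-depth arithmetic can be sketched in a few lines. The mapping from a float sample in −1.0..1.0 to a signed integer code below is one common convention, assumed here for illustration:

```python
def quantisation_levels(bits: int) -> int:
    """Number of distinct values a sample can take at a given bit depth."""
    return 2 ** bits


def quantise(sample: float, bits: int) -> int:
    """Round a sample in -1.0..1.0 to a signed integer code (assumed convention).

    Codes run from -2**(bits-1) to 2**(bits-1) - 1, as in two's-complement PCM;
    anything outside that range is clipped.
    """
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    code = round(sample * max_code)
    return max(min_code, min(max_code, code))


if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, "bits:", quantisation_levels(bits), "levels;",
              "0.3 ->", quantise(0.3, bits))
```

With 8 bits the step between neighbouring codes is about 1/127 of full scale; with 16 bits it is about 1/32767, which is why 16-bit audio can capture much finer detail.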

Next, this data has to be analysed so that it makes sense as sound. There is no “simple” way to do spectrum analysis of sound, and most oscilloscopes give no direct information about the timbre of a sound, which needs to be understood as differences across a scale of frequencies, whereas an oscilloscope records sound against a scale of time. The main method for breaking digital sound down into its frequency components is the Fast Fourier Transform (FFT), an efficient way of computing the discrete Fourier transform, which involves a considerably more complicated algorithm….
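To see what an FFT actually outputs, here is a textbook recursive radix-2 Cooley–Tukey FFT in plain Python (no libraries, so it is slow but readable), plus a helper that picks out the strongest frequency. It assumes the number of samples is a power of two; a real program would use an optimised library instead.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # The "twiddle factor" rotates the odd half before combining
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, skipping DC (bin 0) and the
    mirrored upper half that a real-valued signal produces."""
    spectrum = fft(samples)
    n = len(spectrum)
    peak_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / n
```

Each output bin k corresponds to a frequency of k × sample_rate / N, so with 1024 samples at 1024 Hz every bin is exactly 1 Hz wide.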

Converting Image to Sound – Spectrograms

The Voyager Golden Record was a phonograph record that was sent out with the Voyager spacecraft in 1977, carrying a selection of images and sounds that were supposed to let intelligent extraterrestrial life understand what humans were like. This is the cover of the record, which shows a diagram of how the phonograph record works and how it is supposed to be read/played back.

MetaSynth Screenshot Voyager Golden Record

I ran it through MetaSynth to see what it would sound like. MetaSynth is a brilliantly made and well-documented application that many have used as a tool for generating and sculpting sounds from images, and the idea is nothing new (famous examples being Aphex Twin’s “Windowlicker” and Venetian Snares’s “Look” from Songs About My Cats). Surprisingly, right out of the box, the circular curve of the Voyager record’s cover produced a futuristic sound, not unlike what you would expect to hear in a sci-fi film.
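The basic image-to-sound idea can be sketched as additive synthesis, essentially running a spectrogram backwards: each row of the picture becomes a sine wave at some frequency, each column becomes a slice of time, and pixel brightness sets the loudness. This is a hypothetical sketch of the general principle, not MetaSynth’s actual algorithm; the frequency range, the logarithmic spacing and the column duration are all assumptions.

```python
import math
import struct
import wave


def image_to_sound(image, sample_rate=22050, column_seconds=0.05,
                   f_low=200.0, f_high=5000.0):
    """Additive-synthesis sketch: rows -> frequencies, columns -> time,
    brightness -> amplitude.

    image: list of rows, each a list of brightness values in 0.0..1.0.
    Row 0 is the top of the picture and gets the highest frequency.
    """
    n_rows, n_cols = len(image), len(image[0])
    # Space frequencies logarithmically from f_high (top row) to f_low (bottom row)
    freqs = [f_high * (f_low / f_high) ** (r / max(n_rows - 1, 1))
             for r in range(n_rows)]
    samples_per_col = int(sample_rate * column_seconds)
    out = []
    for c in range(n_cols):
        for i in range(samples_per_col):
            n = c * samples_per_col + i  # global index keeps each sine's phase continuous
            t = n / sample_rate
            out.append(sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                           for r in range(n_rows)))
    peak = max((abs(s) for s in out), default=0.0) or 1.0
    return [s / peak for s in out]  # normalise to -1.0..1.0


def write_wav(path, samples, sample_rate=22050):
    """Write mono 16-bit PCM using the standard-library wave module."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```

For example, a diagonal line from the top-left to the bottom-right corner of the image comes out as a stepped, falling sweep.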

voyager waveform

This is what the waveform looks like. But with the free sound editor Audacity (available for Mac, Linux, Windows and other platforms), you can also view the sound spectrum.

Picture 20

Right-click the title of your track to see this additional dropdown menu.

voyager metasynth spectrogram

This is what the sound spectrum looks like. You can see the lower half of the Golden Record’s image appearing in the spectrum. This spectrogram shows how the spectral density of the signal changes over time. Similar spectrograms are also generated and analysed in radar/sonar, speech processing, and seismology.

Horizontal axis – time
Vertical axis – frequency
Intensity/Colour – amplitude of frequency
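A spectrogram like this is usually computed with a short-time Fourier transform: chop the sound into overlapping frames, taper each frame with a window, take the Fourier transform of each, and keep the magnitudes. The minimal sketch below maps directly onto the three axes above; it uses a slow plain DFT for readability, and the frame size, hop size and Hann window are assumptions rather than Audacity’s settings.

```python
import cmath
import math


def hann(n):
    """Hann window (n >= 2): tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of bins 0..N/2 of a plain (slow) discrete Fourier transform."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_size=256, hop=128):
    """Short-time Fourier transform magnitudes.

    Returns a list of columns, one per time frame (horizontal axis: time).
    Each column lists magnitudes from 0 Hz up to the Nyquist frequency
    (vertical axis: frequency); the magnitude is what gets mapped to
    intensity/colour.
    """
    window = hann(frame_size)
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        columns.append(dft_magnitudes(frame))
    return columns
```

Bin k of each column sits at k × sample_rate / frame_size Hz, so a longer frame gives finer frequency detail at the cost of blurrier timing.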

When Voyager was sent out, then-US President Jimmy Carter’s message was as follows:

We cast this message into the cosmos. It is likely to survive a billion years into our future, when our civilization is profoundly altered and the surface of the Earth may be vastly changed. Of the 200 billion stars in the Milky Way galaxy, some — perhaps many — may have inhabited planets and spacefaring civilizations. If one such civilization intercepts Voyager and can understand these recorded contents, here is our message:

This is a present from a small distant world, a token of our sounds, our science, our images, our music, our thoughts, and our feelings. We are attempting to survive our time so we may live into yours…

The Voyager’s message was not sent out for anyone in particular, and there is no certainty that any intelligent life will ever encounter it or have the intelligence to decode it. So why did we still send out the Voyager?

Before the telegraph and the telephone and the internet and everything else, we might have equated writing with an “artifact” or by-product of verbal communication. But as forms of communication grow more and more complicated, the nature of communication itself is changing. According to Derrida, all forms of discourse are “telecommunication”, in that they are predicated on the possibility of the radical absence of both the producer and the receiver of the communication.