Novation’s Head Of Innovation Dave Hodder Taps Into Circuit Session Data To Discover Your Musical Trends
As Chief Scientist at Novation, it’s my job to discover as much as possible about our products and the ways people use them. The most common approach to this is through our comprehensive UX (User eXperience) process, which is an important part of the product design phase. We also get feedback from the customers who have bought the gear, and this anecdotally informs us about some of the ways people use our kit. Our Tech Support team also play an important role, identifying areas for improvement, and spotting trends in product pairings, for example. Obviously, hanging out with the lovely folk of the Circuit Owners Group also gives a window into the genre choices of our user base, but I wanted to try a different approach by channeling my inner data nerd.
So for my most recent deep dive, I thought it would be fun to use our Circuit Components data set to see what kind of music our customers are making. In particular, I wanted to know what we could learn from the vast number of Sessions you have uploaded to Components over the last two years. Would there be surprises? Would I find you all making the same beats? The results, as you’ll see, are fascinating. Hopefully they will serve to inspire you to try something new with your Circuit!
Using Your Data (Important)
First, I needed to get hold of data. Your data. Now, before we go any further, I must caveat a few things. Here at Novation, we take privacy very seriously, and I should state that we do not have the ability to ‘hack into’ your Circuit and listen to your creations at will. All of the sessions are anonymised when we extract the information, and all the data I exported has not left my secure hard drive in the office at Novation HQ. What’s more, in the process of this analysis, I haven’t loaded a single user preset or session onto a Circuit or listened to them in any way; my only area of interest has been the raw data compiled in shared Sessions. Lastly, once I completed the analysis for this experiment, I securely deleted the information.
Getting The Good Stuff From Raw Data
Circuit sessions are stored in an Amazon S3 bucket, so I used a script to download all 50GB on to a non-network external drive. I ran the download overnight and by the morning I had more Circuit Sessions than I could ever have wanted — well over 200,000!
What I had now was the raw information from which I could extract the specific data I was interested in. The next step was to write a C++ program (allowing me to reuse some of Circuit’s own code) to unpack these SysEx sessions and dump out some promising parameters in a format I could use. After the usual pain and debugging, I ran the program over the whole data set. An hour later, I had a CSV file containing tempo, swing, scale and root note for each Session in the set.
A few short lines of code later, and I had the stats to make my first two graphs. The initial results were confusing; it seemed that Circuit customers were all making music with the same rhythms, tempo and melodic content. After a brief moment of head scratching, I realised I’d forgotten to remove the default content, which makes up a large proportion of the backups we store; obviously this skews things horribly. Back to C++ land (because, argh) and some hacking later, I was confident my data didn’t include the default sessions. (Although it does contain things that are very slightly different from the default sessions. See if you can spot them.)
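As a rough illustration of that clean-up step, here is a minimal Python sketch. It is not the C++ code used for the analysis; it assumes, hypothetically, that factory sessions are recognised by an exact byte-for-byte match, and the byte strings are made-up stand-ins for real session dumps.

```python
import hashlib


def fingerprint(session_bytes: bytes) -> str:
    """Return a stable fingerprint for the raw bytes of one session."""
    return hashlib.sha256(session_bytes).hexdigest()


def drop_factory_sessions(sessions, factory_sessions):
    """Keep only sessions whose bytes differ from every factory session."""
    factory = {fingerprint(s) for s in factory_sessions}
    return [s for s in sessions if fingerprint(s) not in factory]


# Made-up stand-ins for real session data (assumption, for illustration only).
factory = [b"FACTORY-SESSION-1", b"FACTORY-SESSION-2"]
uploads = [
    b"FACTORY-SESSION-1",   # untouched default: removed
    b"my own groove",       # user content: kept
    b"FACTORY-SESSION-1 ",  # default with one tiny change: kept
    b"FACTORY-SESSION-2",   # untouched default: removed
]
kept = drop_factory_sessions(uploads, factory)
print(len(kept), "of", len(uploads), "sessions kept")
```

An exact-match filter like this would also explain why sessions that are only very slightly different from the defaults survive the cull.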
Bringing Data To Life
With all the code tidy, debugged and free of factory defaults, I could finally create some visual representations and bring the raw data to life! This first image only shows rhythm and tempo information: no melodic stuff. Tempo runs along the X-axis (horizontally), and swing is on the Y-axis (vertical). Remember that swing amount is a bi-polar figure, with the zero point represented by the number 50 on Circuit. I wanted to make the charts pretty, so I didn’t label the axes, but I bet you can guess the values by looking at the data.
(If you’re stuck, with tempo along the X-axis the vertical bar is 120bpm, the default tempo in Circuit Sessions; the horizontal band represents a swing amount of 50, also the default setting in every blank Circuit Session.)
We do love our tempi to be multiples of ten! And some of you really like to explore the limits — shoutout to those of you writing music at 240bpm, maximum negative swing! And, is it just me, or if you blur your eyes a bit can you see a guitar?
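The counts behind a tempo and swing plot like this one can be sketched in a few lines of Python. The column names (`tempo`, `swing`, `scale`, `root`) and the sample rows are assumptions for illustration; the real CSV layout is not shown in this article.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the real CSV (column names are assumptions).
SAMPLE_CSV = """tempo,swing,scale,root
120,50,Natural Minor,C
120,50,Natural Minor,C
130,50,Dorian,E
128,55,Major,C
240,20,Harmonic Minor,F
90,50,Natural Minor,C
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per session."""
    return list(csv.DictReader(io.StringIO(text)))


def tempo_swing_grid(rows):
    """Count how many sessions fall on each (tempo, swing) point."""
    return Counter((int(r["tempo"]), int(r["swing"])) for r in rows)


def share_multiple_of_ten(rows):
    """Fraction of sessions whose tempo is a multiple of ten."""
    hits = sum(1 for r in rows if int(r["tempo"]) % 10 == 0)
    return hits / len(rows)


rows = load_rows(SAMPLE_CSV)
grid = tempo_swing_grid(rows)
share = share_multiple_of_ten(rows)
print(grid.most_common(1), round(share, 2))
```

Each key in `grid` is one point on the plot, and its count is what would drive the brightness of that point.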
The tonal data — key and scale information saved with the Sessions — needed a different tool to be able to display it in an easy-to-understand way. Scales lend themselves to circular plots, rather than an X-Y graph, so I reached for another nice data-viz tool called D3.js which, thanks to its active online community, has a visualisation preset for pretty much any task. So I found one that would work well to show this information (cheers Peter Cook!) and dropped in our data. What it shows is a wheel with 16 concentric rings (one for each of the scales on Circuit). The inner ring is chromatic, and the outer circle is a minor key (the default), and they’re ordered just like they are in the manual (see page 25). You can see that all the C-rooted scales are fairly popular, with F (Harmonic Minor) and E (Dorian) showing prominence (I’d love to dig into what makes these outliers popular). You’ll also observe that, if you want to sound unique, choose a Hungarian Minor scale, preferably in Bb, at around 155bpm with a swing setting of 65! You’ll be the only one doing it!
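To make the wheel layout concrete, here is a small Python sketch that turns root and scale pairs into one cell per ring and wedge, then lists the least-used combinations. The scale list is an illustrative subset (Circuit has 16), the name "Natural Minor" for the default minor scale is an assumption, and the sample counts are invented.

```python
from collections import Counter

ROOTS = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
# Illustrative subset only, ordered from outer ring (index 0) to inner ring.
SCALES = ["Natural Minor", "Dorian", "Harmonic Minor", "Hungarian Minor", "Chromatic"]


def wheel_cells(pairs):
    """Return one cell per (scale ring, root wedge) with its session count."""
    counts = Counter(pairs)
    wedge = 360 / len(ROOTS)  # each root gets an equal slice of the circle
    return [
        {
            "root": root,
            "scale": scale,
            "ring": ring,
            "start_deg": slot * wedge,
            "count": counts[(root, scale)],
        }
        for ring, scale in enumerate(SCALES)
        for slot, root in enumerate(ROOTS)
    ]


def rarest(cells):
    """List every (root, scale) combination that shares the lowest count."""
    low = min(cell["count"] for cell in cells)
    return [(cell["root"], cell["scale"]) for cell in cells if cell["count"] == low]


# Invented sample data, not real Components statistics.
sample = [("C", "Natural Minor")] * 5 + [("F", "Harmonic Minor")] * 3 + [("E", "Dorian")] * 2
cells = wheel_cells(sample)
unused = rarest(cells)
print(len(cells), "cells,", len(unused), "of them least used")
```

Building the full grid first, rather than only counting what appears in the data, is what lets combinations with zero sessions show up as candidates for sounding unique.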
By this point I was high on data. I thought I could do anything, so I set about visualising the rhythmic trends of the patterns programmed on all four drum tracks of the 200,000 Circuit Sessions I had data for. On one plot. (Yes, I’m a wild kinda guy!)
One way to represent a beat is as a point in a 64-dimensional space: one dimension for each of sixteen steps in the sequencer, four times over (for each of Circuit’s drum parts). This approach is not immune to problems — firstly, if two identical grooves were offset by just one step, they would be displayed totally differently in this plot. So we have to assume that that eventuality is rare. Secondly, it’s difficult to envisage a 64-dimensional space using our three-dimensional brains. (It’s easy if you’re a mathematician — just visualise an n-dimensional space, then let n=64. Sorry, no more maths jokes.)
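The encoding, and the offset problem that comes with it, can be sketched in Python as follows. This assumes each step is stored as a simple on/off flag (1 or 0); velocities would work the same way.

```python
import math

STEPS = 16
TRACKS = 4


def beat_to_vector(tracks):
    """Flatten four 16-step drum tracks into one 64-dimensional vector."""
    if len(tracks) != TRACKS or any(len(t) != STEPS for t in tracks):
        raise ValueError("expected 4 tracks of 16 steps each")
    return [value for track in tracks for value in track]


def rotate(track, n=1):
    """Shift a track later by n steps, wrapping around the end."""
    return track[-n:] + track[:-n]


four_on_floor = [1, 0, 0, 0] * 4
empty = [0] * STEPS

beat = beat_to_vector([four_on_floor, empty, empty, empty])
shifted = beat_to_vector([rotate(four_on_floor), empty, empty, empty])
silence = beat_to_vector([empty, empty, empty, empty])

d_shifted = math.dist(beat, shifted)  # same groove, one step late
d_silence = math.dist(beat, silence)  # groove versus nothing at all
print(round(d_shifted, 3), round(d_silence, 3))
```

In this encoding the groove shifted by one step ends up further from the original than complete silence does, which is exactly why offset grooves would land in different parts of the plot.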
Thankfully, it turns out lots of clever people spend a lot of time doing dimensionality reduction. One of their inventions is the lovely t-SNE. It munges away all those unfathomable dimensions and makes two new ones that cluster the data in a way we can draw. Having removed duplicate beats, I was left with 29331 unique drum patterns. The t-SNE took about ten minutes to create, giving me a delightful petri-dish of grooves.
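A minimal Python sketch of those two steps, de-duplication followed by t-SNE, might look like this. It is not the code used for the plot; the t-SNE part assumes NumPy and scikit-learn are installed, and the perplexity and seed values are arbitrary choices.

```python
from collections import Counter


def unique_patterns(vectors):
    """Collapse duplicate beats, keeping how often each one occurred."""
    counts = Counter(tuple(v) for v in vectors)
    patterns = list(counts)  # first-seen order is preserved
    popularity = [counts[p] for p in patterns]
    return patterns, popularity


def embed_2d(patterns, perplexity=30.0, seed=0):
    """Project patterns to 2D with t-SNE (needs numpy and scikit-learn)."""
    # Imported lazily so the de-duplication above runs without these packages.
    import numpy as np
    from sklearn.manifold import TSNE

    data = np.asarray(patterns, dtype=float)
    # perplexity must be smaller than the number of patterns.
    model = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return model.fit_transform(data)


# Tiny made-up example: three beats, two of them identical.
beats = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]]
patterns, popularity = unique_patterns(beats)
print(patterns, popularity)
```

With the real data, `embed_2d(patterns)` would return one (x, y) pair per unique pattern, ready to scatter-plot.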
Making Sense Of The Blobs
How can we interpret this? Well, like is close to like, so the clumps do represent groups of similar grooves. Bear in mind that Circuit allows you to put kicks on any channel, snares on any channel and so on. We’ve unfortunately lost that information, so these groupings may not have sounded alike when they were created. Regardless, there’s definitely structure.
But what does it sound like? One way to answer this is to look more closely at some of the clusters we can see. How about that dense blob in the centre of the image? There are several hundred beats here, so let’s take the ‘average’ beat from that group and have a look.
These are the mean velocities for each step in the sequence:
BD: 92 03 10 10–01 12 09 05–96 01 05 12–02 11 11 08
SD: 00 00 00 01–93 00 00 01–00 00 01 01–95 00 02 01
CH: 03 08 89 07–05 05 94 06–02 05 89 07–07 20 83 15
OH: 04 02 04 04–02 04 07 00–03 03 04 05–01 04 18 02
There’s no action on the open hats, so all we really have is:
BD: X 0 0 0 X 0 0 0
SD: 0 0 X 0 0 0 X 0
CH: 0 X 0 X 0 X 0 X
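The step from the mean velocities listed above to this simplified grid can be sketched in Python. The threshold of 64 is an assumption, as is the choice to show every second step (which works here because no strong hits fall on the skipped steps).

```python
# Mean velocities per step, copied from the cluster average above.
MEAN_VELOCITIES = {
    "BD": [92, 3, 10, 10, 1, 12, 9, 5, 96, 1, 5, 12, 2, 11, 11, 8],
    "SD": [0, 0, 0, 1, 93, 0, 0, 1, 0, 0, 1, 1, 95, 0, 2, 1],
    "CH": [3, 8, 89, 7, 5, 5, 94, 6, 2, 5, 89, 7, 7, 20, 83, 15],
    "OH": [4, 2, 4, 4, 2, 4, 7, 0, 3, 3, 4, 5, 1, 4, 18, 2],
}
THRESHOLD = 64  # assumption: anything at or above this counts as a hit


def to_grid(velocities, threshold=THRESHOLD, stride=2):
    """Render every `stride`-th step as X (hit) or 0 (no hit)."""
    skipped = [v for i, v in enumerate(velocities) if i % stride]
    if any(v >= threshold for v in skipped):
        raise ValueError("a hit falls on a skipped step; use stride=1")
    return " ".join("X" if v >= threshold else "0" for v in velocities[::stride])


grids = {name: to_grid(v) for name, v in MEAN_VELOCITIES.items()}
for name, grid in grids.items():
    print(f"{name}: {grid}")
```

The open hat row comes out as all zeros, which matches the observation that there is no real action on that track.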
That lump at the bottom in the middle sounds like this:
BD: X 0 0 0 0 0 0 0
SD: 0 0 0 0 X 0 0 0
CH: X X X X X X X X
One problem with this analysis is that it discards information about the popularity of particular beats. If we scale the size of each point by its popularity, we get a mess. This is because the most popular beats are much more popular than any others. Using a log scale, we can see a bit more structure emerging, without fixating on the top ten or so beats:
See how that large blob on the right hand side has grown? This is because there are some very similar, very popular beats there (hint: minimal techno). In the previous plot, that just appeared as a small but bright dot. Here, its significance is better represented.
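The effect of the log scale on point sizes can be sketched in Python. The base size and scale factor are arbitrary illustrative values, not the ones used for the plot.

```python
import math


def marker_size(count, base_size=2.0, scale=4.0):
    """Size a point by the log of its popularity (count must be >= 1)."""
    return base_size + scale * math.log10(count)


# Invented popularity counts: a one-off beat, a common one, a runaway hit.
counts = [1, 10, 1000]
sizes = [marker_size(c) for c in counts]
print(sizes)
```

A beat used 1000 times is drawn only a few times larger than one used once, instead of 1000 times larger, so the handful of most popular beats no longer swamp everything else.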
So, ‘what are the next steps with this visualisation project?’, you ask. Well, I just had to hear what these beats sounded like. So I started work on an interactive map, using Web Audio (and the wonderful Tone.js) to play back beats in real time. You should check it out! I chose to interpret the average hit values as probabilities, so there’s some nice variation as the grooves play back. My favourite is probably cluster #7. Note that the beats played are ‘averages’ of hundreds of beats in each cluster, rather than actual patterns taken from your collective Circuits. So if you hear your exact beat, it means you’re completely average. (Condolences!)
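The idea of treating average hit values as probabilities can be sketched in Python (the interactive map itself uses Web Audio and Tone.js, so this is only an illustration). It assumes velocities run from 0 to 127 and that a step's chance of firing is its mean velocity divided by that maximum.

```python
import random


def sample_bar(mean_velocities, rng, max_velocity=127):
    """Decide, step by step, whether a hit fires in this pass of the bar."""
    return [rng.random() < v / max_velocity for v in mean_velocities]


# Kick drum averages from the central cluster described earlier.
bd = [92, 3, 10, 10, 1, 12, 9, 5, 96, 1, 5, 12, 2, 11, 11, 8]

rng = random.Random(7)  # seeded so the example is repeatable
bars = [sample_bar(bd, rng) for _ in range(4)]
for bar in bars:
    print("".join("X" if hit else "." for hit in bar))
```

Steps with high averages fire most of the time while the low ones fire only occasionally, which is where the variation between repeats comes from.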
Jokes aside, I hope this experiment will inspire you, the Circuit customer, to push your groovebox to the limit, and try making music with new and different rhythmic patterns and scales. Next time you have writer’s block, try making a track using one of the least common tempo, swing, root and scale combinations that you can identify in the visualisations, and see where your musical adventures take you! Of course, don’t forget to share your experiences on the Circuit Owners Group!