Also: can mp3s be totally represented as a text file?
Everything is just vibrations; vibrations are just waves. Vinyl records make a physical copy of the sound wave. As the needle drags across it at the correct speed it starts vibrating, reproducing the sound that went into the groove in the first place.
Sound files are trickier: basically, you need to measure the wave as it goes up and down, store the measurements in a file, and then have a computer convert them back into vibrations with the help of a loudspeaker. The more times you measure the sound wave per second, the better quality your recording will be.
As we know that sounds are waves, it’s not so hard to imagine a text file containing sound. Below is a very simple wave form represented with numbers:
- _ - ⁻ - _ - ⁻ - _ - ⁻ - _ - ⁻ - _ - ⁻ - _ - ⁻ - _ - ⁻ - _ - ⁻ - _ _ - - ⁻ ⁻ - - _ _
1 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0 1 1 2 2 1 1 0 0
In theory, a computer could convert this into sound. It would sound awful.
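To make that concrete, here’s a rough Python sketch (the sample values, the scaling, and the 8000 Hz rate are all made up for illustration) that writes a repeating pattern of numbers like the one above into a playable WAV file, using only the standard library:

```python
import wave

# Hypothetical sample values, like the 0/1/2 pattern above, rescaled
# into the 0-255 range of 8-bit audio and repeated so the file is
# long enough to hear.
pattern = [1, 0, 1, 2] * 1000
samples = bytes(s * 100 + 20 for s in pattern)  # 1 -> 120, 0 -> 20, 2 -> 220

with wave.open("wiggle.wav", "wb") as f:
    f.setnchannels(1)      # mono
    f.setsampwidth(1)      # 1 byte (8 bits) per sample
    f.setframerate(8000)   # 8000 measurements per second
    f.writeframes(samples)
```

Playing wiggle.wav produces a harsh buzz (the 4-sample pattern repeats 2000 times per second, i.e. a 2 kHz tone), which matches the “it would sound awful” prediction.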
While it sounds simple enough to say ‘waves captured in media reproduce sounds’ it’s still pretty mind bending to note that we can capture all the distinct parts of a symphony in that one wiggly little groove of a record.
That is because of linearity/superposition - look up both of those concepts for more info. Basically, you can add as many sounds together as you’d like and it is the same as a single, different sound - so if you can recreate the one sound, you can recreate all of them.
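A quick Python sketch of superposition (the frequencies and sample rate are arbitrary): two tones added sample-by-sample collapse into one ordinary list of numbers, no more complex in kind than either tone alone:

```python
import math

rate = 8000                            # samples per second (arbitrary)
t = [i / rate for i in range(8000)]    # one second of time points

drum   = [math.sin(2 * math.pi * 100 * x) for x in t]   # 100 Hz tone
guitar = [math.sin(2 * math.pi * 440 * x) for x in t]   # 440 Hz tone

# Superposition: the "mix" is just the pointwise sum. A speaker fed
# this single list reproduces both tones at once.
mix = [a + b for a, b in zip(drum, guitar)]
```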
The speaker doesn’t know “this part of the wave is from the drum and that part is from the guitar”, etc
Which is still mind blowing in its own right. What’s incredible in the end is not so much the process of sound recording as the nature of reality itself.
For this part of the equation I find it useful to think of two tin cans with a string between them, which is a perfectly capable microphone/loudspeaker set-up.
Which in turn makes sound isolation even more mind-blowing!
“this part of the wave is from the drum and that part is from the guitar”
So is that information deduced by the listener’s mind, as opposed to their ear? Kind of like depth perception in the field of vision?
In short, yes.
The ear as an organ is not a simple microphone, though. It’s partially evolved in ways that help humans recognize music and speech. The brain does the heavy lifting.
It may help to remember that you only have two ears to hear that entire orchestra. So with only two speakers, you can reproduce said orchestra. By the time it makes it to your ear, the sound vibrations aren’t a bunch of separate waves; they have combined into a compound wave, with all of the constructive and destructive interference that goes along with that. What you hear is the sum of all the sound waves around you, not each individual source of the waves.
The Fourier transform is a way to convert from a raw waveform into all of its distinct parts, and back again. Computers do this for some types of sound data compression, such as MP3. Your ears do this (in the cochlea) when converting the physical movement of sound waves into electrical signals for your brain.
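As a rough illustration of that decomposition (this is a naive discrete Fourier transform, far slower than the optimized FFTs real codecs use, and the frequencies are made up): given a compound wave, the transform reports how much of each frequency it contains:

```python
import cmath
import math

N = 64  # number of samples in one analysis window
# Compound wave: a strong 5-cycle tone plus a quieter 12-cycle tone.
signal = [math.sin(2 * math.pi * 5 * n / N)
          + 0.5 * math.sin(2 * math.pi * 12 * n / N)
          for n in range(N)]

def dft(x):
    """Naive discrete Fourier transform: O(N^2), fine for a demo."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]

# Magnitude per frequency bin: the two sine components show up as
# peaks at bins 5 and 12, recovering the "distinct parts" of the mix.
spectrum = [abs(c) for c in dft(signal)]
peaks = sorted(range(N // 2), key=lambda k: spectrum[k], reverse=True)[:2]
```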
The Nyquist–Shannon sampling theorem is the idea that as long as you capture ‘samples’ of a signal (like the air pressure over time that is sound) at more than twice the rate of the highest frequency component of the signal, all the information is captured and the signal can be reproduced exactly (in theory).
Maybe think of it like looking through a fence. As long as nothing on the other side of the fence is narrower than the spacing between posts, you can see everything on the other side.
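The fence-post idea can be sketched numerically (frequencies chosen purely for illustration): sampled at only 10 Hz, a 13 Hz wave “hides between the posts” and produces exactly the same measurements as a 3 Hz wave:

```python
import math

fs = 10  # samples per second: too slow for 13 Hz (the Nyquist limit is 5 Hz)

low  = [math.sin(2 * math.pi * 3  * n / fs) for n in range(10)]
high = [math.sin(2 * math.pi * 13 * n / fs) for n in range(10)]

# Aliasing: the undersampled 13 Hz wave is indistinguishable from
# the 3 Hz wave at these sample points.
identical = all(abs(a - b) < 1e-9 for a, b in zip(low, high))
```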
note that any file can be encoded as text with something like Base64
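A minimal Python sketch of that (the byte values here just stand in for an MP3’s contents):

```python
import base64

# Any bytes at all -- these stand in for the contents of an MP3 file.
mp3_bytes = bytes(range(256))

text = base64.b64encode(mp3_bytes).decode("ascii")  # plain printable text
restored = base64.b64decode(text)                   # back to identical bytes
```

The text form is about 33% larger than the binary, but it survives any channel that can carry plain text.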
This does involve a cutoff point in the “maximum” frequency of a signal. Actual sound contains a much larger frequency spectrum than our audio file formats are designed to handle.
Actual sound also contains a larger frequency spectrum than our ears are designed to handle. Not throwing shade, our audio file formats also contain a narrower frequency spectrum than we can hear generally speaking.
I speak incompletely because I expect others to argue and complete the conversation. No shade perceived at all.
Sure, but human hearing tends to cap out at around 20 kHz, so a sample rate of 48 kHz (whose Nyquist limit is 24 kHz) is going to contain everything that we can hear.
Tends to, key word. Some are outliers in every dimension, including maximum perceptible frequency.
I love the fence analogy.
Professional audio technician, checking in. So there’s a few big questions here, all layered on top of one another. And honestly, each one would be worthy of its own post. Let me break them down and answer them in order of complexity. I’ll gloss over some stuff and leave out a lot, but I’m trying to hit the basics so you have an understanding of the fundamentals.
First off, we need to understand what sound is. It’s a physical compression wave. Imagine stretching out a slinky, then gathering a few coils on one end and letting them go. You’d see a physical wave travel down the coils, with a compression followed by an expansion. Sound works much the same way. When you make a vibration, air molecules get compressed and expanded, creating teensy tiny pockets of high and low air pressure. This compression travels through the air as a wave, radiating from the source of the vibration. Higher pitched sounds vibrate faster, while lower pitched sounds vibrate slower. These vibrations are measured in Hertz (vibrations per second). And physical waves carry energy, the same way ocean waves or a Newton’s cradle do. Sound waves don’t typically carry a lot of energy, but it’s enough to wiggle sensitive things (like our eardrums) around just a little bit. And since it has energy, we can capture that energy in various ways.
So if we can find a way to capture that energy, we can convert it into other types of energy. The most primitive early records were actually made out of soft wax. The recorder was basically a giant horn connected to a suuuuper tiny needle. As they dragged the needle across the wax, they shouted into the horn (they had to shout because louder sounds have more energy, and their primitive recorder wasn’t sensitive enough to pick up quiet sounds). The horn collected all of that sound energy and focused it down into the tip of the needle. The same way a trombone takes a player’s vibrating lips and expands them into a massively loud sound, the opposite also works: you can use horns to focus sound into small receivers. The needle would vibrate from the sound waves’ compression and expansion, and cut a groove into the wax.
Then playback was the inverse operation, where they dragged the needle across the wax groove, it vibrated as the groove wobbled, and those vibrations were expanded by the horn. Later iterations improved the design and sensitivity, and they quickly swapped to vinyl because it’s more durable than wax. And now we have a record player. That’s a sound wave captured in physical form.
Next, let’s talk analog electrical audio. So we know that sound waves have energy, and they can wiggle things around. So what if we had a way to capture those vibrations and turn them into an electrical energy instead of physical energy? That’s what a microphone does. The most basic microphone is basically just a magnet and some copper wire, attached to a diaphragm. When you have a copper coil and move a magnet through it, it creates an electrical charge. Move it one direction through the coil, you get a positive charge. Move it the other direction, and you get a negative charge. So what if we find a way to move a magnet based on a sound wave?
Let’s take a really sensitive diaphragm. Sensitive enough to wobble when sound waves hit it. Because remember, it’s just a wave of high and low air pressure, so it can blow and suck on a diaphragm the same way wind blows on a sail. So we make a sensitive diaphragm, which wiggles in relation to the air pressure. When a sound wave hits it, it vibrates in response. Now we attach the magnet to it, so it wiggles the magnet back and forth. Now we have a basic microphone. When sound hits the diaphragm, it wiggles which moves the magnet, creating positive and negative electrical charges in the copper wire, which directly correspond to the sound wave. Congrats, we’ve just invented something called the dynamic microphone. (There are other, more complicated types of mics, but they all do the same basic task of capturing that sound wave and converting it to electricity.) So now we have an analog electrical signal. And now that it’s on copper as electricity, we can use electronics to amplify it and send it to a speaker. A basic speaker does the exact same thing as a dynamic microphone, but in reverse. It has a magnet and copper coil, attached to a horn-shaped cone which can wiggle back and forth. When you run an electrical charge through the copper, the magnet moves in response. So if we send that analog audio signal (amplified to be powerful enough to drive the speaker) to the speaker coil, it will wiggle the attached speaker cone forwards and backwards, and produce a vibration that matches the signal the mic captured. (Tangentially, you can actually use a speaker as a microphone in a pinch. Since they’re doing the same basic thing in opposite directions, you can plug your headphones into a mic input and yell into it, and the tiny speaker drivers in your headphones will act as a mic diaphragm.)
But how do we capture that analog electrical signal and save it as a digital file? That’s where the Nyquist–Shannon sampling theorem comes into play. Basically, the theorem states that any wave can be perfectly sampled and reproduced, as long as the sample rate is more than twice the maximum frequency of the wave.
Let’s break that down. First off, what is a sample, and sampling rate? The computer doesn’t just listen to the constant stream of analog audio and record it directly. Instead, it samples the analog wave at extremely precise, regular intervals. So for each sample, it checks to see what the electrical charge is. It records the wave’s amplitude and polarity at that specific point in time, then saves just that.
And according to the theorem, it needs to do that more than twice as often as the maximum expected frequency of the wave. Generally, the human hearing range is considered to be 20 Hz (20 vibrations per second) to 20 kHz (20,000 vibrations per second). Some audiophiles say they can hear more than that, but we’re sticking to the basics here… So according to the theorem, as long as we have a sample rate of at least 40 kHz, we should be able to accurately reproduce any audio wave in the human hearing range. So we’re sampling the wave at least 40,000 times per second. That sounds like a lot, but remember that each sample is relatively small because we’re only saving a point on a graph.
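Putting numbers on that (using the common CD settings of 44.1 kHz, 16-bit, stereo rather than a bare 40 kHz):

```python
sample_rate = 44_100   # samples per second (CD standard, comfortably above 40 kHz)
bit_depth = 16         # bits per sample
channels = 2           # stereo

# Raw, uncompressed data rate of CD-quality audio.
bits_per_second = sample_rate * bit_depth * channels
kbps = bits_per_second / 1000                          # 1411.2 kbps
mb_per_minute = bits_per_second * 60 / 8 / 1_000_000   # ~10.6 MB per minute

# For comparison, MP3 compresses this raw stream down to roughly 128-320 kbps.
```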
Lastly, let’s go over bit depth. Every sample has the same number of bits, and that number is referred to as bit depth. Ever seen how binary counting works? Each sample is saved as a value, represented by bits. Each bit is either a 1 or a 0, so to get to higher numbers we need more bits. Think of each possible value as a “step” on a staircase, and you’re trying to measure a curve to the nearest step. With smaller steps, we can get more accurate measurements of the curve. With 8 bits, we can count all the way up to 255: 00000000 is 0, 00000001 is 1, 00000010 is 2, 00000011 is 3, etc etc… So if we have a bit depth of 8 bits, we have 256 potential steps (0 through 255) that we can record the sample at. But that means we’re rounding each sample to the nearest step. Higher bit depth allows us to record more accurate samples (because our steps are smaller, and thus more accurate), but also increases file size as each sample is now larger. Another way to think about it is that adding bit depth is like adding more lines on a ruler. A basic ruler may only have inches marked, a better ruler has half inches, and a fantastic ruler may even mark every 1/64 inch. Higher bit depth gives you a better ruler, which means you can take more accurate measurements.
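A small Python sketch of that rounding (the sample value is made up): the same measurement quantized at 8 bits versus 16 bits lands on a much finer step in the 16-bit case:

```python
def quantize(value, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)        # height of one "stair step"
    return round(value / step) * step

sample = 0.300001                    # a made-up analog measurement

coarse = quantize(sample, 8)         # 256 possible levels
fine   = quantize(sample, 16)        # 65,536 possible levels

error_8  = abs(sample - coarse)      # rounding error at 8 bits
error_16 = abs(sample - fine)        # much smaller error at 16 bits
```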
Then we use the theorem to reconstruct those samples into the analog audio wave, and send it to our speakers.
I had to copy and paste your reply to SpeechCentral for me to listen thru hehe. Thanks for that, 8mins of good stuff :)
I actually just edited it, so you may want to redo that lol
Imma ask the Voyager guy if we can get a full text copy option
Sound is just vibrations in the air. We have devices that wobble when they’re hit by these vibrations. That wobble can be plugged into an electric circuit to adjust its power, which the computer can read as a number.
Then later, the computer runs everything in reverse. Those numbers are converted back into electrical power. This causes the wobbly bit in the speaker to vibrate in the same way which produces that same sound.
Since at the end of the day it’s all numbers, an MP3 file can just be converted to those numbers and back again. Although those numbers won’t mean anything to people reading the file. (This is true for all computer files - you can always convert them to and from sequences of numbers.)
This causes the wobbly bit in the speaker to vibrate in the same way which produces that same sound.
Two of the things I’ve wanted (but still find a bit overpriced and not as good sounding) are headphones that wobble your temples/skull and speakers that use vibrations of any surface you attach them to. They’re simple as a concept but still look like magic.
Bone conduction headphones and contact speakers?
Yes. I couldn’t remember their exact name.
I’ve wanted a pair of bone conduction headphones for a while but I haven’t experienced them yet. I’m hoping to come across a second-hand pair to see if they’re truly worth the investment.
They still seem pretty basic if you’re concerned about sound quality, but for a runner or a podcast lover they’re great, since they don’t block your ears while you’re walking, training, or crossing the road, and podcasts and radio stations usually aren’t that demanding at all. It’s great for those use cases.
Sounds like I need a pair then. Thanks
You can express any sound as a combination of frequencies and amplitudes (tones and volumes). You can define those as long chains of 1s and 0s, which you can write software to convert back into frequencies and amplitudes to make speakers go brr.
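A rough Python sketch of that idea (the tone frequencies, amplitudes, and sample rate are all picked arbitrarily): a short list of (frequency, amplitude) pairs expanded into the raw samples a speaker could play:

```python
import math

RATE = 8000  # samples per second (arbitrary for this demo)

def render(tones, seconds=1.0):
    """Sum a sine wave for each (frequency_hz, amplitude) pair."""
    n = int(RATE * seconds)
    return [sum(amp * math.sin(2 * math.pi * freq * i / RATE)
                for freq, amp in tones)
            for i in range(n)]

# An A-major-ish chord described purely as frequencies and volumes:
samples = render([(440.0, 0.5), (554.37, 0.3), (659.25, 0.2)])
# Scale and round these to integers, and a sound card makes the speaker go brr.
```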
You should watch Chris “Monty” Montgomery’s Digital Audio Primer https://www.youtube.com/watch?v=FG9jemV1T7I and part 2 https://www.youtube.com/watch?v=cIQ9IXSUzuM
His skill at explaining engineering concepts for a general audience is truly amazing.
Thanks for link!
To me, honestly, the invention of vinyl records is one of the most incredible, life-changing things, up there with the telephone and the camera and all that
That you could bring a whole orchestra into your parlor and have them play any time off a record is insane
This would be better suited for one of these other communities
https://youtu.be/3DdUvoc7tJ4?si=Ntd1VTBuCS-RFPuJ
This does a pretty decent job explaining how it all works.
It’s all waves, man.
I suddenly have a hankering for That ’70s Show