Codec compression
In addition to encoding a data stream, a codec may also compress the data to reduce transmission bandwidth or storage space, and this is where things can get tricky. Compression codecs are classified primarily into lossy codecs and lossless codecs. Lossy codecs sacrifice quality to make files smaller and maximize compression, while lossless codecs preserve every bit of the original signal and provide the highest audio fidelity.
So, which is best for speech analytics processing? The best lossless codec that we have found is a PCM codec, or pulse-code modulation of voice frequencies. PCM is raw sample data that is not perceptually encoded or compressed. For telephony, the PCM codec is G.711.
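To make that concrete, here is a minimal sketch of the mu-law companding curve G.711 uses to fit voice samples into 8 bits. It illustrates the math only; the real standard uses a segmented approximation of this curve, and the test tone below is an assumption for demonstration.

```python
import numpy as np

def mulaw_encode(samples, mu=255):
    # Compand 16-bit PCM down to 8-bit codes (simplified G.711 mu-law)
    x = samples.astype(np.float64) / 32768.0                 # scale to [-1, 1]
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1) / 2 * 255).astype(np.uint8)      # quantize to 8 bits

def mulaw_decode(codes, mu=255):
    # Expand 8-bit codes back to approximate 16-bit PCM
    y = codes.astype(np.float64) / 255 * 2 - 1
    x = np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
    return np.clip(np.round(x * 32767.0), -32768, 32767).astype(np.int16)

# One second of a 440 Hz tone at the 8 kHz telephony sample rate
t = np.arange(8000) / 8000.0
pcm = (np.sin(2 * np.pi * 440 * t) * 20000).astype(np.int16)
restored = mulaw_decode(mulaw_encode(pcm))  # half the bytes, close to the original
```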
There are situations, however, where you do not have the bandwidth for G.711. The next best choice for encoding your voice into data is a vocoder-based codec such as G.729. A vocoder uses a tone generator, a white noise generator, and a filter that shapes the sound, just as the throat, tongue, and nasal cavities do. By itself, the vocoder produces intelligible speech, but to the human ear it sounds like a robot speaking. The remote end takes the code and vocoder settings and reproduces the sound.
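Below is a toy source-filter sketch of that idea: a buzzy pulse train stands in for the vocal cords, white noise for unvoiced hiss, and a single resonant filter plays the role of the vocal tract. The coefficients and rates here are arbitrary assumptions for illustration, nothing like a real G-series vocoder.

```python
import numpy as np
from scipy.signal import lfilter

rate = 8000                                # telephony sample rate
voiced = np.zeros(rate)
voiced[::80] = 1.0                         # ~100 Hz pulse train: the "tone generator"
unvoiced = 0.1 * np.random.randn(rate)     # white noise: the "noise generator"

# A crude resonant IIR filter standing in for the vocal-tract shaping
b, a = [1.0], [1.0, -1.3, 0.8]
robotic_vowel = lfilter(b, a, voiced)      # buzzy, vowel-like
hiss = lfilter(b, a, unvoiced)             # shaped noise, consonant-like
```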
When you call one reasonably modern cell phone from another, those calls use HD codecs, shown on Android with the [HD] symbol while the call is in progress.
These HD codecs sample at 16 kHz instead of 8 kHz, and the accuracy is dramatically better. To have a successful HD voice call, however, both parties (or nearly all participants in a conference) need to use the same codec. If the two sides are using different HD codecs, either one side has to be transcoded (translated) into the same codec type, or both sides have to shift to a mutually agreeable codec.
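The negotiation step can be pictured as each side advertising the codecs it supports in preference order, with the call settling on the first match; if none exists, one leg gets transcoded. A minimal sketch, where AMR-WB and EVS are real HD voice codec names but the lists themselves are assumed:

```python
def pick_codec(ours, theirs):
    # "ours" is in preference order; take the first codec both sides support
    for codec in ours:
        if codec in theirs:
            return codec
    return None  # no common codec: transcode one leg or drop out of HD

phone_a = ["AMR-WB", "EVS", "AMR-NB"]   # prefers wideband (16 kHz) codecs
phone_b = ["EVS", "AMR-NB"]

print(pick_codec(phone_a, phone_b))     # EVS: both sides stay in HD
```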
The name codec actually comes from merging these two concepts into a single word: enCOder and DECoder. Example video codecs include H.264 and H.265. Although these standards are tied to the video stream, videos are often bundled with an audio stream, which can have its own compression standard. These codecs should not be confused with the containers that are used to encapsulate everything.
These containers do not define how to encode and decode the video data. Instead, they store bytes from a codec in a way that compatible applications can use to play back the content. This can be confusing, though, as some audio codecs have the same names as file containers, such as FLAC.
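A quick way to see the codec/container distinction in practice is remuxing: moving the same encoded streams into a different container without re-encoding them. This sketch assumes the ffmpeg command-line tool is installed and that input.mp4 exists; the file names are illustrative.

```python
import subprocess

# "-c copy" tells ffmpeg to copy the encoded video and audio streams
# unchanged, so only the container around them changes (MP4 -> MKV).
subprocess.run(
    ["ffmpeg", "-i", "input.mp4", "-c", "copy", "output.mkv"],
    check=True,
)
```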
The reason so many codecs coexist is that different video codecs are best in certain areas. For high-quality video streaming over the Internet, H.264 is the usual choice. The codec has a reputation for excellent quality, encoding speed, and compression efficiency, although it is not as efficient as the later HEVC (High Efficiency Video Coding), also known as H.265.
As noted, though, a more advanced video compression standard is already available in HEVC. This codec is more efficient with compression, which would allow more people to watch high-quality video on slower connections. In 2010, Google purchased On2, giving it control of the VP8 codec.
Netflix tested these later formats against H.264. The exception came at one tested resolution, where the results were either close or, in some scenarios, VP9 was more efficient. As a result, despite not being as advanced, H.264 remains the default choice. Note, too, that the H.264 support bundled with some software is not the same codec but a free equivalent of the licensed H.264. Like video, different audio codecs excel at different things. Given that they are lossy, these formats in essence delete information from the audio in order to compress the space required.
The job of this compression is to strike the right balance, where a sufficient amount of space is saved without notably compromising the audio quality.
Now, both of these audio coding methods, MP3 and AAC, have been around for a while. While MP3 has much more mileage in device compatibility to this day, AAC benefits from superior compression and is the preferable of the two for streaming video content. Not only that, but much video delivery to mobile devices depends on the audio being AAC. A lossless codec such as FLAC, by contrast, means the original audio data can be perfectly reconstructed from the compressed data.
Favoring compatibility, H.264 video paired with AAC audio is the safe combination. While neither is cutting edge, both can produce high-quality content with good compression applied. In addition, video content compressed with these codecs can reach large audiences, especially over mobile devices. So how is that compression achieved? A number of techniques are utilized by codecs to intelligently reduce the size of video content.
The goal is to do so without hugely impacting video quality. That said, certain techniques are more noticeable to the end viewer than others. A common technique for compression is resizing, or reducing the resolution, because the higher the resolution of a video, the more information is included in each frame. Downscaling creates fewer pixels, reducing the level of detail in the image but also decreasing the amount of information needed.
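The arithmetic behind this is simple; a short worked example, with the resolutions assumed for illustration:

```python
# Per-frame pixel counts before and after a common downscale
full_hd = 1920 * 1080     # 2,073,600 pixels per frame
hd      = 1280 * 720      #   921,600 pixels per frame
print(hd / full_hd)       # ~0.44: less than half the raw data per frame,
                          # before the codec has compressed anything at all
```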
This concept has become a cornerstone of adaptive bitrate streaming. A related artifact is sometimes called macroblocking, a phenomenon where parts of an image look blocky, although usually this is more pronounced than mere pixelation. It can be a combination of a low-resolution image and interframe compression, where details in the video are actually changing but areas of the picture are being kept from earlier frames as part of the interframe process. Repeated conversion makes this worse, because every time you convert the file, it loses some quality.
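The interframe idea itself is easy to sketch: store one full keyframe, then for the frames that follow store only what changed. Real codecs use motion vectors and block transforms rather than raw pixel differences, so treat this as a toy model with assumed frame sizes:

```python
import numpy as np

key = np.random.randint(0, 256, (720, 1280), dtype=np.int16)  # full keyframe
nxt = key.copy()
nxt[100:116, 200:216] += 5           # only one 16x16 block changes

delta = nxt - key                    # mostly zeros: very cheap to compress
print(np.count_nonzero(delta), "of", delta.size, "values changed")
# -> 256 of 921600: unchanged regions cost almost nothing to transmit
```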
So, the idea here is to start at the highest possible quality, so that you lose as little as possible by the time you are down to CD quality. How does lossless compression do it, then? A format like FLAC packs the raw samples more efficiently without discarding anything, which means you retain the maximum uncompressed quality but shave a bit off the file size. So how do the lossy formats make such a big saving on space? They use a variety of algorithms that take advantage of psychoacoustic phenomena, or how your brain actually perceives sound.
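For scale, here is the arithmetic on why uncompressed CD-quality audio is so large in the first place. The CD parameters (44,100 samples per second, 16 bits, two channels) are standard; the MP3 comparison rates are typical figures, not from the original article.

```python
bits_per_second = 44_100 * 16 * 2         # sample rate x bit depth x channels
print(bits_per_second / 1000)             # 1411.2 kbps uncompressed
print(bits_per_second * 60 / 8 / 1e6)     # ~10.6 MB per minute,
                                          # versus ~1-2 MB for a 128-256 kbps MP3
```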
Adult Hearing Loss: despite the upper limit for human hearing being around 20 kHz, adults often lose their capacity for hearing above roughly 16 kHz. This means you can remove anything above this frequency with little perceptible effect.
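In signal terms, that is a low-pass filter. A minimal sketch of the idea using a whole-signal FFT; real encoders do this inside a filter bank over short frames, and the 16 kHz cutoff and test tones here are assumptions:

```python
import numpy as np

def lowpass(samples, rate, cutoff_hz=16_000):
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    spectrum[freqs > cutoff_hz] = 0           # discard content above the cutoff
    return np.fft.irfft(spectrum, n=len(samples))

rate = 44_100
t = np.arange(rate) / rate
audible = np.sin(2 * np.pi * 440 * t)             # easily heard
inaudible = 0.3 * np.sin(2 * np.pi * 19_000 * t)  # most adults won't miss it
filtered = lowpass(audible + inaudible, rate)     # 19 kHz component removed
```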
Auditory Masking: a loud sound masks quieter sounds around it, which is what allows you to focus on a loud sound over the top of a background of quieter ones; you can still chat over the background noise. MP3s remove some of those quieter sounds in the mix.
Temporal Masking: your hearing works on a kind of latency, running around 30 ms behind reality. This is like the processing time for your brain, and it means you focus on the most important sounds from the last 30 ms. Exploiting this in an MP3 means removing some of the much quieter sounds around a louder one.
Minimum Audition Threshold: quiet sounds are just that, very quiet; anything below the level your ear can detect can be dropped entirely.
Bit Rate Management: this is where the big savings are done, spending more bits on complex passages and fewer on simple ones!
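A crude illustration of that threshold idea: treat samples below an assumed audibility floor as silence. A real encoder applies a frequency-dependent threshold per band rather than a flat amplitude gate like this:

```python
import numpy as np

def gate_quiet(samples, floor=0.01):
    # Zero out anything quieter than the assumed audibility floor;
    # runs of silence compress extremely well downstream.
    out = samples.copy()
    out[np.abs(out) < floor] = 0.0
    return out

x = np.concatenate([0.5 * np.ones(100), 0.001 * np.ones(100)])
print(np.count_nonzero(gate_quiet(x)))   # 100: the quiet half is dropped
```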