These notes are written for investors, developers and decision makers.
Presentation
A codebook version of the VLC and VLR codecs is planned. For more information, see the following page:
- Codebook Version
This document describes the latest improvements we will make to this release.
Generally, the same codebook must be shared by the sender and the receiver.
The unilateral version allows the transmitter to send only local peaks and the receiver to use a personalized codebook. The receiver can modify its codebook at any time while remaining compatible with the transmitter.
Transmission rates are higher and depend on the number of points sent, but remain relatively low. Compression and decompression are much faster.
Once a maximum number N of local peaks per frame is fixed, the N greatest local peaks are chosen. The first few local peaks (the first 4 to 8, for example, in order of increasing position and thus increasing frequency) are sufficient to select a good frame, so it is not necessary to send all the selected local peaks.
By setting a search range at the beginning and using partial local peaks, it is possible to increase the compression ratio and thus reduce the transmission rate.
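The selection described above can be sketched as follows. This is only an illustration under stated assumptions: a local peak is taken to be a bin strictly greater than both neighbours, and the names find_local_peaks, select_peaks, max_peaks and first_count are hypothetical, not part of the codecs.

```python
def find_local_peaks(magnitudes):
    """Return (position, magnitude) pairs for bins greater than both neighbours."""
    return [
        (k, magnitudes[k])
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k - 1] < magnitudes[k] > magnitudes[k + 1]
    ]


def select_peaks(magnitudes, max_peaks, first_count):
    """Keep the max_peaks greatest local peaks, then the first_count lowest positions."""
    peaks = find_local_peaks(magnitudes)
    greatest = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_peaks]
    by_position = sorted(greatest)  # increasing position, hence increasing frequency
    return by_position[:first_count]


# Toy magnitude spectrum: N = 4 greatest peaks, of which only the first 2 are sent.
frame = [0, 3, 1, 5, 2, 9, 4, 6, 1, 8, 0, 2, 0]
partial = select_peaks(frame, max_peaks=4, first_count=2)
print(partial)
```

Sending only the first few peaks keeps the payload small; the receiver is then expected to find a frame whose leading peaks agree with them.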
Noise Reduction
The noise most frequently encountered during voice communications lies in the low frequencies. The search range of the first points can be shifted to avoid as much noise as possible while keeping a good reception quality. In some cases, the noise can be eliminated entirely: the intact voice frame is found on the receiver side.
Too large an offset may lead to more than one starting solution, and therefore to a greater or lesser loss of timbre.
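A toy illustration of the offset trade-off, assuming (for simplicity only) that the receiver matches frames on exact peak positions and that the codebook is a plain dictionary; the names first_positions, matching_frames and the two sample frames are hypothetical.

```python
def first_positions(positions, range_start, count):
    """First `count` peak positions located at or above the start of the search range."""
    return tuple(p for p in positions if p >= range_start)[:count]


def matching_frames(codebook, received, range_start, count):
    """Identifiers of the codebook frames whose first peaks in the range equal `received`."""
    return [
        frame_id
        for frame_id, positions in codebook.items()
        if first_positions(positions, range_start, count) == received
    ]


# Two frames that differ only in their low-frequency peaks.
codebook = {
    "frame_a": [3, 7, 12, 18, 25],
    "frame_b": [4, 9, 12, 18, 25],
}

# Moderate offset: the low bins are skipped and the match is still unique.
moderate = matching_frames(codebook, (7, 12), range_start=5, count=2)

# Larger offset: both frames share the same first peaks, so the choice is ambiguous.
large = matching_frames(codebook, (12, 18), range_start=10, count=2)
print(moderate, large)
```

With the larger offset the two frames become indistinguishable, which is the situation where more than one starting solution appears.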
It should be noted that most noise reduction solutions modify the magnitude or the amplitude values towards the low frequencies.
It should also be noted that most current voice communications codecs decrease the magnitude or the amplitude values towards the low frequencies.
For more information on classical noise reduction with the VLC and VLR codecs, see the following page:
- Noise Reduction
Special Case (Four Local Peaks)
If we consider only the first four local peaks in a search area, with the relative positions (differences between the positions of two peaks) coded on four bits each and the magnitudes (coded with the aid of the logarithm) also on four bits each, we obtain a very interesting case.
Indeed:
- The positions fit on a short integer of 16 bits.
It should be noted that with the codebook version, in the general case, it is better to use the absolute positions so that the vectors of the positions contain only unique values.
It is better to continue to generate the databases with the absolute positions and to calculate absolute positions if we receive relative positions.
- The magnitudes also fit on a short integer of 16 bits.
- The integers of the same nature are directly comparable with each other.
For positions, it is assumed that the lowest position has the greatest weight.
- We can generate all possible cases (65536 combinations for the positions and 65536 combinations for the magnitudes).
- Since all the possible cases are in the base, there is no longer any need for a GPU or kNN to look for the nearest neighbor: we have direct access to the vectors we need.
- For a given codebook, these two short integers correspond uniquely to an FFT frame. They can be used directly as a signature.
For a given codebook, in the context of artificial intelligence or deep learning, in addition to accelerating the calculations, one can consider working directly and only with these integers instead of using vectors with several dimensions.
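The four-peak case can be sketched as below. Several points are assumptions made for the example and are not specified in these notes: the first relative position is measured from the start of the search range, the logarithmic quantizer log_code is a hypothetical mapping, and the base is reduced to a one-entry dictionary. The lowest position is placed in the most significant four bits, in line with the weighting stated above.

```python
import math


def pack_nibbles(values):
    """Pack four 4-bit values into a 16-bit integer, first value most significant."""
    if len(values) != 4 or any(not 0 <= v <= 15 for v in values):
        raise ValueError("expected four values in the range 0..15")
    code = 0
    for v in values:
        code = (code << 4) | v
    return code


def unpack_nibbles(code):
    """Inverse of pack_nibbles."""
    return [(code >> shift) & 0xF for shift in (12, 8, 4, 0)]


def to_relative(absolute, range_start):
    """Differences between successive positions; the first is taken from range_start."""
    previous, out = range_start, []
    for position in absolute:
        out.append(position - previous)
        previous = position
    return out


def to_absolute(relative, range_start):
    """Rebuild absolute positions from relative ones by cumulative sum."""
    current, out = range_start, []
    for step in relative:
        current += step
        out.append(current)
    return out


def log_code(magnitude, max_magnitude):
    """Hypothetical 4-bit logarithmic quantizer: 0 maps to 0, max_magnitude to 15."""
    clamped = min(max(magnitude, 0.0), max_magnitude)
    return round(15 * math.log1p(clamped) / math.log1p(max_magnitude))


positions = [5, 9, 12, 20]
position_code = pack_nibbles(to_relative(positions, range_start=4))
magnitude_code = pack_nibbles([log_code(m, 1000.0) for m in (1000.0, 250.0, 60.0, 0.0)])

# Toy base with one entry, keyed directly by the two codes (no nearest-neighbour search).
base = {(position_code, magnitude_code): "frame_0"}
frame_id = base[(position_code, magnitude_code)]
restored = to_absolute(unpack_nibbles(position_code), range_start=4)
print(hex(position_code), hex(magnitude_code), frame_id, restored)
```

Because each code is an ordinary 16-bit integer, two codes of the same nature can be compared with the usual integer operators, and absolute positions are recovered from relative ones with a running sum.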
Other Opportunities
- Gadgets
One can consider listening to music at the same time as speech broadcasts or voice commands, while transmitting only the voice. It is sufficient to decrease (by using an equalizer for example) the magnitudes of the musical frequencies emitted in the search range of the first local peaks.
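As a rough sketch of the equalizer idea, assuming the music is available as a list of FFT magnitudes and the search range is given as bin indices; the function name attenuate_search_range and the gain value are hypothetical.

```python
def attenuate_search_range(magnitudes, range_start, range_end, gain):
    """Scale the bins range_start..range_end-1 by gain and leave the others untouched."""
    return [
        m * gain if range_start <= k < range_end else m
        for k, m in enumerate(magnitudes)
    ]


# Toy music spectrum; bins 1 to 3 are taken as the search range of the first local peaks.
music = [4.0, 8.0, 6.0, 2.0, 5.0, 3.0]
quiet = attenuate_search_range(music, range_start=1, range_end=4, gain=0.25)
print(quiet)
```

Lowering the music in that range leaves the voice peaks dominant there, so the first local peaks found are those of the voice.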
- Temporal derivatives
For a one-dimensional and temporal signal, in the frequency domain, the derivatives of any order are simple multiplications modifying the magnitudes and the phases.
One can therefore use the VLC and VLR codecs to compress the temporal derivatives.
One can envisage compressing the first derivatives with the bilateral or unilateral version of these codecs. The resulting features can be used in artificial intelligence and deep learning algorithms.
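The multiplication mentioned above can be made concrete for a single FFT bin. For a component of angular frequency w, each derivative multiplies the magnitude by w and adds pi/2 to the phase, since the derivative of cos(w t + p) is w cos(w t + p + pi/2). The sketch below assumes a bin index k below the Nyquist bin, and the names derivative_bin, fft_size and sample_rate are hypothetical.

```python
import math


def derivative_bin(magnitude, phase, k, fft_size, sample_rate, order=1):
    """Magnitude and phase of bin k after differentiating `order` times in time."""
    omega = 2.0 * math.pi * k * sample_rate / fft_size  # angular frequency of bin k
    return magnitude * omega ** order, phase + order * math.pi / 2.0


# Bin 10 of a 1024-point FFT at 8000 Hz lies at 78.125 Hz.
mag, ph = derivative_bin(1.0, 0.0, k=10, fft_size=1024, sample_rate=8000.0)
print(mag, ph)
```

Only magnitudes and phases change; the peak positions stay where they are, so the derivative frame has the same local peak layout in frequency, with magnitudes weighted towards the high frequencies.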
Final Notes
- With the unilateral codebook version, only the receiver uses the codebook.
However, the other parameters must be identical, in particular the sampling rate, the size of the FFT buffers and the maximum number of local peaks per frame.
- With the unilateral codebook version, frame overlapping is necessary to avoid side effects on the receiver side.
- The unilateral codebook version is compatible with the optimized version of the VLC and VLR codecs in transmission (use of local peaks only, no phases), provided that frame overlapping is used.
With the optimized version of the codecs on the transmitting side, it is possible to implement, on the receiver side, the use of sliding partial local peaks for the selection of the best frame in order to avoid the noise: it is sufficient to search the base for the frames having the same local peaks in the same search area.
- With the bilateral or unilateral codebook version, the frames of the local peaks can be linked to the complete original frames (which were used to generate the codebook), for an optimal voice quality, but with an approximate timbre.
If we use the original time-domain frames, there is no longer any need to compute an inverse FFT (iFFT); applying the frame overlapping is enough.
If we use the original frequency-domain frames, the positions, magnitudes and phases are available with maximum accuracy. These values can also be stored for the local peaks only.
- Finally, with the bilateral or unilateral codebook version, each code used can be linked to a cepstrum (FFT or MFCC) calculated in advance.
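The frame overlapping mentioned in these notes, applied to original time-domain frames retrieved from the codebook, can be sketched as a simple overlap-add. This assumes that the stored frames all have the same size and were already windowed when the codebook was generated; the hop size and the name overlap_add are hypothetical.

```python
def overlap_add(frames, hop):
    """Sum equally sized frames placed every `hop` samples."""
    if not frames:
        return []
    size = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + size)
    for i, frame in enumerate(frames):
        start = i * hop
        for j, sample in enumerate(frame):
            out[start + j] += sample
    return out


# Two 4-sample frames with a hop of 2 samples (50 % overlap).
frames = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
signal = overlap_add(frames, hop=2)
print(signal)
```

No inverse FFT appears in this path: the receiver only looks up the stored frames and sums them at the right offsets.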
Contacts and Comments:
support@whmsoft.com
Our codecs are based on the FFT and can be accelerated with GPU support.