Inventors:
- Redmond WA, US
Guoli YE - Redmond WA, US
Rui ZHAO - Bellevue WA, US
Yifan GONG - Sammamish WA, US
Ke LI - Baltimore MD, US
International Classification:
G10L 15/00
G10L 15/04
G10L 15/06
Abstract:
A code-switching (CS) connectionist temporal classification (CTC) model may be initialized from a major language CTC model by keeping the network hidden weights and replacing the output tokens with a union of major and secondary language output tokens. The initialized model may be trained by updating its parameters with training data from both languages, and a language identification (LID) model may also be trained with the same data. During a decoding process, for each of a series of audio frames, if silence dominates the current frame, then a silence output token may be emitted. If silence does not dominate the frame, then a major language output token posterior vector from the CS CTC model may be multiplied by the LID major language probability to create a probability vector for the major language. A similar step is performed for the secondary language, and the system may emit the output token associated with the highest probability across all tokens from both languages.
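The per-frame decoding rule described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `decode_frame`, `decode`, `SILENCE`, and `silence_threshold` are hypothetical, and the abstract does not define when silence "dominates" a frame, so the sketch assumes a simple threshold on the silence posterior. It also assumes the LID model supplies one probability per language for each frame.

```python
from typing import Dict, List

SILENCE = "<sil>"  # hypothetical name for the silence output token


def decode_frame(
    major_post: Dict[str, float],      # CS CTC posteriors over major language tokens
    secondary_post: Dict[str, float],  # CS CTC posteriors over secondary language tokens
    silence_post: float,               # CS CTC posterior of the silence token
    lid_major: float,                  # LID probability that the frame is major language
    lid_secondary: float,              # LID probability that the frame is secondary language
    silence_threshold: float = 0.5,    # assumption: "dominates" means exceeding this value
) -> str:
    """Return the output token for one audio frame."""
    # Step 1: emit silence if it dominates the frame (threshold is an assumption).
    if silence_post > silence_threshold:
        return SILENCE

    # Step 2: scale each language's token posteriors by that language's LID probability.
    candidates = [(p * lid_major, tok) for tok, p in major_post.items()]
    candidates += [(p * lid_secondary, tok) for tok, p in secondary_post.items()]

    # Step 3: emit the token with the highest scaled probability across both languages.
    _, best_token = max(candidates, key=lambda c: c[0])
    return best_token


def decode(frames: List[dict]) -> List[str]:
    """Apply the per-frame rule to a series of frames."""
    return [decode_frame(**frame) for frame in frames]


# Illustrative values only; they are not taken from the source.
example_frames = [
    dict(major_post={"hello": 0.05}, secondary_post={"ni": 0.05},
         silence_post=0.9, lid_major=0.5, lid_secondary=0.5),
    dict(major_post={"hello": 0.5, "world": 0.1}, secondary_post={"ni": 0.3, "hao": 0.05},
         silence_post=0.05, lid_major=0.8, lid_secondary=0.2),
    dict(major_post={"hello": 0.4}, secondary_post={"ni": 0.35},
         silence_post=0.25, lid_major=0.2, lid_secondary=0.8),
]
decoded = decode(example_frames)
```

In the third frame the major language token has the higher raw posterior, but the LID weighting favors the secondary language, so the secondary token is emitted. This is the effect the multiplication step is meant to achieve.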