Building a Speech Translator with Azure Cognitive Services Makes Learning English a Breeze

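CSDN recently launched a campaign called "Try Microsoft Azure AI Cognitive Services for Free, with Great Gifts Up for Grabs", and it is still running. Keen as I am, I signed up right away, but only today did I find time to actually try the services.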

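The campaign asks participants to write a blog post sharing their experience with at least three of these services: speech-to-text, text-to-speech, speech translation, text analytics, text translation, and language understanding.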

After trying speech-to-text, text-to-speech and speech translation, I decided to build a real-time speech translator, and it works really well.

Let's walk through it. First, go to https://portal.azure.cn/ and sign in.

Getting the key

Type 認知服務 (Cognitive Services) into the search box and confirm.

Then create a Speech service.

Enter a name, choose a location, select the free pricing tier, and create a new resource group and select it.

Then click Create. While the resource is being created, the portal shows that deployment is in progress.

When deployment finishes, click Go to resource.

Next, click Keys and Endpoint to view the keys and the location/region.

There are two keys, and either one works. Also write down the location/region: our programs need both the key and the region to call the service.
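The snippets in this post hard-code the key for brevity, which makes it easy to leak by accident. A minimal sketch of an alternative, assuming you export the key and region as environment variables named SPEECH_KEY and SPEECH_REGION (both names, and the helper load_speech_credentials, are my own illustrative choices, not something Azure requires):

```python
import os


def load_speech_credentials(key_var="SPEECH_KEY", region_var="SPEECH_REGION"):
    """Read the Speech resource key and region from environment variables.

    The variable names are hypothetical defaults; pass your own if you prefer.
    Raises RuntimeError if either value is missing or blank.
    """
    key = os.environ.get(key_var, "").strip()
    region = os.environ.get(region_var, "").strip()
    if not key or not region:
        raise RuntimeError(f"Set {key_var} and {region_var} before running")
    return key, region
```

You could then write `speech_key, service_region = load_speech_credentials()` in place of the hard-coded pair.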

A first taste of Azure Cognitive Services

Azure Cognitive Services documentation: https://docs.azure.cn/zh-cn/cognitive-services/

Following the documentation, first install the Azure Speech SDK for Python:

pip install azure-cognitiveservices-speech

Let's start with speech-to-text.

Testing speech-to-text

Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-python

I copied the official sample and modified it slightly to recognize speech from the microphone:

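```python
import azure.cognitiveservices.speech as speechsdk

# Replace with your own key and region from the "Keys and Endpoint" page
speech_key, service_region = "59392xxxxxxxxxx559de", "chinaeast2"
speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region, speech_recognition_language="zh-cn")
# Without an explicit audio_config, the recognizer listens on the default microphone
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say:", end="")
# Recognize a single utterance (returns after the first pause)
result = speech_recognizer.recognize_once()
print(result.text)
```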

speech_recognition_language sets the recognition language; here I set it to Chinese (zh-cn).

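I ran it and said one sentence into the microphone ("Microsoft's AI services are really easy to use"), and the program recognized exactly what I said: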

Say:微軟人工智能服務非常好用。

Testing text-to-speech

Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python

The documentation also shows how to save the synthesized speech, but here I only play it directly through the speaker:

```python
# Continues from the speech-to-text snippet (reuses speechsdk and speech_config)
from azure.cognitiveservices.speech import SpeechSynthesizer
from azure.cognitiveservices.speech.audio import AudioOutputConfig

speech_config.speech_synthesis_language = "zh-cn"
# Play the synthesized audio through the default speaker
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config)

text_words = "微軟人工智能服務非常好用。"  # "Microsoft's AI services are really easy to use."
# speak_text_async returns a future; .get() blocks until synthesis finishes
result = speech_synthesizer.speak_text_async(text_words).get()
if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(result.reason)
```

The synthesized speech sounds very good.
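The snippet above only plays the audio. A minimal sketch of the save-to-file variant, assuming you reuse the speech_config created earlier: AudioOutputConfig(filename=...) is the SDK's way of sending synthesis output to a file, while the helper synthesize_to_wav and the default name output.wav are hypothetical choices for illustration. The SDK is imported inside the function only so the helper can be defined on a machine without it.

```python
def synthesize_to_wav(speech_config, text, filename="output.wav"):
    """Synthesize `text` into an audio file instead of playing it.

    Returns True if synthesis completed, False otherwise.
    """
    # Imported here so the helper can be defined without the SDK installed
    import azure.cognitiveservices.speech as speechsdk
    from azure.cognitiveservices.speech.audio import AudioOutputConfig

    # filename= writes the synthesized audio to disk rather than the speaker
    audio_config = AudioOutputConfig(filename=filename)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

Usage would look like `synthesize_to_wav(speech_config, "微軟人工智能服務非常好用。")`, after which the file can be replayed as often as you like.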

Testing speech translation

Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-speech-translation?tabs=script%2Cwindowsinstall&pivots=programming-language-python

In my tests, speech translation performs both speech-to-text and translation in a single call:

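```python
# Continues from the speech-to-text snippet (reuses speechsdk, speech_key, service_region)
from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=from_language)
# Translate the recognized Chinese into English
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)


def speakAndTranslation():
    """Recognize one utterance from the microphone; return (text, translation)."""
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        # Both the original text and its translation are available
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # Speech was recognized but not translated
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)
    # Nothing usable was recognized; still return a pair so callers can unpack it
    return None, None


speakAndTranslation()
```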

Running this and saying one sentence gives:

('大家好才是真的好。', 'Everyone is really good.')

This returns both the original text and the translation, so the speech translator below is built on this interface as well.
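Because result.translations is a dictionary keyed by target language code, a single recognition can feed several translations. A minimal sketch, assuming add_target_language is called once per language; the helper translate_once and the extra target "ja" (Japanese) are hypothetical choices for illustration, and the SDK is imported inside the function only so it can be defined without the SDK installed:

```python
def translate_once(speech_key, service_region, from_language="zh-cn",
                   to_languages=("en", "ja")):
    """Recognize one utterance; return (source_text, {language: translation})."""
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region,
        speech_recognition_language=from_language)
    for lang in to_languages:
        # Each call registers one more target language
        config.add_target_language(lang)
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=config)
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        # One entry per target language registered above
        return result.text, dict(result.translations)
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text, {}
    # No match or canceled
    return None, {}
```

The design simply generalizes speakAndTranslation: instead of indexing one language out of result.translations, it hands back the whole mapping.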

Building the speech translator

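The program's overall logic: listen on the microphone, recognize and translate what was said, print both texts, read the translation aloud, and repeat until the user says 退出 ("quit").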
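That loop can be sketched without the SDK at all. In this minimal sketch, the hypothetical run_translator takes the recognition step (listen, returning a (text, translation) pair with None for missing parts) and the playback step (speak) as plain functions, so the control flow can be exercised with stand-ins instead of a microphone and speaker:

```python
from typing import Callable, Optional, Tuple

# listen() returns (recognized_text, translated_text); either may be None
Listener = Callable[[], Tuple[Optional[str], Optional[str]]]


def run_translator(listen: Listener, speak: Callable[[str], None],
                   exit_word: str = "退出") -> None:
    """Listen, print source and translation, speak the translation; repeat.

    Stops (without speaking) after an utterance that contains exit_word.
    """
    while True:
        text, translation = listen()
        print("Say:", text)
        print("Translation:", translation)
        if text and exit_word in text:
            break
        if translation:
            speak(translation)
```

With the full program below, the equivalent call would be `run_translator(speakAndTranslation, speak)`, provided speakAndTranslation always returns a pair. Injecting the two steps keeps the loop testable and makes it easy to swap in a different recognizer or voice later.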

Full code:

```python
"""
小小明's code
CSDN homepage: https://blog.csdn.net/as604049322
"""
__author__ = '小小明'
__time__ = '2021/10/30'

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioOutputConfig

# Replace with your own key and region from "Keys and Endpoint"
speech_key, service_region = "59xxxxde", "chinaeast2"

# Text-to-speech: play synthesized audio through the default speaker.
# The translations are English, so "en-US" would select an English voice instead.
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region,
                                       speech_recognition_language="zh-cn")
speech_config.speech_synthesis_language = "zh-cn"
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=audio_config)

# Speech translation: recognize Chinese from the microphone, translate into English
from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)


def speakAndTranslation():
    """Recognize one utterance; return (text, translation), None for missing parts."""
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)
    # Nothing usable was recognized; return a pair so the caller can unpack it
    return None, None


def speak(text_words):
    """Read text_words aloud and report why synthesis was canceled, if it was."""
    result = speech_synthesizer.speak_text_async(text_words).get()
    if result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Synthesis canceled:", cancellation_details.reason)
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details:", cancellation_details.error_details)


while True:
    print("Say:", end=" ")
    text, translation_text = speakAndTranslation()
    print(text)
    print("Translation:", translation_text)
    # Saying "退出" ("quit") ends the program
    if text and "退出" in text:
        break
    # Only speak when there is a translation to read
    if translation_text:
        speak(translation_text)
```

I gave it a quick run; the printed output looked like this:

```text
Say: 我只想進轉過山和大海。
Translation: I just want to go in and out of the mountains and the sea.
Say: 也穿越,人山人海。
Translation: Also through, the sea of people and mountains.
Say: 我曾經目睹這一切全部都隨風飄然。
Translation: I've seen it all blow in the wind.
Say: 轉眼成空。
Translation: It's empty.
Say: 問,世間能有幾多愁?
Translation: Q, how much worry can there be in the world?
Say: 退出。
Translation: quit.
```

As for how the spoken translations actually sound, you'll have to try it yourself!