Offline Voice: Snowboy Hotword Wake-Up + Voice-Controlled Light Switching on a Raspberry Pi
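# Offline voice: Snowboy hotword wake-up
Speech recognition is now used in a wide range of scenarios, such as phone voice assistants and smart speakers (XiaoAi, DingDong, Tmall Genie…). It generally consists of three stages: hotword wake-up, speech capture, and recognition plus logic control.
Hotword wake-up wakes the device so that it parses what you say next. The device normally records the surrounding sound continuously but does not react to it. Once it is woken by a wake word such as "Hi, Siri", it starts processing the audio that follows. Hotword wake-up is the starting point of speech recognition.
Snowboy is a fairly popular hotword wake-up framework and has since been acquired by Baidu. It supports Chinese well and is simpler to configure and use than Pocketsphinx, so it is the recommended choice.
Official Snowboy documentation (in English): http://docs.kitt.ai/snowboy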
Installation
1. Get the source code and compile it
Install dependencies
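The Raspberry Pi's built-in audio hardware has no audio input (it cannot record), so you need to buy a driver-free USB sound card, which generally works as soon as it is plugged in. Installing pulseaudio is recommended because it reduces the number of audio configuration steps: `$ sudo apt-get install pulseaudio`
Install sox to test recording and playback: `$ sudo apt-get install sox`
After installation, run `sox -d -d`, speak into the microphone, and confirm that you can hear your own voice.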
Install the other software dependencies:
- Install PyAudio:
$ sudo apt-get install python3-pyaudio
- Install SWIG (> 3.0.10):
$ sudo apt-get install swig
- Install ATLAS:
$ sudo apt-get install libatlas-base-dev
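Compile the source code
Get the source code: `$ git clone https://github.com/Kitt-AI/snowboy.git`
Compile the Python3 binding: `$ cd snowboy/swig/Python3 && make`
Test:
On a Raspberry Pi you also need to change the sound card settings in ~/.asoundrc: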
```
pcm.!default {
  type asym
  playback.pcm {
    type plug
    slave.pcm "hw:0,0"
  }
  capture.pcm {
    type plug
    slave.pcm "hw:1,0"
  }
}
```
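The card numbers in this file depend on your hardware. As a minimal sketch, the same configuration can be generated for other card numbers with a small helper. The function name `build_asoundrc` and its parameters are hypothetical, introduced here only for illustration; the card indices themselves can be looked up with `aplay -l` and `arecord -l`.

```python
def build_asoundrc(playback_card: int, capture_card: int, device: int = 0) -> str:
    """Return ALSA config text that routes playback and capture to different cards."""
    return (
        "pcm.!default {\n"
        "  type asym\n"
        "  playback.pcm {\n"
        "    type plug\n"
        f'    slave.pcm "hw:{playback_card},{device}"\n'
        "  }\n"
        "  capture.pcm {\n"
        "    type plug\n"
        f'    slave.pcm "hw:{capture_card},{device}"\n'
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Card 0 for playback and card 1 for capture are assumptions that match
    # the example above; check your own numbers before writing the file.
    print(build_asoundrc(playback_card=0, capture_card=1))
```

Redirecting the printed output to ~/.asoundrc would reproduce the file shown above for the default arguments.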
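Go to the official example directory snowboy/examples/Python3 and run: `$ python3 demo.py resources/models/snowboy.umdl`
(the snowboy.umdl file in the command is the speech recognition model)
Then say **"snowboy"** clearly into the microphone. If you hear a beep, installation and configuration succeeded.
PS: The official source code raises an error when tested with Python3. In testing, the fix is to edit snowboydecoder.py in the snowboy/examples/Python3 directory: change line 5 from `from . import snowboydetect` to `import snowboydetect`, after which it runs directly.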
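Quick start
GitHub has a fairly detailed demo, and it is strongly recommended to read it first. Start by creating a HotwordDetect class that holds parameters such as the wake-up model, audio gain, and sensitivity. Then initialize the Detector object; Snowboy's Detector class lives in the downloaded source code. The trained model can be passed as a single model or as a list.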
```python
from .. import snowboydetect


class HotwordDetect(object):
    def __init__(self, decoder_model,
                 resource,
                 sensitivity=0.38,
                 audio_gain=1):
        """init"""
        # Snowboy's SWIG binding expects bytes for file names and settings.
        self.detector = snowboydetect.SnowboyDetect(
            resource_filename=resource.encode(),
            model_str=decoder_model.encode())
        self.detector.SetAudioGain(audio_gain)
        # Apply the sensitivity argument, which was otherwise unused.
        self.detector.SetSensitivity(str(sensitivity).encode())
```
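After initialization you can create a start method. It usually takes a wake-up callback, which corresponds to the "ding" you may hear after "Hi, Siri". You can also pass a recording callback, which is whatever you want to do with the audio once the device is awake: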
```python
class HotwordDetect(object):
    ...

    # Parameters without defaults must come before interrupt_check.
    def listen(self, detected_callback,
               audio_recorder_callback,
               interrupt_check=lambda: False):
        """begin to listen"""
        ...
        state = "PASSIVE"
        while True:
            # `data` is the latest audio chunk; reading it is elided here.
            status = self.detector.RunDetection(data)
            ...
            if state == "PASSIVE":
                if status > 0:  # a wake word was recognized
                    detected_callback()
                    state = "ACTIVE"
                continue
            elif state == "ACTIVE":
                audio_recorder_callback()
                state = "PASSIVE"  # go back to waiting for the wake word
                continue
```
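You can define this logic yourself; it mainly switches between two states. When the device hears a wake word, status gives the index of the wake word that was recognized. For example, if you defined the two wake words "Siri" and "Xiaowei", a status of 1 means Siri was triggered and a status of 2 means Xiaowei was triggered. The state then changes to active, audio_recorder_callback runs, and once it finishes the state switches back to passive (waiting for the wake word).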
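To make the two-state switching concrete, here is a minimal sketch that can be run without Snowboy or a microphone. A scripted list of integers stands in for successive `RunDetection()` results, and the function `run_loop` is a hypothetical name used only for this illustration; it is not part of Snowboy.

```python
from typing import Callable, Iterable, List

PASSIVE, ACTIVE = "PASSIVE", "ACTIVE"


def run_loop(statuses: Iterable[int],
             detected_callback: Callable[[int], None],
             audio_recorder_callback: Callable[[], None]) -> List[str]:
    """Drive the two-state loop with a scripted sequence of detector results.

    Each value in `statuses` stands in for one detection result: values <= 0
    mean no wake word, and n > 0 means wake word number n was heard.
    Returns the state after each step so the transitions can be inspected.
    """
    state = PASSIVE
    trace = []
    for status in statuses:
        if state == PASSIVE:
            if status > 0:
                detected_callback(status)   # e.g. play the "ding"
                state = ACTIVE
        else:
            audio_recorder_callback()       # e.g. record and recognize speech
            state = PASSIVE
        trace.append(state)
    return trace


if __name__ == "__main__":
    events = []
    trace = run_loop(
        statuses=[0, 0, 2, 0, 0, 1, 0],
        detected_callback=lambda n: events.append(f"hotword {n}"),
        audio_recorder_callback=lambda: events.append("recorded"),
    )
    print(trace)
    print(events)
```

Passing the wake word index to the callback is a design choice of this sketch; it lets one callback react differently to "Siri" and "Xiaowei".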
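Online speech recognition
Once the device is awake, you can take the recorded audio and do anything you like with it, including calling speech recognition APIs such as Baidu's. All of this logic lives in the audio_recorder_callback callback. Note that Snowboy currently supports only a 16000 Hz recording sample rate; audio recorded at any other rate cannot be used. There are two ways to deal with this:
- Use a sound card that supports a 16000 Hz sample rate
- Convert the sample rate of the recorded data
Products from the two larger sound card chip makers, C-Media and RealTek, generally run at 48 kHz and above. Chips that support 16 kHz tend to be more expensive, possibly around 60 yuan. UGREEN (绿联) has two products that support it. Before buying, check the product specifications against the chip maker's model number to confirm that 16 kHz sampling is supported.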
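For the second option, sample rate conversion, the following is a rough sketch that turns 48000 Hz mono 16-bit PCM into 16000 Hz by averaging every three samples. The function name `downsample_int16` is hypothetical, and averaging is only a crude low-pass filter, so a real project would probably want a proper resampler; the sketch only illustrates the idea using the standard library.

```python
import array


def downsample_int16(pcm: bytes, factor: int = 3) -> bytes:
    """Naively downsample mono 16-bit PCM by averaging groups of `factor` samples.

    With factor=3 this turns 48000 Hz audio into 16000 Hz audio.
    `pcm` must contain whole native-endian 16-bit samples (an even byte count).
    """
    samples = array.array("h")
    samples.frombytes(pcm)
    usable = len(samples) - len(samples) % factor  # drop a trailing partial group
    out = array.array("h", (
        sum(samples[i:i + factor]) // factor
        for i in range(0, usable, factor)
    ))
    return out.tobytes()


if __name__ == "__main__":
    one_second_48k = bytes(48000 * 2)          # one second of silence at 48 kHz
    converted = downsample_int16(one_second_48k)
    print(len(converted) // 2, "samples")      # sample count after conversion
```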
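Training voice models
There are two official ways to create a personalized voice model:
- website. With any one of a GitHub, Google, or Facebook account, you can log in, record, and complete the training.
- train-api. Pass the parameters specified in the documentation to complete the training; the API returns the acoustic model data.
Both methods produce a personal voice model in the form of a .pmdl file. Generalized universal models are not provided; you need to contact the vendor about commercial cooperation. The more people who test a model, the more accurate it becomes, so to improve accuracy you can invite more people to test yours. The type of microphone also affects accuracy, so training the model on the same device you will use it on helps. Speech recognition is a sophisticated technology with many pitfalls. As ChenGuo put it: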
Speech Recognition is not that easy.
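Using it in your own project
Copy the following files into your own project directory:
- The downloaded model.pmdl model file
- The compiled _snowboydetect.so library from the snowboy/swig/Python3 directory
- demo.py, snowboydecoder.py, and snowboydetect.py, plus the resources directory, from the snowboy/examples/Python3 directory
Then, in the project directory, run `$ python3 demo.py model.pmdl` and test with your own wake word.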
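Before running the demo, it can help to confirm that everything was copied. The following is a small sketch that checks the project directory for the files named above; `missing_files` and the `REQUIRED` list are hypothetical names for this illustration, and the model name is whatever you downloaded.

```python
from pathlib import Path
from typing import List

# Names taken from the copy list above.
REQUIRED = [
    "_snowboydetect.so",
    "demo.py",
    "snowboydecoder.py",
    "snowboydetect.py",
    "resources",
]


def missing_files(project_dir: str, model_name: str = "model.pmdl") -> List[str]:
    """Return the required entries that are not present in `project_dir`."""
    root = Path(project_dir)
    return [name for name in [model_name, *REQUIRED]
            if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All files in place.")
```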
On an Orange Pi, speech recognition is used to switch a light on and off by voice. This part requires an Internet connection.
gpio.py
```python
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: gpio.py
@time: 2018/3/13 18:45
"""
import wiringpi as wp


class GPIO():
    """Thin wrapper around wiringpi."""

    def __init__(self):
        self.wp = wp
        wp.wiringPiSetupGpio()

    def setPinMode(self, pin, mode):
        self.wp.pinMode(pin, mode)

    def setV(self, pin, v):
        self.wp.digitalWrite(pin, v)

    def getV(self, pin):
        return self.wp.digitalRead(pin)
```
The earlier example, slightly modified. control.py
```python
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: control.py
@time: 2018/3/13 17:30
"""
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import time
import pyaudio
import wave
import pygame
import snowboydecoder
import signal
from gpio import GPIO
from aip import AipSpeech

# Baidu speech API credentials (placeholders)
APP_ID = '109472xxx'
API_KEY = 'd3zd5wuaMrL21IusNqdQxxxx'
SECRET_KEY = '84e98541331eb1736ad80457b4faxxxx'

APIClient = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

interrupted = False

# Recording parameters; Snowboy and the recognition call both use 16000 Hz.
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "./myvoice.pcm"


class Light():
    def __init__(self):
        self.pin = 18
        self.mode = 1
        self.mgpio = GPIO()
        self.mgpio.setPinMode(pin=self.pin, mode=1)

    def on(self):
        self.mgpio.setV(self.pin, self.mode)

    def off(self):
        self.mgpio.setV(self.pin, self.mode & 0)

    def status(self):
        return self.mgpio.getV(self.pin)


def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()


def wait_for_playback():
    # Poll until pygame has finished playing the current sound.
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)


def word_to_voice(text):
    result = APIClient.synthesis(text, 'zh', 1, {
        'vol': 5, 'spd': 3, 'per': 3})
    # On failure the SDK returns a dict; only play audio when bytes came back.
    if not isinstance(result, dict):
        with open('./audio.mp3', 'wb') as f:
            f.write(result)
        time.sleep(.2)
        pygame.mixer.music.load('./audio.mp3')
        pygame.mixer.music.play()
        wait_for_playback()


def get_mic_voice_file(p):
    # The prompt is spoken in Chinese: "Please say 'turn on the light' or 'turn off the light'."
    word_to_voice('请说开灯或关灯.')

    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("* recording")

    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* done recording")
    stream.stop_stream()
    stream.close()

    # Note: the wave module writes a WAV header even though the file is named .pcm.
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    print('recording finished')


def baidu_get_words(client):
    results = client.asr(get_file_content(WAVE_OUTPUT_FILENAME), 'pcm', 16000, {
        'dev_pid': 1536,
    })
    # A failed request has no 'result' key; return an empty string in that case.
    words = results.get('result') or ['']
    return words[0]


def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_callback():
    global interrupted
    return interrupted


def callbacks():
    global detector

    # Play the "ding" to signal that the wake word was heard.
    pygame.mixer.music.load('./resources/ding.wav')
    pygame.mixer.music.play()
    wait_for_playback()

    # Stop the detector so the microphone is free for recording.
    detector.terminate()
    get_mic_voice_file(p)
    rText = baidu_get_words(client=APIClient)

    if rText.find("开灯") != -1:    # "turn on the light"
        light.on()
    elif rText.find("关灯") != -1:  # "turn off the light"
        light.off()

    # Restart listening. Note that this nests a new wake_up() call inside
    # the callback each time the wake word is handled.
    wake_up()


def wake_up():
    global detector
    model = './resources/models/snowboy.umdl'

    # Capture SIGINT (Ctrl+C) so the loop can exit cleanly.
    signal.signal(signal.SIGINT, signal_handler)

    detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
    print('Listening... please say wake-up word:SnowBoy')

    # Main loop: blocks until interrupt_callback() returns True.
    detector.start(detected_callback=callbacks,
                   interrupt_check=interrupt_callback,
                   sleep_time=0.03)
    detector.terminate()


if __name__ == '__main__':
    pygame.mixer.init()
    p = pyaudio.PyAudio()
    light = Light()
    wake_up()
```
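The command-matching step in callbacks() can be tried out separately, without GPIO hardware or a network connection. Below is a minimal sketch under that assumption; `FakeLight` and `handle_command` are hypothetical names for this illustration and do not appear in control.py.

```python
from typing import Optional


class FakeLight:
    """Hypothetical stand-in for the Light class so the logic runs without GPIO."""

    def __init__(self) -> None:
        self.is_on = False

    def on(self) -> None:
        self.is_on = True

    def off(self) -> None:
        self.is_on = False


def handle_command(text: Optional[str], light) -> str:
    """Switch `light` according to the recognized text and report what happened."""
    if not text:
        return "no speech recognized"
    if "开灯" in text:    # "turn on the light"
        light.on()
        return "on"
    if "关灯" in text:    # "turn off the light"
        light.off()
        return "off"
    return "unknown command"


if __name__ == "__main__":
    lamp = FakeLight()
    for phrase in ["请开灯", "关灯。", "你好", None]:
        print(repr(phrase), "->", handle_command(phrase, lamp))
```

Keeping the matching logic in a plain function like this makes it easy to add more phrases later, and an object with the same on() and off() methods as Light can be passed in unchanged.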
References:
http://docs.kitt.ai/snowboy/#api-v1-train