Microsoft Speech SDK 5.1


Release notes

7/31/2001

Introduction

Welcome to the Microsoft® Speech SDK 5.1 (Speech SDK 5.1). This file describes system requirements, installation notes, and known issues. This SDK provides the tools, information, and samples you need to incorporate speech technologies into your Windows® applications.

Before installing Speech SDK 5.1, read through this document to become familiar with installation and performance issues. This file accompanies Speech SDK 5.1 and is released under the License Agreement in the license.chm file on the CD or install point.


System Requirements

Operating Systems

Supported operating systems are:

Software Requirements

Hardware Requirements

Component                     Minimum RAM   Recommended RAM
Text-to-speech (TTS) Engine   14.5 MB       32 MB
SR Command and control        16 MB         32 MB
SR Dictation                  25.5 MB       128 MB
SR Both                       26.5 MB       128 MB

File Name                                                               Approximate File Size   Setup Merge Names
Sapi.dll and Sapisvr.exe                                                0.5 MB                  Sp5.msm
Sapi.cpl                                                                36 KB                   Sp5Intl.msm
SR Engine                                                               1.7 MB                  Sp5Sr.msm
Dictation, and command and control data files                           13.4 MB                 Sp5CCInt.msm
TTS Engine and voices                                                   7.8 MB                  Sp5TTInt.msm
Files common to both Microsoft SAPI 5.1 TTS and SR                      92 KB                   SpCommon.Msm
Language-specific SAPI 5.1 inverse text normalization (ITN) components  108 KB                  Sp5itn.Msm

Installation Notes

You must have administrator privileges on the computer to install the Speech SDK 5.1 properly.

SAPI and the Speech SDK 5.1 are installed by Windows Installer. If Windows Installer has not previously been used on the computer, it may require a reboot before beginning the SDK installation process. On some versions of Windows, the SDK installation process will not automatically resume after this reboot, and the user must run setup again.

SAPI 4.0

SAPI 5.1 can coexist on your computer with SAPI 4.0. However, applications using different versions may not be compatible and should not be run simultaneously. Usually, contention for system resources will prevent this from happening.

SAPI 5.0

Because SAPI 5.1 is a superset of SAPI 5.0, the two versions can coexist on the same machine. But if both SAPI 5.0 and SAPI 5.1 are installed on the same machine, uninstalling either version could damage the other installation and require it to be reinstalled. For this reason, we recommend that you uninstall SAPI 5.0 before installing SAPI 5.1.

When English Office XP and SAPI 5.1 reside on a computer with a non-English version of Windows, removing SAPI 5 or an application which removes SAPI 5 could cause Office Speech to fail. If this occurs, use Office's "Detect and Repair" program.

None of the SAPI 5.1 components or compliance tests were tested with power-managed (OnNow) computers. As long as the system determines that there is application activity, it will not put the system or any devices into the sleeping state. However, if you encounter unexpected performance issues while using power management, OnNow should be disabled.

Occasionally, it can be difficult to uninstall a previous release of the Microsoft Speech SDK 5.0 and subsequently install the Speech SDK 5.1. If this happens, try the following two options in order:

(i) Run the application Regedit.exe. Delete all entries under HKEY_CURRENT_USER\Software\Microsoft\Speech\RecoProfiles\Tokens. Deleting the contents of this registry key removes the speech recognition profiles. Next, install the Speech SDK 5.1.

(ii) If the problem continues, delete the HKEY_CURRENT_USER\Software\Microsoft\Speech and the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech keys. Then try installing the Speech SDK 5.1 again.

64-bit

The Speech SDK 5.1 will install on Windows XP 64-bit edition, but it is not supported. Speech recognition will not function, the sample applications will not run, and the TTS engines will not be listed in Control Panel. The existing Speech functionality in Windows XP 64-bit edition will continue to function. The SDK is installed to the Program Files (x86) folder, and all its registry entries are placed in a different location (that is, in a WOW folder). The TTS voices in the SDK are not added to Speech properties in Control Panel. Installing the SDK does not have any effect on the existing TTS support in Win64.

Known Issues

There may be additional situations or conditions where SAPI 5 performs differently than you expect. Please refer to this list of known issues first. If anomalies persist, you are encouraged to contact [email protected].

Language Issues

The Speech SDK 5.1 installs the English SR and TTS engines automatically. The Speech SDK 5.1 Language Pack, included on the SDK CD, installs the Japanese and/or Simplified Chinese engines. These are general guidelines for using the Japanese and/or Simplified Chinese engines:

Note that, unless your computer already has OS language support, you will need to install both OS language support and the Speech SDK 5.1 Language Pack in order to use non-English engines.

After all necessary language support has been installed, it may be necessary to change the computer's system locale in order to set Japanese or Chinese as its language.

Failure to install language support or to adjust the system locale may result in one or more of the following problems:

Other language-related issues:

Speech Recognition Issues

The sample SR Engine shipped with SAPI 5.1 does not set RequiredConfidence or ActualConfidence levels.

Dictation should recognize words in user lexicons and application lexicons. Currently it recognizes words only in user lexicons.

When you use "<DEFINE>" tags with an alphanumeric value in an XML grammar, the grammar compiler will recommend that you use an attribute called "VALSTR." Disregard this recommendation, since alphanumeric constants are not currently supported, and use the "VAL" attribute to define numeric constants.

In XML grammars, evaluation of data inside a "VAL" attribute is inconsistent. If the attribute contains a numeric value, the value is rounded. If the attribute contains a named constant, the value of the constant is not rounded.

In XML grammars, SAPI defaults the string portion of a semantic property (pszValue) to the first unambiguous portion of the recognized string (see Grammar Documentation: Property Pushing). To determine the complete text (including the ambiguous portion), use the starting phrase element and length (ulFirstElement and ulCountOfElements).
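The element-range lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SAPI code: PropertySketch and FullPropertyText are hypothetical names, the phrase elements are modeled as a plain vector of words, and only the three fields named in the note are represented. With the real SDK the same idea would be applied to the phrase structure returned by the recognizer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a semantic property. It models only
// the three fields named in the note; it is not the SAPI structure itself.
struct PropertySketch {
    std::wstring pszValue;            // defaulted text (first unambiguous portion)
    unsigned long ulFirstElement;     // index of the first phrase element covered
    unsigned long ulCountOfElements;  // number of phrase elements covered
};

// Joins the text of the elements in the range
// [ulFirstElement, ulFirstElement + ulCountOfElements) with single spaces.
// A range that runs past the end of the phrase is clamped; a start index
// beyond the phrase yields an empty string.
std::wstring FullPropertyText(const std::vector<std::wstring>& elements,
                              const PropertySketch& prop) {
    std::wstring text;
    const std::size_t first = prop.ulFirstElement;
    if (first >= elements.size()) {
        return text;
    }
    const std::size_t count =
        std::min<std::size_t>(prop.ulCountOfElements, elements.size() - first);
    for (std::size_t i = 0; i < count; ++i) {
        if (i > 0) {
            text += L' ';
        }
        text += elements[first + i];
    }
    return text;
}
```

For a phrase modeled as the words "fly", "to", "new", "york", "city" and a property that starts at element 2 and covers 3 elements, the function returns the full span even when pszValue holds only its first part.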

The ISpCFGGrammar (and ISpeechRecoGrammar) object cannot import rules from an XML grammar file which was opened as dynamic.

Roaming profiles sometimes yield poorer recognition results on different systems. If the recognition quality is unacceptable, you may need to perform additional training on each system you use.

Text-to-Speech Issues

If ISpVoice::Speak (or SpVoice.Speak) is called with the VoicePurge option when voice input streams are enqueued, an extra EndStream event is raised. There is no StartStream event corresponding to this EndStream event.
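One way an application might tolerate the unmatched EndStream event is to remember which stream numbers it has seen start and to ignore an end event for any other stream. The sketch below is illustrative only: StreamEventTracker and its methods are hypothetical names, and it assumes the application's event handlers receive a stream number with each StartStream and EndStream notification.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: tracks streams that have started but not yet ended.
class StreamEventTracker {
public:
    // Call from the StartStream handler.
    void OnStartStream(unsigned long streamNumber) {
        open_.insert(streamNumber);
    }

    // Call from the EndStream handler. Returns true when the event matches a
    // stream that was seen starting, false when it is unmatched and can be
    // ignored by the caller.
    bool OnEndStream(unsigned long streamNumber) {
        return open_.erase(streamNumber) > 0;
    }

    // Number of streams currently started and not yet ended.
    std::size_t OpenStreamCount() const { return open_.size(); }

private:
    std::set<unsigned long> open_;
};
```

An application would run its normal end-of-stream logic only when OnEndStream returns true.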

Windows XP has an upgraded Remote Desktop Protocol (RDP) that supports the redirection of audio output to the client machine. When a Terminal Services client connects, the operating system automatically changes the audio output device from the standard sound card to "RDP Connection." However, the OS does not currently differentiate between legacy Terminal Services clients that do not support audio output redirection and new clients that do. Therefore, pre-Windows XP Terminal Services clients that use Speech properties in Control Panel will see "RDP Connection" listed as the output device, but TTS will not work.

Audio Issues

SAPI 5.1 has been tested on a wide range of audio equipment, but it is possible that some sound cards will hang during an attempt to install SAPI. If this happens, you must use a different computer or install a different sound card.

SDK Sample Issues

The C# samples were written and compiled on a pre-release version of Visual Studio .NET. Minor changes may be necessary for these samples to work under the final version of Visual Studio .NET.

The Mkvoice application is currently ANSI; to use it for non-English TTS, compile it as Unicode.

If you modify the SR compliance tests, recompile srcomp.dll and then copy the newly compiled version to the Microsoft Speech SDK 5.1\tools\comp\bin folder.

A speech application using the InProc engine will fail to load if Speech properties in Control Panel is open, because Speech properties uses the shared engine. Likewise, exit all sample applications before starting Speech properties in Control Panel.

From the command line, Gramcomp.exe cannot open files that contain spaces in the name. Rename the file so that it does not contain spaces.
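A build script or helper tool could apply the renaming workaround automatically before invoking the compiler. The following is a small sketch, not part of the SDK: HasSpaces and WithoutSpaces are hypothetical helper names, and replacing each space with an underscore is an arbitrary choice made for illustration. It is meant to be applied to the file name only, not to a full path.

```cpp
#include <algorithm>
#include <string>

// Returns true if the file name contains at least one space.
bool HasSpaces(const std::string& fileName) {
    return fileName.find(' ') != std::string::npos;
}

// Returns a copy of the file name with every space replaced by '_'.
// The caller is still responsible for renaming or copying the file on disk.
std::string WithoutSpaces(std::string fileName) {
    std::replace(fileName.begin(), fileName.end(), ' ', '_');
    return fileName;
}
```

For example, WithoutSpaces("my grammar.xml") yields "my_grammar.xml", which can then be passed on the command line.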

Miscellaneous Issues

The ISpeechFileStream Read method operates on text streams differently than on audio streams. If you Open a file with the SPFMCreateAlways option, Write text data to the file, and Seek to zero, you can Read the data back. If you Open a file with the SPFMCreateAlways option, Speak audio data to the file, and Seek to zero, an attempt to Read will fail.

The SR compliance tests use the LoadStringW() function, which depends on Unicode data. Because Windows 98 and Windows Me do not support Unicode, these tests will neither compile nor run on these platforms.

Many grammar operations are asynchronous for efficiency and result in the inability of the application to detect errors unless the engine is in the stopped or paused state. Hence, if the application needs to test for errors in grammar loading operations and/or setting a CFG or dictation rule state, the application should pause the engine first, perform the operation, and then unpause the engine. This is recommended mainly for debugging a speech application.
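The pause, operate, resume sequence can be wrapped so that the engine is always resumed, even when the operation returns early. The sketch below is a generic illustration and does not call SAPI: PauseGuard, RunWhilePaused, and FakeEngine are hypothetical names, and Pause() and Resume() stand in for whatever pause and resume calls the application makes on its recognition objects. With the real SDK, the operation would be the grammar load or rule-state call, and its returned error code would be checked.

```cpp
// RAII guard: pauses the engine on construction and resumes it on
// destruction. Engine is any type that exposes Pause() and Resume();
// both method names are assumptions made for this sketch.
template <typename Engine>
class PauseGuard {
public:
    explicit PauseGuard(Engine& engine) : engine_(engine) { engine_.Pause(); }
    ~PauseGuard() { engine_.Resume(); }
    PauseGuard(const PauseGuard&) = delete;
    PauseGuard& operator=(const PauseGuard&) = delete;

private:
    Engine& engine_;
};

// Runs the operation while the engine is paused and returns its result,
// so that the caller can inspect the error code reported by the operation.
template <typename Engine, typename Operation>
auto RunWhilePaused(Engine& engine, Operation op) -> decltype(op()) {
    PauseGuard<Engine> guard(engine);
    return op();
}

// Stand-in engine used only to demonstrate the call sequence.
struct FakeEngine {
    bool paused = false;
    int pauseCalls = 0;
    int resumeCalls = 0;
    void Pause() { paused = true; ++pauseCalls; }
    void Resume() { paused = false; ++resumeCalls; }
};
```

Because pausing on every grammar operation gives up the efficiency of asynchronous processing, a wrapper like this is best limited to debug builds, in line with the recommendation above.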

(c) 2001 Microsoft Corporation. All rights reserved.