The Name & Address Application:
Natural Speech Technology, Speech Recognition, and Word Recognition
It is flattering to hear the oohs and aahs from callers in praise of ConServIT’s speech recognition. But the bad news is that when “speech recognition” arises in a conversation as a generic term, people instantly conclude that it means the stuff everyone finds so annoying, and treat me as the master of a monster gone awry. For many, “speech recognition” has only a single, less-than-desirable reputation.
But since the principal subsets of “speech recognition” can be as easily distinguished as the “baby” and “bathwater”, people can effortlessly differentiate the good technology from that found so annoying.
(a) Word Recognition – recognition of a spoken word and its conversion into a digital expression in real time. “What is your account number?” The technology will convert, in real time, a spoken “1,2,3,4” into a digital string representing the four numbers. This technology, furnished by vendors such as Nuance, SpeechWorks and a few others, is commonly embedded in interactive voice response systems (IVR).
(b) Natural Speech Technology (NST) – recognition of the substance or gist of a naturally spoken expression and subsequent conversion into a digital expression. After asking a caller “And your address?” the technology recognizes in real time whether the address is complete, whether the street name was spelled, and whether the city and state were included. Essentially, the technology acts on the thought of the respondent, regardless of the words used. This newer technology is the proprietary and patented underpinning of ConServIT’s services, trademarked Thought Recognition®.
Compare: Is the call answered as an operator would answer?
Natural Speech Technology:
Hello, XY Company. Are you calling to place an order, or how can I help you?
Word Recognition:
Hello, XY Company. Please tell me what you are calling about. Say service, sales, technical support, marketing, or …
NST, following the same dialog an operator would use greeting a caller, will maintain the dialog regardless of the specific words used by the caller in response. Word Recognition, on the other hand, necessitates the use of a specific word by the caller that the system has been preprogrammed to recognize. It forces the caller into a mechanistic mode.
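To make the constraint concrete, here is a toy sketch of closed-vocabulary matching. It is not any vendor's implementation: it works on already-transcribed text rather than audio, and the `MENU_WORDS` set and the `route_by_menu_word` function are hypothetical names built from the sample greeting above.

```python
from typing import Optional

# Hypothetical menu vocabulary, copied from the sample Word Recognition greeting.
MENU_WORDS = {"service", "sales", "technical support", "marketing"}


def route_by_menu_word(utterance: str) -> Optional[str]:
    """Return the matching menu entry, or None if the caller said anything else."""
    # Normalize case, surrounding whitespace, and trailing punctuation.
    said = utterance.strip().lower().rstrip(".!?")
    # Only an exact, preprogrammed word or phrase is accepted.
    return said if said in MENU_WORDS else None
```

A caller who says “Sales” is routed, while one who says “I'd like to order a catalog” gets no match and must be prompted again, which is the mechanistic mode described above.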
Compare: Is the conversation structured or natural? Are questions closed- or open-ended?
With NST, a caller is served conversationally (natural speech), and the caller responds in his or her own unstructured words (natural speech). Responses are open-ended with NST. With word recognition, on the other hand, the caller must use the structured words preprogrammed into the system, and closed-ended responses are generally required.
For this reason, the expression “Did you say …? Answer Yes or No.” is inherent in Word Recognition, but anathema to Natural Speech Technology.
Compare: Are the results at least as good as with operators?
NST produces a high percentage of completed calls, with accuracy at least comparable to that of operators. With Word Recognition, on the other hand, the flow of the call is constantly interrupted by repetitive, annoying, time-consuming, unnatural verification statements and instructions, such as those mentioned above. As a result, a large percentage of callers give up or try to be redirected to an operator.
Compare: Is the call easy for the caller?
If the call minimizes the caller’s time, understands what the caller says the first time, and does not repeat the caller’s answer, it is Natural Speech Technology (NST). If it taxes the caller’s patience, it uses Word Recognition.
Among the applications for which ConServIT’s natural speech technology is particularly well suited is the capture of names and addresses, phone numbers and email addresses. When we ask for the caller’s name and the name is not spelled, we ask for the spelling. When we ask for the address and the street name is not spelled, we ask for the spelling. If the caller omits the city, we ask for it (the 5-digit ZIP code often does not designate the correct town). All of this is done just as an operator would do it. (You can hear an example of how ConServIT’s NST handles name and address information by visiting the www.conservit.com website and clicking on “Hear a Call”.) Word recognition systems do not have these capabilities.
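The follow-up rules just described (ask for a spelling when none was given, ask for the city when it was omitted) can be sketched in a few lines. This is only an illustration of the decision logic an operator follows, not ConServIT's proprietary technology. The `Capture` fields, the `follow_up_prompts` function, and the prompt wording are all hypothetical, and the sketch assumes some earlier step has already determined what the caller supplied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Capture:
    """What the caller has supplied so far (all field names are hypothetical)."""
    name: Optional[str] = None
    name_spelled: bool = False
    street: Optional[str] = None
    street_spelled: bool = False
    city: Optional[str] = None


def follow_up_prompts(c: Capture) -> List[str]:
    """Return the follow-up questions an operator would ask, in order."""
    prompts: List[str] = []
    # A name was given but not spelled: ask for the spelling.
    if c.name is not None and not c.name_spelled:
        prompts.append("Could you spell your name, please?")
    # A street was given but not spelled: ask for the spelling.
    if c.street is not None and not c.street_spelled:
        prompts.append("Could you spell the street name, please?")
    # An address was given without a city: ask for the city.
    if c.street is not None and c.city is None:
        prompts.append("And the city?")
    return prompts
```

A caller who spells the name but gives an unspelled street with no city would receive two follow-up questions, and a caller who supplies everything would receive none.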
In the recently published Voice User Interface Design (Chapter 13) by Michael Cohen and Jennifer Balogh of Nuance, a leading supplier of word recognition technology, and James Giangola, the authors point out the unresolved difficulties of using word recognition technology for strings of numbers and letters. Most of us have experienced these error-prone aspects of word recognition technology. Yet strings of letters and numbers are exactly what names, addresses, telephone numbers and email addresses consist of.
Recognizing these inherent limitations of word recognition in the name and address application, some suppliers try to improve the results by using CallerID or ANI to capture the caller’s phone number, then coupling it with a reverse lookup in a public database to obtain the name and address of the party billed for the caller’s telephone service. The stretched presumption is that the billing name and address correspond to the caller. However, the ANI and CallerID are frequently wrong or not available. The caller may be using a daytime business phone, a cell phone, a calling card, or a friend’s phone, any of which yields both the wrong name and the wrong address. Name and address calls using this combined technology can take incredibly long and be cluttered with many inaccuracies. And even setting aside all these potential errors, the reverse lookup still doesn’t provide the email address.
ConServIT’s byword for caller success is to MAKE IT EASY FOR THE CALLER! That objective should be obvious and not just given lip service. But word recognition systems put the caller through hoops as if the caller had unlimited time and patience. Why not just ask a caller for his/her name, address, and phone number using NST? The caller knows better than any lookup table where he/she can be reached, and where he/she wants to receive merchandise, information, or be called back. Why be secretive and cutesy as word recognition systems try to be with callers?
An address, phone number or email address must be accurate. There is no room for approximation. An error of a single letter or digit can result in a disserved, disgruntled caller, rather than a satisfied customer. Managers of live call centers know the importance of accuracy, which is a ConServIT hallmark.
Word recognition is simply the wrong technology for the name and address application.
Apptera, a company specializing in name and address capture, recently published a chart comparing its word recognition system with a live call center. The information from that chart is reproduced below, with an additional right-hand column for ConServIT’s NST. Two additional rows of information were added at the bottom to round out the comparison.

Click on the image above for chart detail.
Natural speech technology (NST) and word recognition each have their own attributes and limitations. Avoid the trap of being sold on the idea that one technology is best suited for all applications, or that all speech recognition systems are the same or similar. They are not!
Please call me, Peter Theis, at 1-800-994-4400 or email me at theis@conservit.com if you have any questions or would like to discuss this further.