Speech Synthesis Manager
My honeymoon with Cocoa lasted about two weeks before I found myself having to invoke a Carbon API since the Cocoa one didn’t support the features I wanted. I wanted my application to speak some text, but I wanted to configure various speech-related properties. The NSSpeechSynthesizer class is extremely easy to use, but doesn’t provide access to any speech-related property other than the voice. So, I was off to read up on the Speech Synthesis Manager (which I remember as just the Speech Manager). While I think it would be great if NSSpeechSynthesizer exposed all of the features available in the Speech Synthesis Manager, I’m not going to rake another Carbon library over the coals; that’s already been done quite well. Instead, I’m going to share the problems I encountered and the solutions I found when integrating the Speech Synthesis Manager into my Cocoa application.
My goal was to find a way to speak multiple blocks of text, one after the other. The speaking is initiated by invoking SpeakText. This is an asynchronous call: it returns before the text has been spoken. To know when the speaking has completed, I registered a callback:
error = SetSpeechInfo( speechChannel,
                       soSpeechDoneCallBack, speechDoneProc );
Ah, Universal Procedure Pointers… calling PPC code from 68K code… those were the days…. Oh, wait. No they weren’t. Anyway, while I didn’t think it would work, I first tried to reuse the existing SpeechChannel object to speak the next block of text by invoking SpeakText in the callback function. It didn’t work. I wasn’t surprised since these callbacks used to be called in the context of an interrupt service routine and, therefore, what was allowed was very restricted. On Mac OS X, this isn’t the case and the callback is invoked in the context of a thread other than the main application’s thread. In fact, on Mac OS X, user-mode code is never invoked in the context of an ISR; that is left to the kernel. Regardless, I doubt the Speech Synthesis Manager was designed to be used in this way.
My next step was to find a way to delay the next invocation of SpeakText. While I could have done this using low-level synchronization primitives, I expected that there was an easy Cocoa-friendly way to do this. I remembered reading about notifications. The NSNotificationCenter wasn’t appropriate since it notifies observers synchronously and, therefore, wouldn’t be any better than directly invoking SpeakText in the callback. However, the documentation for NSNotificationCenter referenced the NSNotificationQueue class, which notifies observers asynchronously. Perfect. Well, not really. It didn’t work.
I wasn’t completely surprised since I realized it was probably a stretch to think that I could interact with the same SpeechChannel object from multiple threads. So, the challenge was now to notify my controller object, in the context of the main application thread, that it should speak the next block of text. Again, rather than using a low-level synchronization primitive, I set out to find a Cocoa-friendly technique. I found a Stepwise article that explained how to do exactly what I wanted using the NSPort class. NSPort is easy to use and, twenty lines later, I was listening to multiple blocks of text one after the other. Success. Time for bed.
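For the curious, the low-level route mentioned above might look roughly like the sketch below. It is plain C with pthreads and does not touch the Speech Synthesis Manager at all: SpeechHandoff, fake_speak and speak_blocks are hypothetical names invented for illustration, and fake_speak merely stands in for the secondary thread on which the speech-done callback would fire. The main thread starts a block, waits on a condition variable, and only moves on to the next block once the other thread has signalled completion.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handoff state shared by the callback thread and the main thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  done;
    bool            finished;   /* set to true by the callback thread */
} SpeechHandoff;

void handoff_init(SpeechHandoff *h)
{
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->done, NULL);
    h->finished = false;
}

/* Runs on the secondary thread, where a speech-done callback would be invoked. */
void handoff_signal_done(SpeechHandoff *h)
{
    pthread_mutex_lock(&h->lock);
    h->finished = true;
    pthread_cond_signal(&h->done);
    pthread_mutex_unlock(&h->lock);
}

/* Runs on the main thread: block until signalled, then reset for the next block. */
void handoff_wait_done(SpeechHandoff *h)
{
    pthread_mutex_lock(&h->lock);
    while (!h->finished)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&h->done, &h->lock);
    h->finished = false;
    pthread_mutex_unlock(&h->lock);
}

/* Stand-in for the synthesizer (an assumption): "finishes speaking" on another thread. */
void *fake_speak(void *arg)
{
    handoff_signal_done((SpeechHandoff *)arg);
    return NULL;
}

/* Speaks `count` blocks one after the other; returns how many completed. */
int speak_blocks(int count)
{
    SpeechHandoff h;
    int spoken = 0;

    handoff_init(&h);
    for (int i = 0; i < count; i++) {
        pthread_t worker;
        if (pthread_create(&worker, NULL, fake_speak, &h) != 0)
            break;
        handoff_wait_done(&h);           /* main thread waits for "speech done" */
        pthread_join(worker, NULL);
        spoken++;                        /* safe to start the next block now */
    }
    pthread_cond_destroy(&h.done);
    pthread_mutex_destroy(&h.lock);
    return spoken;
}
```

Calling speak_blocks(3) runs three start-and-wait rounds in sequence. Note that the blocking wait is the weak point of this approach in a Cocoa application: the main thread is supposed to be servicing its run loop, so parking it on a condition variable would stall the user interface. That is a good reason to prefer a mechanism that delivers the message through the run loop instead.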
This morning, I vaguely remembered having read something about using the Speech Synthesis Manager from Cocoa. After a quick search in Yojimbo, I found a bookmark to Daniel Jalkut’s post. I already had a solution that worked, but I was curious how Daniel solved this challenge. So, I downloaded RSSafeSpeaker and took a look:
void MySpeechCompletedCallback(SpeechChannel chan,
                               long refCon)
{
    RSSafeSpeaker* selfObject = (RSSafeSpeaker*)refCon;
    // We have to dispose this on the main thread
    // because our callback might have been
    // called from a secondary CoreAudio thread
    NSAutoreleasePool* valuePool =
        [[NSAutoreleasePool alloc] init];
    [selfObject performSelectorOnMainThread:
        @selector(safeDisposeChannel:)
        withObject:[NSValue valueWithPointer:chan]
        waitUntilDone:NO];
    [valuePool release];
}
Wow. Do you see that? The performSelectorOnMainThread method is exactly what I needed. I deleted the twenty lines from last night and wrote one new one. I love that clean feeling you get after refactoring.
Some additional useful links on this topic:
- Speech Synthesis Manager Reference
- Speech Synthesis Programming Guide: covers both Cocoa and Carbon speech APIs.
- Carbon-Cocoa Integration Guide: calling Cocoa from Carbon and Carbon from Cocoa.

June 29th, 2008 at 10:29 am
Thank you very much for providing me with a cheap solution to my problem! Keep up the good work.