AI diary: Call transcription and summary feature is a work in progress, but has huge potential

Apple Intelligence call transcription and summary | In-call UI shown

After first trying out the new Siri capabilities, and exploring the new Writing Tools, next in line for me on the Apple Intelligence front was the new call transcription and summary feature.

This was a feature I’d been keen to try, not least because it could completely transform the experience of interviewing someone by phone …

Recording a call

You can see the UI flow in the main image above. When you make or receive a call, there’s a new button top-left on the screen. Tap this, and an alert tells you that all parties will be informed that the call is being recorded.

After a three-second countdown, a voice announcement is made:

This call will be recorded

This is a legal requirement in some US states and many countries around the world.

As recording starts, a banner appears, inviting you to take notes on the call. After that, the banner disappears and you’re just left with a waveform and a button to end recording.

The recording process really couldn’t be easier.

I believe the intention here is that audio call recording will be a system-wide feature, meaning it will work in third-party apps too, but that isn’t yet the case.

Transcription

Once the call is complete, and whether or not you accept the option to take notes, a new note opens with the audio recording embedded into it.

You can then transcribe this, which for a five-minute call took just seconds.

You can also play the recording, and get Apple Music-style time-synced highlighting of the transcription. Or you can do it the other way around: tap any part of the transcription and it will play that part of the recording.

As you might already be able to guess from the above sample, the current transcription performance is … uh … not good.

Greg’s “Yeah” was turned into “Clear straight,” and my question “What’s your normal policy on betas?” was somehow creatively reinterpreted as “What’s your normal Palestine beat?”

Things didn’t get any better from there. There were a lot of very odd substitutions, and line-breaks were rather random. For example:

Greg Gladwell
Thinking I suppose because 

Greg Gladwell
It is 

Greg Gladwell
One of the coolest things for awhile and will under undeniably be very , very use 

Greg Gladwell
Indicted today life [a mangling of “in day-to-day life”] 

Greg Gladwell

There’s gonna be you know just be able to summarize things and call or emails rather than enter 

At this point it just lost half a sentence.

Something you can also see above is random formatting, like that space before the comma.

This is a first beta of a beta feature, and I have to say it looks like it!

Summaries

As soon as the transcript is complete, you can also tap on it to be offered a summary. Here’s what it produced for our conversation about the Apple Intelligence beta:

The “Palestine beat” part aside, it’s not terrible, just very, very generic. I’m not sure how useful it would be for most people to have such a general summary, though I guess if you’re a lawyer or someone else with hundreds or thousands of transcriptions, then perhaps indexing these would help you find the right one.

Mostly, then, I’m excited for the future

This is a very convenient way to record calls, so I’ll use it on the rare occasions I need to do so, but the current transcription capabilities are not really at the point of being useful.

But I am very excited about the potential for this once it works well. For example, I wrote a while back about how a MacWhisper transcription saved the day when I had an unusable audio track for a video, but hadn’t initially realized this – which made it far harder to sync with my backup recording.

Running the audio file through MacWhisper meant that, just 90 seconds later, I had a complete, time-stamped transcript. I could then search for a phrase used in the edit, and immediately jump to that part of the audio file to substitute it for the original. A few frame-level nudges saw the video and audio properly lip-synced. The whole process took just a few minutes.
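For the curious, that search-and-jump step can be sketched in a few lines of Python. This is only an illustration of the idea, not MacWhisper's actual export format or anything Apple exposes: the `Segment` structure, the helper names, and the sample lines are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript line (hypothetical structure, not a real export format)."""
    start: float  # seconds from the start of the recording
    text: str


def find_phrase(segments, phrase):
    """Return the start time of every segment containing the phrase (case-insensitive)."""
    needle = phrase.casefold()
    return [seg.start for seg in segments if needle in seg.text.casefold()]


def as_timecode(seconds):
    """Format whole seconds as MM:SS, handy for jumping to that point in an editor."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Invented sample data, purely for illustration.
transcript = [
    Segment(0.0, "Thanks for joining me today."),
    Segment(4.2, "What's your normal policy on betas?"),
    Segment(9.8, "I usually wait for the public beta."),
]

hits = find_phrase(transcript, "policy on betas")
timecodes = [as_timecode(t) for t in hits]
```

The same lookup works in reverse for tap-to-play: each line of the transcript already carries its own start time, so tapping a line just means seeking the audio to that value.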

I can absolutely see myself using an iPhone as an additional audio recording device during interviews, making it really easy to find quotes and listen to them again.

For telephone interviews in particular, the sheer convenience of immediately having a time-synced transcription will be fantastic.

So … not usable yet, but given the performance of other transcription tools out there, I suspect it won’t take too long until it is.

