Analysing a Speech in VR using Speech-to-Text and Motion Tracking Technology

December 1, 2017 - Dom Barnard

It can be difficult to get feedback on a speech and even harder to quantify your performance. In this article, we use the VirtualSpeech app to review a speech on a range of specific criteria, from eye contact to pace of voice. The speech was just over 6 minutes long, giving the app a good amount of data (over 800 words) to work with and provide feedback to the user.

The speech was about how automotive companies can use virtual reality to reduce prototyping costs. The actual topic is not important, as the speech analysis can be applied to any speech or presentation for real-time feedback.

Meeting room with speech analysis on the user's speech and presentation.

The virtual meeting room the user gave their speech in. The user's presentation was loaded onto the left wall, where the Welcome placeholder image is. The user pressed the 'Start Analyse' button to begin the speech analysis.

The virtual reality app served two purposes:

  1. Immerse the user in a realistic meeting room
  2. Provide real-time feedback to the user

The first point is important as it goes some way to recreating the fear and excitement you might experience when presenting in front of a real audience. In virtual reality, we can simulate lighting distractions, mobile phones going off, audience members talking to each other and a wide range of other scenarios which you wouldn’t get without practicing in VR.

The second point is covered in depth in this article. The app gives feedback on these areas of the user's speech:

  • Pace of voice (how quickly the user is speaking)
  • Number of hesitation words
  • Volume (loudness) of the user's voice
  • Eye contact performance
  • Speech insights (not used in this speech)
  • Speech concepts (not used in this speech)

The user's performance is reviewed by the app according to the criteria above. We also discuss other useful features available to the user, such as saving the speech to listen back to later, and uploading the speech to the VirtualSpeech team for detailed feedback.

Virtual speech overview

  • Speech title: How automotive companies can use virtual reality to reduce prototyping costs
  • Speech length: 6 minutes
  • Virtual environment: Meeting room with 11 audience members

Uploading your own presentation slides

Before starting the speech, the user uploaded their presentation slides into the VR app. This allowed the user to present with their own slides and use them as visual cues for the speech.

The method for uploading your own slides is fairly straightforward: create a presentation using PowerPoint, Keynote or similar software and save your slides as a PDF document. You can then upload the PDF to the app by emailing it to yourself or transferring it through iTunes (iPhone) or file transfer (Android).

Practice with your own presentation slides in VR with speech analysis

Screenshot showing the user's presentation slides inside the meeting room.

Having your own slides in the virtual room with you helps you better prepare for an upcoming event, as you can work on getting the correct timing and pauses. To change the slides inside the app, you can either use the buttons to move to the next or previous slides, or press your VR headset trigger while looking at the slides to get them to change.

Speech analysis using speech-to-text technology

Technology background

Receiving feedback is essential for improving public speaking skills and ensuring that each time you practice, you’re becoming a better presenter. To provide this feedback, the app uses speech-to-text and other vocal technology to analyse the speech. If you are planning on using your own speech-to-text software, we recommend the following:

Each API offers unique benefits and issues for analysing a speech and converting it to text.

Once the app has converted the speech to a body of text, several algorithms that we created analyse the text and provide meaningful and understandable results to the user. This allows the user to quantify their performance and improve areas of their speech each time.

Speech analysis - how did the user perform?

The first thing you will notice when listening back to the audio (in the Appendix below) is that the user speaks very quickly, particularly towards the end. The user averages 141 words per minute, which is a little too fast; around 120 words per minute is preferable for a presentation.
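As a rough illustration of how a pace figure like this can be derived, the sketch below divides a word count by the speaking time. It is not the app's actual algorithm. The function names, the 366-second duration (one reading of "just over 6 minutes") and the 15 words-per-minute tolerance are assumptions chosen for the example; only the 860-word total and the 120 words-per-minute target come from this article.

```python
def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Average speaking pace over the whole speech."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count / (duration_seconds / 60.0)


def pace_feedback(wpm: float, target: float = 120.0, tolerance: float = 15.0) -> str:
    """Classify pace against a target; the tolerance band is an assumed value."""
    if wpm > target + tolerance:
        return "too fast"
    if wpm < target - tolerance:
        return "too slow"
    return "good"


# Around 860 words (from the article) in an assumed 366 seconds.
pace = words_per_minute(860, 366)
print(round(pace), pace_feedback(pace))
```

An average hides changes in pace within the speech, such as the speed-up towards the end noted above. Applying the same calculation to each minute of the transcript would be one way to show that.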

The user had around 19 filler words, such as ‘um’ and ‘ah’, out of a total of around 860 words. This is not bad, as filler words are not always a negative in a speech (particularly in a conversation). When listening back to the audio, you’ll notice that quite a few of the filler words occurred when the user was thinking of what to say next. A silent pause would be preferable in these cases.
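Counting hesitation words in a transcript can be sketched as follows. This is an illustrative approach rather than the app's implementation: the filler list, the tokenising pattern and the function name are all assumptions, and a real speech-to-text service may spell or omit fillers differently.

```python
import re

# Assumed filler list, based on the spellings that appear in the transcript below.
FILLERS = frozenset({"um", "uhm", "uh", "uhh", "ah"})


def count_fillers(transcript: str, fillers=FILLERS):
    """Return (filler_count, total_word_count) for a transcript."""
    # Treat runs of letters and apostrophes (straight or curly) as words.
    words = re.findall(r"[a-z'’]+", transcript.lower())
    hits = sum(1 for word in words if word in fillers)
    return hits, len(words)


sample = "You can, um, give one of these headsets to different design teams, uhh, engineering teams."
fillers, total = count_fillers(sample)
print(fillers, total)

# Filler rate for the figures quoted in the article: around 19 of around 860 words.
print(f"{19 / 860:.1%}")
```

Reporting the count as a share of all words (roughly 2% here) makes speeches of different lengths easier to compare.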

Get feedback on your speech in VR with speech-to-text technology and motion tracking.

Eye contact analysis and process

How we calculate eye contact

For eye contact analysis, the app assumes the eyes are looking directly forward from the head. In this way, when the user moves their head to look at something, the app assumes the eyes move as the head moves. If you watch presentations, you’ll notice this mostly holds true and is a fair assumption to make.

The app records the user's eye contact throughout the speech and then provides a heatmap of where the user was looking while speaking. This allows the user to easily see any areas they have neglected or focussed too much on.

The data for the heatmap is collected in two ways:

  • Head movement is extracted from the Google VR or Oculus VR software, which is used to build the VR experience.
  • The room is broken up into a grid made up of hundreds of squares, each represented by a cell in a matrix. When the user looks at one of these squares, the value of the corresponding cell is increased. After the speech has been completed, the weighted average of the matrix is calculated.

Combining these two sets of data gives an accurate reference of where the user was looking throughout the speech or presentation.
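The grid approach described above can be sketched in a few lines. This is a simplified illustration, not the app's code: it assumes head orientation arrives as (yaw, pitch) samples in degrees, uses a grid over viewing angles as a stand-in for the grid over the room, and normalises the counts to a share of samples per cell, which is one possible reading of the weighting step. The function names, grid size and angle ranges are all hypothetical.

```python
def gaze_cell(yaw_deg, pitch_deg, cols=20, rows=10,
              yaw_range=(-90.0, 90.0), pitch_range=(-45.0, 45.0)):
    """Map a head direction to a (row, col) grid cell, or None if it falls outside the grid."""
    yaw_min, yaw_max = yaw_range
    pitch_min, pitch_max = pitch_range
    if not (yaw_min <= yaw_deg <= yaw_max and pitch_min <= pitch_deg <= pitch_max):
        return None
    # Scale each angle to a cell index, clamping the upper edge into the last cell.
    col = min(int((yaw_deg - yaw_min) / (yaw_max - yaw_min) * cols), cols - 1)
    row = min(int((pitch_deg - pitch_min) / (pitch_max - pitch_min) * rows), rows - 1)
    return row, col


def build_heatmap(samples, cols=20, rows=10):
    """Count samples per cell, then normalise so all cells sum to 1."""
    grid = [[0] * cols for _ in range(rows)]
    for yaw, pitch in samples:
        cell = gaze_cell(yaw, pitch, cols, rows)
        if cell is not None:
            row, col = cell
            grid[row][col] += 1
    total = sum(sum(row) for row in grid)
    if total == 0:
        return [[0.0] * cols for _ in range(rows)]
    return [[count / total for count in row] for row in grid]


# Two samples straight ahead, one to the right, one outside the grid (ignored).
heat = build_heatmap([(0, 0), (0, 0), (45, 0), (-100, 0)])
print(heat[5][10], heat[5][15])
```

Because the head direction is treated as the gaze direction, this sketch relies on the same assumption as the app: the eyes look directly forward from the head.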

Eye contact analysis - how did the user perform?

The user performs well on eye contact with the audience, scoring 8/10. With good eye contact, the audience is more likely to engage with the speaker and understand the message. From the heatmap analysis, we can see that the user spends time with each audience member and spreads the eye contact evenly amongst them over time.

A point to note is that we are not able to determine how long the user maintained eye contact with each audience member at a time, just that the total eye contact was well distributed. For example, the user might have spent 45 seconds with the first audience member, then 45 seconds with the second, and so on, instead of the 3-5 second periods recommended for an audience of this size.
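The difference between total eye contact and the length of each period can be shown with a small example. This illustrates the limitation only and is not a feature of the app. It assumes a hypothetical log with one gaze target per second, and the function names are invented for the sketch.

```python
from itertools import groupby


def total_time(targets, sample_interval=1.0):
    """Total seconds spent on each target, ignoring the order of samples."""
    totals = {}
    for target in targets:
        totals[target] = totals.get(target, 0.0) + sample_interval
    return totals


def dwell_runs(targets, sample_interval=1.0):
    """Collapse consecutive identical targets into (target, seconds) runs."""
    return [(target, sum(1 for _ in run) * sample_interval)
            for target, run in groupby(targets)]


# Hypothetical one-sample-per-second logs for two audience members, A and B.
long_stares = ["A"] * 6 + ["B"] * 6
short_glances = (["A"] * 3 + ["B"] * 3) * 2

print(total_time(long_stares) == total_time(short_glances))
print(dwell_runs(long_stares))
print(dwell_runs(short_glances))
```

Both logs give the same totals, so a heatmap built from them would look identical, yet the second speaker moves between audience members twice as often.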

Saving and uploading the speech for detailed feedback

After the user has reviewed their performance, they have the option of saving their speech to listen back to later, or uploading the speech to the VirtualSpeech team for further, detailed analysis.

The saved speeches are located in the Progress room, found from the VR main menu. Up to 5 speeches can be saved at any time within the app.

To upload the speech for additional feedback, the user needs to enter their email address, which the VirtualSpeech team will use to send the feedback. The additional feedback can give an insight into areas of your speech that the app currently cannot assess, such as:

  • Were any literary techniques used?
  • What is the tone of the speech?
  • How persuasive or influential is the speech?
  • Is the user emphasising the key message?
  • Is the key message clear?

Track progress within the app

All the feedback you receive from the speech analytics feature is stored on your mobile device and displayed in a section of the app. This makes it easy to see how you are progressing over time in areas such as eye contact, hesitation words and pace of speaking.

Track your communication skills progress in the VirtualSpeech app.

In conclusion

The VirtualSpeech app provides a powerful way for people to analyse their speech or presentation. The feedback allows users to identify weaker areas of their speech and work to improve those parts. In addition, with the realistic environments, audience and personalisation (load in your own slides), the app takes you close to being fully immersed in the environment.

Appendix: Speech transcript and audio file

Speech audio file

Speech transcript

Hello everyone and welcome to this speech about how automotive companies can use virtual reality to reduce their prototyping costs.

So firstly let’s talk about how prototyping used to be done and is still being done in a lot of companies. They will spend hundreds of thousands of dollars building early prototypes of vehicles, just to check things such as whether pillars are blocking their views, uhm, how their surfaces look in different lighting, uhm, and these, uh, prototype vehicles will be viewed by the engineering team, the design team, and they won’t be full vehicles, they will be shells of vehicles so obviously none of the electronics inside and um, mostly the physical look so people have idea of what this stuff looks like.

More and more people are finding ways to replace these really expensive prototypes, and you might have 5, 10, 15 of these per vehicle line, so that’s tens of millions of pounds or dollars for automotive companies which can be hugely reduced with virtual reality.

So what’s starting to happen, and is available in a lot of these big companies at the moment, is that they are using large scale VR, such as such VR Caves and other equipment, where huge projectors will power different walls and people can go in and review different designs and engineering features, and you can wheel in a chair and sit in the chair as if you are actually sitting in the vehicle.

You can look out for different ergonomic factors such as whether you can reach all the buttons in the vehicle, whether when you are stopped at traffic lights, the pillars are blocking the view of red and green lights. Whether you can see pedestrians crossing the street and children crossing the street and all sorts of other things, and these can all be done in virtual reality now.

You can also do all the engineering stuff, such as whether the cabling, um, passes around a vehicle and whether it collides with any other objects in the vehicle, and this saves, um, either building one of these prototype vehicles or saves, uh, people trying to view it on a 2D screen which can be quite difficult, especially with stuff like wiring which is very 3D, it’s changing dimension all through the vehicle.

And so, these VR caves and other equipment currently being used by other companies cost, um, by themselves, um, 1 million, 2, 3 million pounds to actually implement just because the projectors are huge, um, the scale of the infrastructure is huge, and with the revolution in virtual reality, Oculus Rift, Vive, these big infrastructure, uhh, projects can be replaced by very inexpensive hardware.

So let’s take the HTC Vive, you can put one of these Vive’s on and experience the same level of detail as you would in this VR cave, you can sit inside a vehicle, see how the curves look inside a vehicle, how the lighting reflects off it, all within this virtual world.

You can, um, give one of these headsets to different design teams, engineering teams, CAD teams, the packaging team, the, uhh, surface design team. This will hopefully speed up their productivity, and also with the Vive’s and the Rifts, you can setup a collaborative environment.

One of the big issues with these companies is that they are very global and their engineering teams are all around the country or in different countries. And this allows people to be in the same virtual space, viewing the same virtual vehicle while being in completely different places, so you might have an engineering team in Birmingham England, and an engineering team in, uh, London for example. And they can meet up in this virtual space and collaborate on how this vehicle looks and um take screenshots in it, virtually make notes in different software and annotate these screenshots, and that’s a really powerful tool.

So with the cost of the hardware even coming down, and the acceptance it is getting in some of these automotive and aerospace companies, um, we see a huge shift uh away from building these physical vehicles into the more virtual space, saving millions and also being able to review stuff at a much earlier stage in the design cycle and the vehicle product life cycle.

What used to take 3 months to build one of these prototype vehicles, um, you can get the design time, they might have to make some changes to make it compatible with whatever software you are running on one of these VR headsets, and then you can be up and running within a week, if not a few days.

You can get the directors or managers in to review it and sign it off, whether it’s a door, casing, different dashboard, whether they like the look of it, and then pass it back to the engineering team to make changes or design team to make changes.

This has the potential to both reduce a huge amount of cost per vehicle line and also to speed up the production of, um, these productions lines and shave off months and months in development time which obviously is great for everyone.

Thank you very much for listening.