
Real-time captioning: Four options for your live event

February 28, 2020 Xiao Li


Live event planners have been facing increasing demand for real-time transcription or captioning. In the past, real-time captioning has been a pricey proposition, requiring organizers to budget for the cost of hiring a transcriptionist. Happily, advances in speech-to-text software and automatic transcription have widened the field, giving organizations a range of choices for adding live captions. But faced with an array of options, how do you decide on a transcription solution? In this post, we’ll run down some of the pros and cons of four different ways to add live captions to your event.

Four options for real-time captioning

When it comes down to it, there are four ways to add real-time captioning to your live event:

    1. Hire a transcriptionist

    In the past, hiring a professional transcriptionist was the only option for captioning in real time. This approach involves hiring an individual who listens to proceedings (on site or remotely) and transcribes on the fly.

    There are advantages to human transcriptionists. A human can listen more closely to someone who is speaking softly and still discern what they mean, while an AI-based system may not reach the same level of precision. Some medical or legal events may require the transcriptionist to carry certain professional certifications. For example, a certified health documentation specialist credential indicates the transcriptionist understands clinical health terminology and can apply that knowledge to correct commonly confused medical terms and ensure an accurate medical record. In other instances, an experienced professional may be able to parse industry-specific terminology or slang that some automatic transcription solutions struggle with.

    But human transcriptionists are also highly variable in quality and reliability. Someone transcribing a single 20-minute speech could be highly accurate, but that accuracy could drop if you ask them to transcribe four hours of lectures. Similarly, that transcriptionist could be taken out of commission by inclement weather, unexpected illness, or personal emergencies. Finally, not all transcriptionists carry the equipment needed to share captions in real time. In addition to booking someone with gear that can connect to AV equipment, it will likely be up to you to make sure transcripts can be shared with your audience in real time.


    Cost:

    Highly variable. Prices range from $90 to $180 an hour, with experienced or credentialed transcriptionists coming in at the higher end of the scale. Transcriptionists may bill you at an overtime rate for longer events, further increasing costs.


    Pros:

    • Humans are better at understanding poor-quality audio
    • Experienced transcriptionists can parse industry-specific terms, slang, or informal language


    Cons:

    • Expensive, especially those with specialized skills
    • Reliability is variable
    • Sharing transcript live requires equipment and know-how
    • Low availability, high demand

    Bottom line:

    While there are definitely places where a human transcriptionist is required, the high price point can be prohibitive. High demand for real-time transcription services only continues to drive that price up, and it may mean a professional transcriptionist isn’t available on the day or time of your event.

    2. Buy a hardware solution

    A relatively new entry to the speech-recognition market, hardware solutions provide a simplified real-time transcription option. A hardware device includes a way to capture audio live, convert that speech to text, and share that transcription with guests. Typically, these devices connect directly to a local audio source to ensure a clean audio feed and also include some kind of standardized video output to share the transcription on a monitor in real time.

    A dedicated hardware solution also removes possible points of failure present in an AI-transcription solution that relies on a computer or mobile device. A dedicated hardware transcriber will not suffer from a blue screen of death, receive unexpected text messages during an important presentation, or require the same ongoing care and maintenance that other solutions might.


    A purpose-built hardware solution will also include extra features depending on the hardware developer. LiveScrypt, Epiphan Video’s own dedicated automatic real-time transcription solution, supports transcription in over 20 languages and dialects, and includes additional features like profanity filters and text scaling to ensure visibility on connected monitors.

    These solutions carry a higher up-front cost: the price of the hardware itself. That cost may be steep for some, but organizations and people who need transcription regularly can break even quickly.

    Examples include a college or university intending to caption several lectures a day, or a convention planner who wants to transcribe dozens of speakers at each event they stage. Even after the cost of hardware is accounted for, the per-hour cost of transcription remains far below the cost of professional transcriptionists for those organizations and groups.
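To make the break-even idea concrete, here is a rough sketch. The $90-an-hour human rate is the low end of the transcriptionist range quoted earlier in this post. The hardware price and per-hour subscription figure are placeholders made up for illustration, not actual pricing for LiveScrypt or any other product.

```python
import math

def break_even_hours(hardware_cost, service_rate_per_hour, human_rate_per_hour):
    """Hours of captioning after which hardware becomes cheaper than a human."""
    savings_per_hour = human_rate_per_hour - service_rate_per_hour
    if savings_per_hour <= 0:
        raise ValueError("hardware never breaks even at these rates")
    return math.ceil(hardware_cost / savings_per_hour)

# Placeholder figures (assumptions, not vendor pricing):
HARDWARE_COST = 1500.00   # hypothetical one-time device price
SERVICE_RATE = 10.00      # hypothetical per-hour transcription subscription
HUMAN_RATE = 90.00        # low end of the transcriptionist range cited above

hours = break_even_hours(HARDWARE_COST, SERVICE_RATE, HUMAN_RATE)
print(f"Break-even after about {hours} hours of captioning")
```

With these made-up numbers, the device pays for itself after 19 hours of captioning. Swap in real quotes to see where your own break-even point lands.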


    Cost:

    Variable. People and organizations buying into a hardware solution will need to pay for the hardware itself as well as the ongoing subscription costs of transcription. However, the cost of these services still remains far below the cost of hiring a transcriptionist, and value for money increases the more the hardware is used.


    Pros:

    • Affordability
    • Reliability
    • Speed
    • Built-in professional audio connections to ensure high-quality transcription
    • Standardized video output connection to share transcripts live
    • Simple setup


    Cons:

    • Initial investment cost

    Bottom line:

    For live events that require real-time captioning or transcription of proceedings, hardware solutions are the most hassle-free option on the market.


    3. Build a cloud-based transcription solution

    Services like Google Speech-to-Text, Amazon Transcribe, and IBM Watson Speech to Text all use very similar technology to convert speech into text. In brief, automatic transcription services take a digital audio signal, break that signal into smaller segments of sound, and compare those segments (also called phonemes) to an existing database of language sounds. When a match is found, the service determines what word those phonemes form and returns the result as text.
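The first step described above, breaking a digital audio signal into smaller segments, can be sketched in a few lines. This is a toy illustration only: the 25 ms frame length and 10 ms step are common choices in speech processing but are assumptions here, and real services do far more than this before matching sounds to words.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, step_ms=10):
    """Split a list of audio samples into short, overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    step = int(sample_rate * step_ms / 1000)        # samples between frame starts
    frames = []
    for start in range(0, len(samples) - frame_len + 1, step):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of silence at 16 kHz stands in for real audio.
one_second = [0] * 16000
frames = split_into_frames(one_second, sample_rate=16000)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame is small enough to be treated as a single snapshot of sound, which is what makes the later comparison step possible.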

    The process typically requires a lot of computing power, which is why these services use cloud computing to deliver quick results. The accuracy of AI transcription services is often comparable to human typists, with the gap between the two narrowing daily.

    Consistency and lack of downtime are a couple of the clear benefits of using a cloud-based automatic transcription service. While it’s unreasonable to expect a human to transcribe hours of speech without a break, that kind of task is well-suited for AI-powered speech recognition services. Modern AI-based transcription services can yield word error rates low enough for real-time event captioning.
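Word error rate is the usual yardstick for transcription accuracy: the number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. A minimal sketch follows, with invented example sentences rather than output from any real service.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("please welcome our keynote speaker",
                      "please welcome our key note speaker")
print(f"Word error rate: {wer:.0%}")
```

Here "keynote" was heard as "key note", which counts as one substitution plus one insertion against a five-word reference, for a word error rate of 40%.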

    The cost of these services is also significantly lower than working with a professional transcriptionist, making it attractive for longer events with many hours of speech to be transcribed and for organizations staging multiple live events throughout a year.

    The low price also means it’s possible to offer real-time captioning from end to end. A conference organizer using a professional transcriptionist may be forced to limit captioning to one or two keynote speeches for budgetary reasons. But for a fraction of that price, an automatic transcription service could caption the entire event – from the keynote presentation to the final word.

    But cloud services also require a degree of computer competency that is beyond many organizations. These services provide a way for digital audio to be converted to text, but a coder is required to develop a program which interacts with that cloud service. That process takes time, and will require testing, patching, and updating as problems emerge.

    You will also need some form of local console that can convert an analog audio signal to a digital one, send that signal to the cloud, and receive your transcription. This can be a personal computer, though the general-purpose nature of these machines presents a few challenges, including unplanned system updates and unwitting bystanders interrupting the transcription to charge their phones.
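The local console's job can be sketched as a loop: read digital audio in small chunks, hand each chunk to the cloud service, and collect the text that comes back. In the sketch below, `send_chunk` is a hypothetical stand-in for whatever client code you write against your chosen provider; it is not any provider's actual API. The 100 ms chunk size is also an assumption, and a WAV file stands in for a live audio feed.

```python
import io
import wave

def pcm_chunks(wav_file, chunk_ms=100):
    """Yield raw PCM byte chunks of roughly chunk_ms each from a WAV source."""
    with wave.open(wav_file, "rb") as wav:
        frames_per_chunk = int(wav.getframerate() * chunk_ms / 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data

def caption_stream(wav_file, send_chunk):
    """Feed each chunk to send_chunk and collect whatever text comes back.

    send_chunk is a hypothetical callable standing in for a real cloud client.
    """
    captions = []
    for chunk in pcm_chunks(wav_file):
        text = send_chunk(chunk)
        if text:
            captions.append(text)
    return captions

# Demo: one second of silent 16 kHz mono audio and a fake "service".
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)       # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

def fake_service(chunk):
    # Pretend reply; a real client would return transcribed text.
    return f"[{len(chunk)} bytes received]"

captions = caption_stream(buffer, fake_service)
print(len(captions), "chunks sent")
```

A production version would also need to handle network errors, reconnects, and partial results, which is where much of the testing and patching effort tends to go.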

    Most personal computers also do not natively have a way to receive professional audio feeds, such as through XLR connections. It’s possible to add that capacity using expansion cards or an external sound card, but this adds complexity to the job. Any computer enthusiast will tell you that care and maintenance of a computer – even a single-purpose one – is an ongoing task.


    Cost:

    Among the most affordable options. Prices range from $0.96 an hour for Google Speech-to-Text to $1.44 an hour for Amazon Transcribe. Price can also come down with volume. IBM, for instance, offers discounted rates for users who need to transcribe over 250,000 minutes, 500,000 minutes, or one million minutes of speech.

    You will also need a computer to send audio to the cloud, receive the transcription, and share it with your audience. While some organizations may have that gear handy, building a PC for this purpose increases cost.
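For a sense of scale, the sketch below compares the $1.44-an-hour Amazon Transcribe figure with the $90 to $180 hourly range quoted for human transcriptionists earlier in this post. The eight-hour event length is an assumption, and the calculation leaves out the cost of the computer and any development time.

```python
CLOUD_RATE = 1.44         # per hour, the Amazon Transcribe figure quoted above
HUMAN_RATE_LOW = 90.00    # per hour, low end of the transcriptionist range
HUMAN_RATE_HIGH = 180.00  # per hour, high end of the transcriptionist range

def event_cost(hours, rate_per_hour):
    """Transcription cost for an event, rounded to the nearest cent."""
    return round(hours * rate_per_hour, 2)

event_hours = 8  # assumed length of a one-day conference track

cloud = event_cost(event_hours, CLOUD_RATE)
human_low = event_cost(event_hours, HUMAN_RATE_LOW)
human_high = event_cost(event_hours, HUMAN_RATE_HIGH)

print(f"Cloud service: ${cloud:.2f}")
print(f"Human transcriptionist: ${human_low:.2f} to ${human_high:.2f}")
```

At these rates the cloud service comes to $11.52 for the day, against $720 to $1,440 for a human, before any overtime billing.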


    Pros:

    • Low cost
    • High reliability
    • Accuracy
    • Speed


    Cons:

    • Setup is complex
    • Requires some kind of local interface to use cloud service
    • Computer could be expensive if not already available

    Bottom line:

    Low cost makes this an attractive option, but cloud services still rely on you to find a way to gather audio and share the transcription live. The added complications associated with sourcing a local console capable of doing this may make this option unappealing for people and organizations looking for a simple way to add live transcription to an event.

    4. Find a speech-to-text app

    While phone-based speech-recognition apps have many effective uses, they’re limited by the hardware they’re tied to. Smartphones and tablets are limited by storage and processing capacity, while microphone audio on a smart device can be variable in quality. This may mean their best applications are typically in one-on-one interviews, or small meetings rather than a large lecture, in a hall where the speaker may be far from the transcribing phone.

    App-based solutions are also dependent on the app developer to add functionality and resolve issues. More popular apps will be responsive to user needs with developers rolling out regular updates, but an app developed by an independent firm or individual user could see updates stopped, or be abandoned completely.

    Users will also need a way to share the transcript. Smartphones and tablets capable of using these apps are not typically designed with audiences in mind; sending the transcript to a large screen will require additional setup. Plus, solutions that rely on a smartphone are vulnerable to unexpected phone calls, instant messages, and software updates.



    Cost:

    Variable. Many apps are free for individual users but require you to pay for a monthly or by-the-minute plan after exceeding a certain number of minutes. Some services have a monthly minutes cap, which could be a dealbreaker for people with a lot of audio that needs transcribing.


    Pros:

    • Audio capture usually done natively
    • Simple setup


    Cons:

    • Paid plans can become expensive once free minutes run out
    • Audio quality is variable, affecting transcription accuracy
    • Limited by phone hardware
    • Support is dependent on app developer
    • Some apps include a minutes cap
    • No easy way to share transcription

    Bottom line:

    Cost remains relatively low and transcription quality is typically fairly high, but difficulties gathering audio and sharing the transcription with a wide audience remain a barrier for live event organizers.

    Simplify your real-time captioning setup

    Only you will be able to determine which of these solutions is best suited for your live event. Smaller events may be able to use a smartphone-based app without problems, while more tech-savvy users may enjoy the idea of building a computer with pro audio connections to use a cloud-based solution. However, the additional functionality and abilities built into hardware solutions mean organizers who are looking to add transcription to their live events on a regular basis should definitely take a long look at a dedicated hardware option.

    LiveScrypt is geared toward real-time transcription for a broad range of events, offering additional features like a profanity filter and support for over 20 languages. LiveScrypt is also supported by Epiphan’s developers and in-house technical support team, so updates arrive regularly and any problems you encounter are handled by a human being ready to address the issue.

    LiveScrypt has a simple setup process and easy operation, minimizing the technological complexity of your live event.

    Contact our sales team to learn more about how LiveScrypt can benefit your organization or to arrange a live demo.
