Forced Aligner

Upload Data for Forced Aligner

Upload Form
Instructions

The Forced Aligner has two workflows, which differ in the input they require. The first takes a sound recording and a script or word list, aligns the data, and generates output. The second takes a sound recording and a previously aligned or otherwise properly annotated Praat TextGrid, and generates output without performing any additional alignment. Both workflows are provided because the Forced Aligner is not perfect, and inspecting and editing its output is highly recommended. Ideally, you run the first workflow, download the data, inspect the initial alignment and edit it as necessary, and then run the second workflow if you want the visualization and the automatically generated phonetic data.

Note that the online visualization displays only very basic phonetic information, whereas the downloadable CSV contains considerably more, including the average F0 for each segment (if applicable) and F1 through F3 values at the 1/3, 1/2, and 2/3 points of each segment.
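To make the 1/3, 1/2, and 2/3 measurement points concrete, here is a minimal Python sketch that computes those time points from a segment's start and end times. The function name `measurement_times` is hypothetical and is not part of the Forced Aligner. The sketch only locates the time points and does not measure formants.

```python
def measurement_times(start, end):
    """Return the times (in seconds) at 1/3, 1/2, and 2/3 of a segment.

    `start` and `end` are the segment boundaries in seconds, as they
    would appear on an interval in an aligned TextGrid.
    """
    if end <= start:
        raise ValueError("segment end must be later than its start")
    duration = end - start
    # Each point is an offset from the segment start, not from time zero.
    return [start + duration * fraction for fraction in (1 / 3, 1 / 2, 2 / 3)]


if __name__ == "__main__":
    # A hypothetical vowel segment running from 1.20 s to 1.50 s.
    for t in measurement_times(1.20, 1.50):
        print(f"{t:.3f}")
```

For a segment from 1.20 s to 1.50 s, the three points fall at roughly 1.30 s, 1.35 s, and 1.40 s.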

  1. Alignment desired:
    • Inputs:
      1. A waveform audio file (.wav) containing your sound recording
      2. A plain text document (.txt) containing a script or word list corresponding to your sound recording
  2. No alignment desired:
    • Inputs:
      1. A waveform audio file (.wav) containing your sound recording
      2. A previously annotated Praat TextGrid file (.TextGrid) corresponding to your sound recording that has been properly formatted either by hand or by the Forced Aligner
        • See below for a description of the TextGrid formatting used by the Forced Aligner
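
The two input combinations above can be told apart by file extension alone. The following Python sketch illustrates that check. The function name `choose_workflow` and the return values `"align"` and `"no-align"` are assumptions made for illustration, not the Forced Aligner's actual implementation.

```python
from pathlib import Path


def choose_workflow(audio_path, second_path):
    """Decide which workflow a pair of uploaded files corresponds to.

    Returns "align" for .wav + .txt and "no-align" for .wav + .TextGrid.
    Both labels are hypothetical names used only in this sketch.
    """
    audio = Path(audio_path)
    second = Path(second_path)

    # Both workflows require a waveform audio file.
    if audio.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav recording, got {audio.name!r}")

    # The second file determines whether alignment is performed.
    extension = second.suffix.lower()
    if extension == ".txt":
        return "align"
    if extension == ".textgrid":
        return "no-align"
    raise ValueError(f"expected a .txt or .TextGrid file, got {second.name!r}")


if __name__ == "__main__":
    print(choose_workflow("speaker1.wav", "speaker1.txt"))
    print(choose_workflow("speaker1.wav", "speaker1.TextGrid"))
```

Comparing extensions case-insensitively lets the sketch accept both `.TextGrid` and `.textgrid`.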

Arpabet

Arpabet (also rendered as ARPABET or ARPAbet) is a phonetic transcription system developed in the 1970s that uses sequences of one, two, or three characters to represent phonemes in place of the standard IPA symbols. It was created because computers of the time could not represent IPA symbols within the limited character set of the ASCII encoding. Even on modern computer systems, support for the full range of IPA characters is not always guaranteed. The alignment algorithm on which the Forced Aligner is based uses Arpabet phonemes, so the aligned Praat TextGrid contains them as well. The online visualization portion of the Forced Aligner, by contrast, displays standard IPA characters. More information is available on Wikipedia: https://en.wikipedia.org/wiki/ARPABET.
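
As an illustration of how Arpabet labels relate to IPA, the Python sketch below converts a short list of Arpabet labels to IPA characters. The table covers only a small subset of the symbols, and the name `arpabet_to_ipa` is hypothetical. It also assumes that a trailing digit on a vowel label is a stress marker that can be dropped. The mapping the Forced Aligner itself uses for its visualization may differ.

```python
import re

# A deliberately partial table: a few common Arpabet symbols and their
# usual IPA equivalents. A full table would cover every phoneme.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "EH": "ɛ", "IH": "ɪ", "IY": "i",
    "OW": "oʊ", "UW": "u",
    "CH": "tʃ", "DH": "ð", "HH": "h", "K": "k", "L": "l", "NG": "ŋ",
    "S": "s", "SH": "ʃ", "T": "t", "TH": "θ",
}


def arpabet_to_ipa(labels):
    """Convert a sequence of Arpabet labels to a single IPA string.

    A trailing stress digit (as in "AE1") is removed before lookup.
    Labels missing from the table are passed through unchanged.
    """
    converted = []
    for label in labels:
        base = re.sub(r"\d$", "", label.upper())
        converted.append(ARPABET_TO_IPA.get(base, label))
    return "".join(converted)


if __name__ == "__main__":
    print(arpabet_to_ipa(["K", "AE1", "T"]))
    print(arpabet_to_ipa(["TH", "IH1", "NG"]))
```

Passing unknown labels through unchanged, rather than raising an error, keeps gaps in the table visible in the output.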