Online Forced Aligner

About

The online Forced Aligner is a resource intended to assist linguistics research by providing an easy way to align English voice recordings with scripts and/or word lists. In addition, the online Forced Aligner provides an easy way to visualize the basic phonetic data from these recordings, as well as a facility for downloading this data as a starting point for more in-depth observations. Output from the program takes the form of a Praat TextGrid file in the case of aligned data and, in the case of downloadable phonetic data, a CSV (comma-separated values) file that should be readable by any common spreadsheet application.

The online Forced Aligner is based on the Penn Phonetics Lab Forced Aligner for English, which conducts the actual phonetic alignment. The purpose of the online Forced Aligner is to provide an easier-to-use, more streamlined, and more accessible way to use what is already a powerful piece of software. The Penn Phonetics Lab Forced Aligner can only be run from a command-line interface, which can make it difficult to use for those without experience running software in this way. The online Forced Aligner adds a visual, web-based interface on top of the Penn Phonetics Lab Forced Aligner, which allows it to be run from any computer at any time. It also integrates several additional processes, such as creating a CSV file, which should make it easier to begin working with the data output by the phonetic alignment operation.

Usage Instructions

The Forced Aligner has two workflows, which differ in their required input depending on the function you want. The first workflow takes a sound recording and a script or word list as input, aligns the data, and generates output. The second workflow takes a sound recording and a previously aligned and/or properly annotated Praat TextGrid, and generates output without any additional alignment. These two workflows are provided because THE FORCED ALIGNER IS NOT PERFECT, and inspecting and editing its output is highly recommended. Ideally, one should run the first workflow, download the data, inspect the initial alignment and edit it as necessary, and then run the second workflow if the visualization and the automatically generated phonetic data are desired. Note also that the online visualization displays only very basic phonetic information, whereas the downloadable CSV contains considerably more, including the average F0 for each segment (if applicable) and F1-F3 values at the 1/3, 1/2, and 2/3 points of each segment.

  1. Alignment desired:
    • Inputs:
      1. A waveform audio file (.wav) containing your sound recording
      2. A plain text document (.txt) containing a script or word list corresponding to your sound recording
  2. No alignment desired:
    • Inputs:
      1. A waveform audio file (.wav) containing your sound recording
      2. A previously annotated Praat TextGrid file (.TextGrid) corresponding to your sound recording that has been properly formatted either by hand or by the Forced Aligner
        • See below for a description of the TextGrid formatting used by the Forced Aligner
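
To make the 1/3, 1/2, and 2/3 measurement points mentioned above concrete, the following is a minimal Python sketch of how such time points could be computed from a segment's start and end times. The function name measurement_points is hypothetical and is not taken from the Forced Aligner itself; the actual implementation may compute these points differently.

```python
def measurement_points(start, end):
    """Return the times at 1/3, 1/2, and 2/3 of the segment [start, end].

    'start' and 'end' are segment boundaries in seconds, such as those
    found on an interval in an aligned TextGrid. This is an illustrative
    helper, not code from the Forced Aligner.
    """
    if end <= start:
        raise ValueError("segment end must be later than segment start")
    duration = end - start
    # Offset each fraction of the duration from the segment start.
    return tuple(start + duration * fraction for fraction in (1 / 3, 1 / 2, 2 / 3))


if __name__ == "__main__":
    # A hypothetical vowel segment running from 1.20 s to 1.50 s.
    for label, time in zip(("1/3", "1/2", "2/3"), measurement_points(1.20, 1.50)):
        print(f"{label} point: {time:.3f} s")
```

For a segment running from 1.20 s to 1.50 s, the three points fall at roughly 1.30 s, 1.35 s, and 1.40 s. Sampling formants at several interior points rather than only at the midpoint gives some indication of formant movement within the segment.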

Arpabet

Arpabet (also rendered as ARPABET or ARPAbet) is a phonetic transcription system developed in the 1970s that uses sequences of 1, 2, or 3 characters to represent phonemes in place of the standard IPA symbols. This system was created because computers of the time had no way to represent IPA symbols, owing to the limited character set of the ASCII encoding. Even on modern computer systems, support for the full range of IPA characters is not always guaranteed. The alignment algorithm on which the Forced Aligner is based uses Arpabet phonemes, and therefore the aligned Praat TextGrid will contain these as well. The online visualization portion of the Forced Aligner, by contrast, displays standard IPA characters. More information is available on Wikipedia: https://en.wikipedia.org/wiki/ARPABET.
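
As an illustration of the relationship between Arpabet labels and IPA symbols, the following Python sketch converts a single Arpabet phone label to IPA. It assumes the two-letter Arpabet symbols with an optional trailing stress digit (0, 1, or 2) on vowels, as used in the CMU Pronouncing Dictionary. The names ARPABET_TO_IPA and arpabet_to_ipa are hypothetical, treating unstressed AH0 as a schwa is one common convention, and the mapping used by the online visualization may differ.

```python
import re

# Two-letter Arpabet symbols and their usual IPA equivalents.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}


def arpabet_to_ipa(label):
    """Convert one Arpabet phone label (e.g. 'AE1' or 'K') to IPA."""
    match = re.fullmatch(r"([A-Za-z]+)([0-2]?)", label.strip())
    if match is None:
        raise ValueError(f"not an Arpabet label: {label!r}")
    symbol, stress = match.group(1).upper(), match.group(2)
    if symbol not in ARPABET_TO_IPA:
        raise ValueError(f"unknown Arpabet symbol: {symbol!r}")
    # Assumed convention: unstressed AH is transcribed as schwa.
    if symbol == "AH" and stress == "0":
        return "ə"
    # Otherwise the stress digit is simply dropped.
    return ARPABET_TO_IPA[symbol]


if __name__ == "__main__":
    # The word "cat" as a sequence of Arpabet phones.
    print("".join(arpabet_to_ipa(phone) for phone in ["K", "AE1", "T"]))
```

Labels that are not phones, such as pause or silence markers that may appear in an aligned TextGrid, are rejected with a ValueError in this sketch and would need separate handling.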

Credits

The online Forced Aligner is based on the Penn Phonetics Lab Forced Aligner for English, which is itself based on the HTK toolkit developed by the Cambridge University Engineering Department. The open-source software Praat is also used in the alignment process and in the generation of downloadable data. Integration of this software, as well as development of the online interface, was completed by Gersh Pevnick as an undergraduate research project at the University of Wisconsin-Milwaukee (UWM), under the supervision of Hanyong Park, Associate Professor at UWM and head of the UWM Phonetics Lab. Web server implementation and internal web development were carried out by Jeremy Streich of the Information Technology Office in the College of Letters and Science at UWM. The online Forced Aligner is hosted and maintained by the Information Technology Office in the College of Letters and Science at UWM.

References

  1. Boersma, Paul & Weenink, David (2019). Praat: doing phonetics by computer [Computer program]. Version 6.0.49, retrieved 2 March 2019 from http://www.praat.org/
  2. Hidden Markov Model Toolkit [Computer program]. Version 3.4 (2006), retrieved 28 September 2018 from http://htk.eng.cam.ac.uk/
  3. Yuan, Jiahong & Liberman, Mark (2009). Penn Phonetics Lab Forced Aligner for English [Computer program]. Version 1.002, retrieved 28 September 2018 from https://web.sas.upenn.edu/phonetics-lab/facilities/