
Speech data acquisition: the underestimated challenge

Oliver Niebuhr, Alexis Michaud

To cite this version:

Oliver Niebuhr, Alexis Michaud. Speech data acquisition: the underestimated challenge. Preprint version of paper to appear in KALIPHO - Kieler Arbeiten zur Linguistik und Phonetik, vol.. 2014. halshs-01026295v1

HAL Id: halshs-01026295


Submitted on 21 Jul 2014 (v1), last revised 6 Jan 2015 (v4)

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Kieler Arbeiten in Linguistik und Phonetik (KALIPHO), Inst. f. Skandinavistik, Frisistik und Allgemeine Sprachwissenschaft (ISFAS), Christian-Albrechts-Universität zu Kiel 2

Speech Data Acquisition: The Underestimated Challenge

Oliver Niebuhr
Analyse gesprochener Sprache
Allgemeine Sprachwissenschaft
Christian-Albrechts-Universität zu Kiel
niebuhr@isfas.uni-kiel.de

Alexis Michaud
International Research Institute MICA
Hanoi University of Science and Technology, CNRS, Grenoble INP, Vietnam
Langues et Civilisations à Tradition Orale, CNRS/Sorbonne Nouvelle, France
michaud.cnrs@gmail.com

The second half of the 20th century was the dawn of information technology; and we now live in the digital age. Experimental studies of prosody develop at a fast pace, in the context of an “explosion of evidence” (Janet Pierrehumbert, Speech Prosody 2010, Chicago). The ease with which anyone can now do recordings should not veil the complexity of the data collection process, however. This article aims at sensitizing students and scientists from the various fields of speech and language research to the fact that speech-data acquisition is an underestimated challenge. Eliciting data that reflect the communicative processes at play in language requires special precautions in devising experimental procedures and a fundamental understanding of both ends of the elicitation process, speaker and recording facilities. The article compiles basic information on each of these requirements and recapitulates some pieces of practical advice, drawing many examples from prosody studies, a field where the thoughtful conception of experimental protocols is especially crucial.

1. Introduction: Speech Data Acquisition as an Underestimated Challenge

The second half of the 20th century was the dawn of information technology; and we now live in the digital age. This results in an “explosion of evidence” (Janet Pierrehumbert, Speech Prosody 2010, Chicago), offering tremendous chances for the analysis of spoken language. Phoneticians, linguists, speech therapists, speech technology specialists, anthropologists, and other researchers routinely record speech data the world over. There remains no technological obstacle to collecting speech data on all languages and dialects, and to sharing these data over the Internet. The ease and the speed with which recordings can now be conducted and shared should not veil the complexity of the data collection process, however.

Phonetics “calls on the methods of physiology, for speech is the product of mechanisms which are basically there to ensure survival of the human being; on the methods of physics, since the means by which speech is transmitted is acoustic in nature; on methods of psychology, as the acoustic speech-stream is received and processed by the auditory and neural systems; and on methods of linguistics, because the vocal message is made up of signs which belong to the codes of language” (Marchal 2009:ix). In addition to developing at least basic skills in physiology, physics, linguistics, and psychology, each of which has complexities of its own, people conducting phonetic research are expected to have a good understanding of statistical data treatment, combined with a command of one or more specific exploratory techniques, such as endoscopy, ultrasonography, palatography, aerodynamic measurements, motion tracking, electromagnetic articulography, or electroencephalography (for a description of the many components of a multisensor platform see Vaissière et al. 2010). As a result, it tends to be difficult to maintain a link between the phonetic sciences and fields of the humanities that are highly relevant for phonetic studies, and in particular for the study of prosody. Phoneticians’ training does not necessarily include disciplines that would develop their awareness of the complexity and versatility of language, such as translation studies, languages, literature and stylistics, historical phonology, and sociolinguistics/ethnolinguistics. Moreover, the increasing use of digital and instrumental techniques in phonetic research is, taken by itself, a welcome development. But more and more phoneticians neglect explicit and intensive ear training, forgetting that an attentive, trained ear is the key to observations and hypotheses and hence the prerequisite for any analysis by digital and instrumental techniques. 
For example, we do not think that successful research on prosody can be done without the ability to produce and identify the prosodic patterns that one would like to analyse. As Barbosa (2012:33) puts it: “The observation of a prosodic fact is never naïve, because formal instruction is necessary to see and to select what is relevant”.

In summary, advances in phonetic technologies impose many challenges on modern phoneticians, and they can tend to replace rather than complement traditional skills.

This has a direct bearing on data collection procedures. To a philologist studying written documents, it is clear that every detail potentially affects interpretation and analysis (the complexities of Greek and Latin texts are perfect examples; see, e.g., Probert 2009; Burkard 2014). Carrying the same standards into the field of speech data collection, it goes without saying that every speaker is unique, that no two recording situations are fully identical, and that human subjects participating in the experiments are no “vending machines” that produce the desired speech signals once one pays and presses a button. An experience of linguistic fieldwork, or of immersion learning of a foreign language, entails similar benefits in terms of awareness of the central importance of communicative intention (see in particular Bühler 1934, passim; Culioli 1995:15; Barnlund 2008), and of the wealth of expressive possibilities and redundant encoding strategies open to the speaker at every instant (as emphasized, e.g., by Fónagy 2001). Researchers working on language and speech are no “signal hunters”, but hunt for functions and meanings¹ as reflected in the speech signal, which itself is only one of the dimensions of expression, together with gestures and facial expressions. The definition of tasks, their contextualization, and the selection of speakers are at the heart of the research process.

The diversification of the phonetic sciences is likely to continue, together with technological advances; the literature within each subfield is set to become more and more extensive, making it increasingly impractical for an individual to develop all the skills that would be useful as part of a phonetician’s background. This results in modular approaches, as against a holistic approach to communication. What is at stake is no less than a cumulative approach to research. The quality of data collection is inseparable from the validity and depth of research results; and data sharing is indispensable to allow the community to evaluate the research results and build on them for further studies.

Against this background, the present article is primarily intended for an audience of advanced students of phonetics. However, it is hoped that it can also serve as a source of information for phonetic experts and researchers who have a basic understanding of phonetics but work in other linguistic disciplines, including speech technology. The present article summarizes some basic facts, methods, and problems concerning the three pillars of speech data acquisition: the speaker (§2), the task (§3), and the recording (§4). The discussion of these central topics builds on our own experiences in the field and in the lab. Together, the chapters aim to convey to the reader in what sense data acquisition is an underestimated challenge. Readers who are pressed for time may want to jump straight to the Summary in section 5, which provides tips and recommendations on how to meet the demands of specific research questions and achieve results of lasting value for the scientific community.

Given its aim, our article is both more comprehensive and introductory than other methodologically oriented papers such as those by Mosel (2006), Himmelmann (2006), Ito and Speer (2006), Xu (2011), Barbosa (2012), and Sun and Fletcher (2014), which are all highly recommended as further reading. Most readers are likely to know much if not most of what will be said. Different readers obviously have different degrees of prior familiarity with experimental phonetics; apologies are offered to any reader for whom nothing here is new.

¹ The two terms ‘meaning’ and ‘function’ tend not to be clearly separated in the literature – including in the present article, in which we simply use both terms in combination. In the long run, a thorough methodological discussion should address the issue of the detailed characterization of ‘meaning’ and ‘function’. To venture a working definition, meanings refer to concrete or abstract entities or pieces of information that exist independently of the communication process and are encoded into phonetic signs. Functions, on the other hand, are conveyed by phonetic patterns that are attached to these phonetic signs; they refer to the rules and procedures of speech communication. If meanings are the driving force of speech communication, then functions are the control force of speech communication.


2. The speaker

2.1 Physiological, social, and cognitive factors

Individual voices differ from one another. Physiological differences are part of what Laver (1994, 27–28) refers to as the “organic level”; they are extralinguistic, but are nevertheless of great importance to analyzing and interpreting speech data. Age and body size are perfect examples of this (cf. Schötz 2006), affecting, among others, F0, speaking rate (or duration) and spectral characteristics such as formant frequencies.

Physiological variables are intertwined with social variables. For instance, there are physiological and anatomical differences between the male and female speech production apparatus, which lend female speakers a higher and breathier voice as well as higher formant values and basically allow them to conduct more distinct articulatory movements than their male counterparts within the same time window (Sundberg 1979; Titze 1989; Simpson 2009, 2012). So, “if we randomly pick out a group of male and female speakers of a language, we can expect to find several differences in their speech” (Simpson 2009:637).

However, Simpson (2009) also stresses in his summarizing paper that gender differences in speech do not merely have a biophysical origin. Some differences are also due to learned, i.e. socially evoked behaviour, and the dividing line between these two sources of gender-related variation cannot always be easily determined. The social phenomenon of “doing gender” is well documented; it is an object of attention on the part of speakers themselves, and ‘metalinguistic’ awareness of gender differences in speech is widespread, particularly with respect to grammar and lexicon (cf. Anderwald 2014). Gender-related phonetic differences are less well documented. The frequent cross-linguistic finding that women speak more slowly and more clearly than men is probably at least to some degree attributable to “doing gender” (cf. Simpson 2009). Further, more well-defined differences between the speech of men and women are documented by Haas (1944) for Koasati, a Native American language. Sometimes women have exclusive mastery of certain speaking styles: mastering whispered speech, including the realization of tonal contrasts without voicing, used to be part of Thai women’s traditional education (Abramson 1972). In languages where the differences are less codified, they are nonetheless present: Ambrazaitis (2005) found gender differences in the realization of terminal F0 falls at the ends of utterances in German and – more recently – also in English and Swedish (see also Peters 1999:63). Compared with male speakers, female speakers prefer pseudo-terminal falls that end in a deceleration and a slight, short rise at a relatively low intensity level (Ambrazaitis 2005). This pseudo-terminal fall reduces the assertiveness/finality of the statement, as compared with a terminal fall.

In extreme cases, this pattern might be mistaken for an actual falling-rising utterance-final intonation pattern, which has a different communicative function. Phonetically, the difference is not considerable: a rise on the order of 2 to 4 semitones for the pseudo-terminal fall, and of 6 semitones for a falling-rising utterance-final pattern.
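To make the size of these rises concrete, the following minimal sketch converts a pair of F0 values into a semitone interval using the standard relation of 12 semitones per doubling of frequency. The function name and the F0 values in Hz are hypothetical and chosen purely for illustration; they are not measurements from the studies cited here.

```python
import math


def semitone_interval(f_start_hz: float, f_end_hz: float) -> float:
    """Return the interval from f_start_hz to f_end_hz in semitones.

    Positive values are rises, negative values are falls.
    Uses the standard definition: 12 * log2(f_end / f_start).
    """
    if f_start_hz <= 0 or f_end_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f_end_hz / f_start_hz)


# Hypothetical low turning point of an utterance-final fall (illustrative only)
low_point_hz = 180.0

# Two hypothetical end points of the final rise
small_rise = semitone_interval(low_point_hz, 214.0)   # roughly 3 semitones
large_rise = semitone_interval(low_point_hz, 254.6)   # roughly 6 semitones

print(f"small final rise: {small_rise:.1f} st")
print(f"large final rise: {large_rise:.1f} st")
```

With these illustrative numbers, the two rises differ by only about 40 Hz at their end points, which suggests why a visual inspection of F0 tracks alone may not be enough to tell the two patterns apart.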

Another socially-related phenomenon is the so-called ‘phonetic entrainment’ or ‘phonetic accommodation’. That is, when two speakers are engaged in a dialogue, they become phonetically more similar to each other, particularly when the interaction is cooperative and/or when the two dialogue partners are congenial with each other (cf.
