IDENTIFICATION OF THE CONTRAST FULL VOWEL-SCHWA: TRAINING EFFECTS AND GENERALIZATION TO A NEW PERCEPTUAL CONTEXT

Esther G. Lacabex, María Luisa G. Lecumberri & Martin Cooke

This study examines the ability of Spanish learners of English to identify the English phonological contrast full vowel-schwa after two different types of training: auditory and articulatory. Perceptual performance was measured in isolated words, in order to investigate the effect of training, and in sentences, in order to study the robustness of acquisition by testing generalization to a context which was not used during training. Subjects were divided into three groups: two experimental groups, one undergoing perceptual training and one undergoing production-based training, and a control group. Both experimental groups' perception of the reduced vowel improved significantly after training. Results indicated that students were able to generalize their reduced vowel identification abilities to the new context. The control group did not show any significant improvement. Our findings agree with studies that have demonstrated positive effects of phonetic training (Derwing, Munro & Wiebe, 1998; Rochet, 1995; Cenoz & García Lecumberri, 1995, 1999). Interestingly, the results also support the view of a facilitating relationship between perception and production, since production training proved beneficial in the development of perceptual abilities (Catford & Pisoni, 1970; Matthews, 1997). Finally, our data showed that training resulted in robust learning, since students were able to generalize their improved perceptual abilities to a new context.


Introduction
Perception of non-native speech contrasts has been acknowledged to pose difficulties for second language (L2) speakers (Best, 2001). These problems may stem from a variety of factors including the influence of the L1 (first language) phonological system, markedness (Eckman, 1977), universal tendencies or individual characteristics such as age of learning onset, motivation, language experience or specific training (Cenoz & García Lecumberri, 1995). The difficulty scale has been reported to range from near-chance to near-ceiling in a performance continuum (Best, 2001, p. 775). However, there are studies (Bongaerts, van Summeren, Planken & Schils, 1997; Markham, 1997; Escudero, 2006) that have shown that L2 speakers can learn to perceive new L2 sounds in a native-like manner, and theories such as Flege's Speech Learning Model (SLM) defend the idea that the native language sound system mechanisms "remain intact over the life span" (1995, p. 239) and that it is possible for adult learners to establish new non-native phonetic categories. Models such as SLM or the Perceptual Assimilation Model (PAM) predict good acquisition of a non-native sound in specific 'favourable cases': SLM considers perception of L2 sounds as filtered through the L1 sound system and establishes that if a sound is 'new' or different from any existing L1 sound, it may be well acquired; PAM (Best, 2001) predicts good to excellent discrimination abilities if a sound is a 'nonassimilable' non-speech sound or a sound that "bears no detectable similarity to any native sound" (2001, p. 777). In short, these 'favourable cases' are viewed by these two influential models as acquisition situations in which the L1 has the least possible influence or the smallest possible filtering effect.
Our study focuses on the assessment of the ability of Spanish learners of English to identify 'schwa', the English unrounded mid-central vowel. Whilst schwa is a very frequent sound in English as the nucleus of many unstressed syllables both in content words and function words (see details below), the Spanish vowel space has an empty central area with no vowel categories. According to both SLM and PAM, schwa would tend not to be perceived in terms of L1 sounds since it is completely different acoustically; therefore, it should be easily acquired. Nevertheless, in contexts in which language is being introduced formally and in its written form from the beginning (as is the case in most formal instruction settings), factors other than acoustic and/or perceptual similarity, such as orthography, may intervene to make this sound assimilate to L1 vowel sounds and thus be perceived and produced with full qualities.
Different approaches have been used in the literature to discuss vowel reduction. It has been explained as a correlate of stress when described as the shortening in duration and obscuration of quality of nuclear vowels in unstressed syllables (Fudge, 1984; Rietveld & Koopmans-van Beinum, 1987). Lindblom (1963), having established target vowel values or invariant vowel attributes, proposed that vowel formant frequencies fail to reach target representations when duration is shortened. He refers to this phenomenon as 'formant undershoot'. Articulatorily, this reflects the fact that articulators do not reach target positions in connected speech. Two possible reasons have been mentioned in the literature to explain this phenomenon (Clark & Yallop, 1999, pp. 85-86). On the one hand, target undershoot could be due to motor constraints, since the articulators "resist being set in motion" due to inertia. On the other hand, undershoot could be caused by linguistic factors such as ease of articulation or establishing a level of articulatory performance that is adequately intelligible with the minimum possible effort.
While the type of vowel reduction discussed above is very frequent in the languages of the world, the degree of reduction differs between languages. Vowel reduction in Spanish (Delattre, 1969) involves slight centring movements: front vowels /i/ and /e/ move backwards, back vowels /u/ and /o/ move frontwards and /a/ moves upwards in the vowel plane in unstressed syllables. However, none of the movements described above cause Spanish vowels to lose their intrinsic quality (Quilis & Fernández, 1996). In English, by contrast, reduced vowels have become phonemes or sound categories and are thus distinctive in contrasts of the type full vowel-weak vowel in unstressed syllables (exorcise-exercise, Normantown-Normanton). This leads to a further view of vowel reduction describing those reduced vowels which, after synchronic and diachronic phonetic and phonological processes such as stress shift or vowel change, have become contrastive categories, as in the case of English. This phenomenon has been referred to as 'lexical vowel reduction' (van Bergem, 1995) or 'phonological vowel reduction' (Fourakis, 1991). When looking into the diachronic evolution of schwa in the English language, we can observe how it has evolved from being a non-phonemic sound occurring in some unstressed syllables in Old English, to a more generalized occurrence in unstressed syllables in Middle English, to a sound reported to have phonological status in Early Modern English and Modern English (Cruttenden, 1994). Factors such as rhythm, frequency of occurrence or word class are invoked to explain this change. In fact, a sound change similar to the one which happened in English has been reported to be occurring in Dutch (van Bergem, 1995, p. 354), as the ambiguity in the perception of vowels being pronounced with some variation can "impulse a sound change" through which an intended full vowel (phoneme) which is not consistently realized as such becomes an intended reduced vowel (phoneme). Van Bergem analysed three conditions which lead to this sound change: identification of vowel reduction as full vowel and schwa, frequency of occurrence and speaking style. He concluded that native Dutch speakers ambiguously classified full vowels and schwa and that schwa was reported much more often in highly frequent words and in a more casual speaking style. This was interpreted by the author as "excellent pre-conditions for a sound change" (1995, p. 357). He also considers, however, that orthography may be preventing this change from fully happening.
However, some challenge the phonological status of schwa. Given the high variability of schwa realizations observed due to coarticulatory effects (Recasens, 1991; Kondo, 1994), the phonemic status of the unrounded central vowel has been questioned. Schwartz, Boe, Vallée and Abry (1997) have described schwa as a sound that escapes the traditional vowel space and would belong to a 'parallel' system associated with lenition.
Among previous studies on the L2 acquisition of vowel reduction, the study conducted by Flege and Bohn (1989) examined the production of morphophonological alternations in Spanish-accented English ('able'-'ability') in order to explore vowel quality and stress auditorily and instrumentally. They concluded that stress placement was acquired earlier than vowel reduction and that "the ability to unstress vowels is a necessary, but not sufficient, condition for vowel reduction" (1989, p. 35), since the two were treated as independent phenomena by Spanish speakers. Kondo (1994) explored the production of English vowel reduction in weak forms by advanced speakers of Japanese. She concluded that her subjects failed to produce the contextual coarticulatory patterns but showed some awareness of the sound. Regarding coarticulatory effects, a study by Watkins (2006) analysed phonological environments in the production of English weak forms by advanced Brazilian speakers of English. In this case, contextual factors such as the existence of a preceding syllable in the same intonation group, the initial segment of the following word or the metrical status of the following syllable exerted a significant effect together but failed to account well for variability on their own. Watkins suggests that, given these findings, psycholinguistic factors should also be borne in mind when analysing non-native production of English weak forms.
There have also been studies about the effect of treatment on the acquisition of vowel reduction. Gutierrez and Monroy (2003) tested a similar age group (15-17) with the same L1 (Spanish) as the present study on the same type of vowel reduction ('phonological' as in private or suspect). They concluded that there was no improvement in the production of vowel reduction by students after a treatment of 7 hours distributed across 14 weeks. While Gutierrez and Monroy suggest that the training period was too short to account for the lack of treatment effects, these results could be due to other factors such as type of training used or learner motivation.
The studies mentioned above examined production of vowel reduction by non-native speakers. Regarding perception of vowel reduction, Gómez Lacabex and García Lecumberri (2005) analysed the perception of the contrast full vowel-reduced vowel in content words by Spanish students of English as a foreign language with native and non-native exposure and no specific training. Results revealed that the subjects were only able to perceive vowel reduction at chance performance levels, with 56% of correct answers, while they reported better results in the case of full-vowel perception (67% correct answers).
The basic division between production and perception turns out to be more complex than would be expected (Llisterri, 1995; Leather, 1999; Koerich, 2006). This complexity is reflected in the diverse and inconclusive findings that can be found in the literature. Some researchers suggest that perception and production are independent of each other (Fujisaki, 1983; Broselow & Park, 1995; Paliwal, Lindsay & Ainsworth, 1983). More numerous, however, are those studies that support a dependency relationship between perception and production (Ladefoged, 1967; Neufeld, 1980; Flege, 1987; Leather, 1990; Koerich, 2006). Most of these report a dependence of production on perception, since correct perception is understood to be a prerequisite for production, as authors such as Neufeld (1980) or Flege (1987), in his 'equivalence classification' parameter, have proposed. An alternative is to view perception-production as a 'parallel' relationship. Some researchers have proposed an alignment between both abilities (Koerich, 2006; Rauber, Escudero, Bion and Baptista, 2005) while others have reported that they do not always progress in parallel (Bohn & Flege, 1990) or that development applies to some phenomena only (Boatman, 1990). These inconclusive findings are most likely due to the great methodological diversity present in the field. For instance, there is diversity across subject groups, as the studies may test native speakers, non-native speakers or bilinguals. There is also diversity in the contexts in which the studies have been carried out (formal settings vs. naturalistic environments). Additionally, the stimuli have also been varied: synthetic speech, words, unscripted speech and reading corpora (words, sentences, texts), amongst others.
Inconsistent findings may also be found in studies which have explored the relationship between perception and production in training. A close connection between production and perception is supported by those studies which have found that training in one of the skills exerts a positive influence on the other. An example is the classical study by Catford and Pisoni (1970) who, having administered two different types of phonetic training (auditory and articulatory) on exotic sounds to English speakers, concluded that the group which underwent articulatory instruction was significantly better at perceiving those sounds than the group receiving discrimination training. Matthews (1997) focused on testing the perception of Japanese learners of English who had been given training with an articulatory focus. He concluded that "explicit articulatory instruction in the pronunciation of non-native segments can contribute to the development of novel segmental categories", since his subjects improved in the discrimination of contrasts such as /b/-/v/ or /θ/-/f/ after the treatment (p. 227). A further study conducted by Leather (1990) is relevant in this context since it involved the analysis of the effect of training in Chinese lexical tones on production and perception. While one experimental group received computer-based perception training, the other group was trained to use computer-managed visual feedback in their production. The groups were tested in both perceptual and production abilities and results suggested that both training types had positive effects, since training in one modality "tended to be sufficient to enable a learner to perform in the other" (1990, p. 95).
Our study analyses the relationship between perception and production training in FL (foreign language) formal instruction (FI) contexts. In Spain, while new approaches are being explored in novel teaching methods such as 'content and language integrated learning' (CLIL), which involves using the target language as a language of instruction (Gallardo et al., in press), FLs are mainly taught formally and there is little exposure outside the teaching environment, which characterizes what are known as 'formal instruction settings' as opposed to 'natural settings'. Many studies have reported the necessity to include pronunciation training in formal instruction curricula. Setter and Jenkins (2005) support this view in a recent review article which highlights the need to activate pronunciation as part of communication and discourse in current teaching methodologies. There are also many studies that have empirically demonstrated the effectiveness of pronunciation training (Adams, 1979; McCandles & Winitz, 1986; Elliott, 1995; Rochet, 1995; Matthews, 1997; Derwing, Munro & Wiebe, 1998; Cortés Pomacóndor, 1999; Arteaga, 2000). Our study follows this line of research by comparing the effect of specific training on a new phonemic category (schwa) to the acquisition obtained by mere exposure.
In sound training research, generalization has often been used as a measure of the robustness of learning (Lively, Pisoni, Yamada, Tohkura, & Yamada, 1994; Hardison, 2004). Studies like those of Rochet (1995) and McClaskey, Pisoni and Carrell (1983) concluded that generalization after a training period can occur. Specifically, Rochet found that his adult subjects (native speakers of Mandarin Chinese) were able to transfer voice onset time (VOT) values across consonant place of articulation (bilabial plosives /b, p/ to velar plosives /k, g/ and dental plosives /d, t/) and vocalic contexts (from /u/ to /i/ and /a/) after perceptual training with natural speech tokens. McClaskey and colleagues reported similar findings with synthetic stimuli. Their subjects (monolingual speakers of English) showed significant gains after laboratory discrimination training on voice onset time values in places of articulation which had not been used in the treatment sessions.
Our study also aimed at investigating whether the two types of training provided (perceptual and articulatory) induced robust acquisition.
To sum up, the aims of this study are (i) to investigate the ability to identify the English contrast full vowel versus the mid-central vowel (schwa, a 'new' sound in terms of L1 categories) by groups of Spanish listeners after specific instruction; (ii)

Subjects
A group of 50 Spanish teenagers was selected to take part in the study. The cohort was divided into two experimental groups and one control group. All groups suffered attrition from the pre-test to the posttest. Attrition most severely affected the control group, which was originally a 12-student group and was reduced to 7. After attrition, the original total of 50 was reduced to 41 members (24 female, 17 male, mean age: 15.8) across the three groups: experimental group A (17 subjects) received training based on perceptual cues, experimental group B (17 subjects) underwent vowel reduction training based on production cues and the control group (7 subjects) was not given any specific training. The subjects were learning English as a foreign language (FL) in a formal instruction setting. They attended English lessons 3 hours a week at a private language school (where the training took place) in addition to English instruction at school. This was an important factor as they received exposure to native English accents at the private language school but not at the state school. They had been studying at the private language school for 4.1 years on average. Motivation was also monitored in this study; subjects in all groups showed high motivation rates (80% on average, as estimated by specific questionnaires).

Training procedure
The training period was carried out over 3 months. Students received specific training on vowel reduction once a week (12 sessions on average) within their English course (pre-first certificate level). The administration of the training was organised as follows: 3 sessions on awareness raising; 4 practical sessions on vowel reduction in lexical words and in grammatical words (weak forms); and 3-4 review sessions. Training sessions lasted 20 to 25 minutes in the case of the group receiving perception training, while they were somewhat longer (35 to 40 minutes) in the case of the group receiving production training, since it involved providing individual feedback to each student.
Perceptual training was based on discrimination exercises, and production on the part of the students was not encouraged. Production training, on the other hand, provided students with articulatory and visual cues and was based on production of the items on the part of the students and individual feedback delivered by the trainer. Here, perceptual activities were avoided and model productions were limited in the classroom. Due to the nature of the training and the context in which it was carried out, it should be noted that while production was not encouraged in the perception sessions, it could not be completely absent in the whole English course, just as perception could not be totally absent in the production sessions and English course. Thus these training periods can be described as sessions in which perception was maximized while production was minimized and vice versa.

Stimuli and procedures
All subjects were tested in a quiet laboratory after having received parental consent for taking part in the study. The test was administered by a computer program running on several PCs simultaneously. Instructions were given in an adjacent room prior to test administration. These described the set-up of the test, the working of the computer program and the testing sequence. All subjects were tested in the same order: identification of words in isolation followed by identification of words in sentences. Stimulus presentation was randomized for each subject. A total of 22 items were presented to the subjects in a two-alternative forced-choice identification task with orthographic input using a custom-designed MATLAB program. This corpus consisted of 11 minimal word pairs (see appendix 1) with the contrast 'full vowel' versus 'schwa' in the unstressed syllable (e.g., seafood-Seaford). Prior to testing, students were familiarized with the stimuli so that they would know their meaning. These minimal word pair sets were also tested in the carrier sentence: I'm going to say … again. Thus, vowel perception was tested in two different contexts: words and sentences.
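The identification task described above can be illustrated with a minimal sketch. It is written in Python rather than in the MATLAB program actually used, which is not reproduced in this paper. The function names `build_trials` and `percent_correct` are hypothetical, and the word list is illustrative only: seafood-Seaford is taken from the test corpus, whereas the other two pairs are simply the contrast examples mentioned in the introduction and are not claimed to be test items.

```python
import random

# Illustrative pairs only: (full-vowel word, schwa word).
# The real corpus had 11 pairs (appendix 1); this list is an assumption.
MINIMAL_PAIRS = [
    ("seafood", "Seaford"),
    ("exorcise", "exercise"),
    ("Normantown", "Normanton"),
]


def build_trials(pairs, rng):
    """Create one trial per word and shuffle the order for one subject."""
    trials = []
    for full_word, schwa_word in pairs:
        options = (full_word, schwa_word)  # the two orthographic choices
        trials.append({"target": full_word, "vowel": "full", "options": options})
        trials.append({"target": schwa_word, "vowel": "schwa", "options": options})
    rng.shuffle(trials)  # a fresh random order per subject
    return trials


def percent_correct(trials, responses):
    """Return the percentage of correct answers for each vowel type."""
    hits = {"full": 0, "schwa": 0}
    totals = {"full": 0, "schwa": 0}
    for trial, response in zip(trials, responses):
        totals[trial["vowel"]] += 1
        if response == trial["target"]:
            hits[trial["vowel"]] += 1
    return {vowel: 100.0 * hits[vowel] / totals[vowel] for vowel in totals}


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sketch is repeatable
    trials = build_trials(MINIMAL_PAIRS, rng)
    # Simulated listener who always chooses the full-vowel spelling.
    responses = [trial["options"][0] for trial in trials]
    print(percent_correct(trials, responses))
```

Scoring full-vowel and schwa targets separately, as in this sketch, makes a response bias visible: a listener who always reports the full vowel scores perfectly on one word type and at floor on the other.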

Results
Section 3.1 provides data for words with schwa in unstressed syllables in isolation and in sentences.Section 3.2 details data obtained from words with a full vowel in the unstressed syllables in the two contexts analysed.
3.1. Words with schwa in the unstressed syllable
Students' perceptions were coded as percentages of correct answers. Due to differences in group sizes, Wilcoxon non-parametric analyses were computed for intra-group data and Mann-Whitney U non-parametric tests were applied for the inter-group analyses. Results compared pre-test and posttest training conditions and gain scores of the two experimental groups taken together (A&B) vs. the control group (C) on the one hand, and of the perceptually trained group (A) vs. the articulatorily trained group (B) on the other hand.
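For readers less familiar with these two tests, the following Python sketch shows how the test statistics and a normal-approximation z value can be computed. It is an illustration under stated assumptions, not the software used for the analyses reported here: the function names are hypothetical, the variance is not corrected for ties, no continuity correction is applied, and the scores in the usage example are invented. The sign of z depends on which rank sum is taken as the statistic, so it may differ from the convention of a given statistics package.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Paired (intra-group) test: return (W+, z) for post minus pre.

    Zero differences are dropped; at least one non-zero difference is needed.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    return w_plus, (w_plus - mean) / sd


def mann_whitney_u(group_a, group_b):
    """Independent (inter-group) test: return (U for group_a, z)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    return u_a, (u_a - mean) / sd


if __name__ == "__main__":
    # Invented percent-correct scores, for illustration only.
    pre = [50, 55, 60, 45, 40]
    post = [60, 70, 80, 50, 45]
    print(wilcoxon_signed_rank(pre, post))
    # Gain scores (post minus pre) of a trained group vs. an invented control group.
    trained_gain = [b - a for a, b in zip(pre, post)]
    control_gain = [0, 5, -5]
    print(mann_whitney_u(trained_gain, control_gain))
```

With groups as small as the control group in this study, an exact test would normally be preferable to the normal approximation shown here; the approximation is used only to keep the sketch short.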
Inter-group analysis showed no significant differences between the experimental groups and the control group in the pre-test, either in words or in sentences. Near-significant differences were found between A&B and C in the gain score for words (p = .069). In the case of identification in sentences, posttest scores were significantly higher for A&B (z = -1.99, p < .05) and gain scores were near-significantly higher (p = .070).
Intra-group analysis for experimental groups A&B analysed together revealed significant differences between pre-test and posttest in identification of schwa in words (z = -3.59, p < .001) and sentences (z = -2.93, p < .05) (figure 1, left and right respectively). No significant pre/post differences were found for control group C in words or in sentences.
A further inter-task analysis revealed that there were no differences between identification in words and in sentences across groups and treatment stages. Further non-parametric tests compared the experimental group receiving perception training (A) with the experimental group receiving production training (B). While both A and B showed an improvement in mean scores between pre-test and posttest in words and sentences (see figure 2), this improvement was only significant in the case of group B, in words (z = -3.30, p < .005) and sentences (z = -2.59, p < .05). However, inter-group analysis showed no significant differences between groups A and B in either words or sentences, either in the pre-test (words: p = .79, sentences: p = .48) or in the posttest (words: p = .88, sentences: p = .92). As reported above, further inter-task analyses revealed that there were no differences between identification in words and sentences for experimental groups A and B, either in the pre-test or in the posttest.

3.2. Words with a full vowel in the unstressed syllable
A procedure similar to that described in 3.1 was followed in the case of those words having a full vowel in the unstressed syllable. Inter-group analysis revealed no significant differences between A&B on the one hand and C on the other, either in the pre-test or in the posttest, in words or sentences.
No significant differences were found in the intra-group analysis of experimental groups A&B analysed together between pre-test and posttest for words or sentences. Comparisons of the pre-test and posttest showed no significant differences for control group C either in words or in sentences (figure 3). As in the case of the words in 3.1, inter-task analysis revealed no differences between vowel identifications in words and sentences; this was found to be consistent across groups and treatment stages. When looking at how the experimental groups performed separately, no significant differences between pre-test and posttest were found for either group in words or sentences. Inter-group analysis showed no differences between A and B, either in the pre-test or in the posttest, in words or sentences.
Finally, to complete these analyses we compared the perception of the words with a full vowel in the unstressed syllable and the words with schwa in the unstressed syllable. In all cases (pre-test and posttest, word and sentence contexts) all groups had significantly higher correct scores in the strong vowel words than in the weak vowel words, in the pre-test (p < .0001) and in the posttest. That is to say, subjects were always better at identifying strong vowels than weak vowels.

Discussion
The subjects analysed in this study were able to improve their identification of schwa in words after specific training was administered. Moreover, the control group did not show improvement, indicating that the exposure and language experience gained during the testing period was not responsible for the advantage displayed by the experimental groups. This study agrees with previous research which reports positive effects of pronunciation training (McCandles & Winitz, 1986; Rochet, 1995; Cenoz & García Lecumberri, 1999). Among the factors which might explain these results, the specific focus of the training administered and the importance given to awareness raising are worthy of consideration. The phonetic treatment given in this study was specifically focused on vowel reduction; no other pronunciation aspects (e.g., other English vowels, consonants etc.) were included in the treatment, an aspect which differentiates this study from that of Gutierrez and Monroy (2003). Furthermore, both segmental and suprasegmental levels were worked on, since word stress, sentence stress and rhythm were addressed in the sessions, especially in the ones during which weak forms were practised, as suggested by authors such as Derwing, Munro and Wiebe (1998). The inclusion of sessions raising awareness of the sound under focus may also have had a positive effect, as indicated by authors such as Leow (2000) and Rosa and O'Neil (1999).
Results support current theoretical views regarding the possibility of developing perceptual abilities when identifying non-native sounds (Flege, 1995; Best, 2001), since subjects improved their identification of schwa. However, this improvement was not as high as predicted for 'new' sounds in models such as SLM or PAM. Our results suggest that factors such as orthography contribute to 'new' sounds (here, the reduced vowel) not being identified at high rates, since the lack of sound-letter correspondence for schwa results in mixed identification scores.
The improvement reported in our results was obtained after two different types of training. However, results revealed a somewhat clearer treatment effect for group B, the only group whose improvement reached significance when the experimental groups were analysed separately. This group underwent production training based on articulatory and visual cues and feedback on individual production, while perception was minimized in the training sessions; still, it showed significant improvement in the identification of the reduced vowel in both words and sentences. Our study thus supports the facilitating relationship between perception and production training discussed in section 1 above in one direction: production training exerts a positive influence on perceptual skills. Further analysis of the students' productions would reveal whether perception training also exerts a positive influence on production abilities.
A possible reason why the two treatments did not produce more clearly different effects may lie in the context in which the treatment took place, which did not completely isolate perception and production practice. Other studies which have analysed auditory versus articulatory training effects have done so in short-term laboratory training conditions. Catford and Pisoni (1970) provided training on exotic sounds to English speakers and Leather (1990) provided training on Chinese tones to Dutch speakers. Although Catford and Pisoni noted the difficulty of isolating production training from perception, this was fairly well controlled in these two cases as they were isolated experiments, not framed within a language learning course. In the case of our study, the training was inserted within an EFL course, which provided students with the possibility of practising both perception and production in sessions other than the training one. That is to say, a student might be trained on a specific sound auditorily during the training session and decide to focus on or try to practise this sound in a non-specific speaking task demanded in the English course the next day. Likewise, a student receiving articulatory training and feedback one day may decide to listen carefully to his/her teacher's pronunciation of that particular sound in subsequent lessons.
There were no significant differences between pre-test and posttest for the identification of full vowels in unstressed syllables. Although there was a tendency for the group with perceptual training to improve more, given the size of the group and the variance, it did not reach significance. One reason why strong vowel perception showed little difference between pre-test and posttest could be the high scores obtained from the beginning, which for non-native listeners could be regarded as near-ceiling effects for the perception of these sounds. Another factor which could have accounted for the lack of improvement in the perception of this set of sounds could have been the training routine and awareness sessions administered to the students, who were instructed particularly on vowel reduction.
Significant differences between the identification of full vowels and schwa suggest that students were able to identify full vowels better than weak vowels at both stages of the experiment. Orthographic influence is one of the likely reasons for this advantage, since Spanish speakers will tend to think orthographic vowels correspond to full vowels and therefore there is a natural bias towards full vowel reporting.
Generalization of identification of schwa to a non-instructed and new context (sentences) occurred in our study, indicating a degree of robustness in the acquisition of the new sound. Subjects were not trained in perceiving schwa within a sentence, a more complex and natural context. However, they were able to improve their identification abilities significantly in this context after the treatment. Our results are in line with other studies mentioned earlier which have shown the ability to generalize to non-instructed contexts (McClaskey et al., 1983; Rochet, 1995). Further research into generalization of identification abilities in vowel reduction to novel or non-instructed words will help to give a better picture of the degree of robustness achieved.

Conclusion
The present study has shown that specific training on English vowel reduction for Spanish learners of English as a foreign language with native English exposure exerted a positive effect on the students' ability to identify a reduced vowel (schwa).
Our results showed few differences in perceptual performance between the auditorily and articulatorily trained groups. Two possible interpretations have been given in order to account for these findings. On the one hand, we have noted the difficulty of isolating perception experience from production training and vice versa when setting training sessions in FI settings. On the other hand, the fact that articulatory training has contributed to the development of students' perceptual abilities supports the interconnectedness and facilitating relationship between perception and production. This experiment also revealed that perceptual abilities carried over to a non-instructed context, indicating a degree of robustness in learning.

Figure 1: Identification of reduced vowels in words (left) and sentences (right) contrasting experimental groups together (A&B) and control group (C).

Figure 2: Identification of vowel reduction in words (left) and sentences (right) contrasting experimental groups separately.

Figure 3: Identification of full vowel in words (left) and sentences (right) contrasting experimental groups together (A&B) and control group (C).

Figure 4: Identification of full vowel in words (left) and sentences (right) contrasting experimental groups separately.