THE INFLUENCE OF METALINGUISTIC KNOWLEDGE OF SEGMENTAL PHONOLOGY ON THE PRODUCTION OF ENGLISH VOWELS BY BRAZILIAN UNDERGRADUATE STUDENTS

This article presents data on the production of the English vowels [i ɪ ɛ æ u ʊ] by Brazilian English Language Teaching (ELT) undergraduate students before and after taking a course on English Segmental Phonology. Brazilian learners tend to assimilate the contrasts present in [i ɪ], [ɛ æ] and [u ʊ] into the prototypical categories of Brazilian Portuguese [i], [ɛ] and [u], respectively. Thus, this article investigates the influence of receiving explicit metalinguistic instruction on English segmental phonology on the production of the target pairs of vowels. The data analysis is acoustic in nature (spectral quality), and the results show that some learners created new phonetic categories for the English vowels after receiving the metalinguistic instruction.


Introduction
Having a minimum control of the pronunciation of a second/foreign language (L2) is required for oral communication. Accurate production of both segments and prosodic elements is necessary for intelligibility (e.g., Celce-Murcia, Brinton, & Goodwin, 2010; Morley, 1994; Pennington, 1996, 1998), which, roughly explained, refers to how much your interlocutor understands of your speech; and for comprehensibility (Parrino, 1998; Singleton & Ryan, 2004), which is related to the effort your interlocutor needs to employ in order to understand your speech. Therefore, pronunciation instruction, even though very often neglected by language teachers for various reasons, should be a constant part of L2 pedagogy (Silveira, 2004; Tomlinson, 2005; Yule & Macdonald, 1994).
As a result, non-native English speaking undergraduate students of English Language Teaching (ELT) should be especially interested in improving [...] age, but because people learn the phonological system of their L1 so well. When acquiring their L1, learners need to accommodate the variation inherent to the acoustic signal into the prototypical phonological categories of that language so that communication can take place, and the brain does so by taking statistics of the input and assigning exemplars to the corresponding categories (Bybee, 2003; Cristófaro Silva, 2003; Kuhl, 1991, 1993; Kuhl et al., 2008; Leather, 2003; Pierrehumbert, 2001, 2003).
Therefore, identifying and, consequently, producing L2 sounds that are acoustically very close to a sound of the L1 becomes more challenging. The L2 sounds that are phonetically closer to sounds of the L1 are the hardest for L2 learners to perceive and produce, since learners tend not to (initially) perceive them as different, assimilating them to the prototypical phonological categories of their L1 (Flege, 1995, 1999, 2007). This is the case with the English vowels [i ɪ], [ɛ æ] and [u ʊ], which tend to be assimilated by Brazilian learners into the prototypical categories of Brazilian Portuguese [i], [ɛ] and [u], respectively (Bion, Escudero, Rauber, & Baptista, 2006; Lima Jr, 2015; Nobre-Oliveira, 2007; Rauber, 2006). That is the reason those six vowels are the ones in focus in this article.
Assuming that the process of L2 acquisition is a complex dynamic system (e.g., De Bot, 2008; De Bot, Lowie, & Verspoor, 2007; Larsen-Freeman, 1997; Lima Jr, 2013), the prototypical categories created for communication in the L1 act as attractor states for the L2. Attractors are states of temporary accommodation of a complex dynamic system; they are where the system finds temporary stability amidst chaos. The very fact that attractor states are temporary reinforces the dynamic nature of such a system, meaning that the system is constantly moving from one attractor state to another. Some attractor states require more energy for the system to move away from, but they are all potentially temporary in nature.
This means that Language Acquisition would be more accurately described as Language Development, due to its dynamic, never-ending change in time as the system moves through different attractor states. Even for the process of first language acquisition, it is impossible to pinpoint when one has finally acquired the entire system, for people are constantly learning new words, new (idiomatic, technical, slang) expressions and new pragmatic uses of language (Singleton & Ryan, 2004). With L2 development, the ongoing change of the system is even more evident, for learners will always have room for improvement in their fluency, accuracy, proficiency, competence, intelligibility, comprehensibility, etc.
As mentioned above, the prototypical phonological categories of the L1 may act as attractor states for the developing L2 system, causing Brazilian learners to perceive (and thus produce) only [i] when exposed to [i] or [ɪ], for example. Therefore, one of the purposes of the language classroom is to help learners move their systems away from these attractor states into ones that have the appropriate distinctions between contrasting L2 sounds. Some learners need more intervention than others to have their systems exit an attractor state, but they can all potentially do so.
This article hypothesizes that, for some Brazilian undergraduate learners of English, the English Phonology course may function as an intervention strong enough to help them move their developing L2 vowel systems into a state with the appropriate contrasting categories. It is also expected, though, that the course will not be enough for some learners to create new vowel categories for the L2, at least not immediately. Because of the non-linear relation between perturbation (the intervention, i.e., the phonology classes) and movement of the system (the creation of new vowel categories), it is possible that the effect of the lessons will appear later on for some learners.
This dynamic nature of L2 phonological development is what makes dynamic systems better examined in a longitudinal study than in a cross-sectional one (De Bot & Larsen-Freeman, 2011; Larsen-Freeman & Cameron, 2008; Lima Jr, 2016a; Verspoor, De Bot, & Lowie, 2011). Also, in a dynamic system, the processes are more relevant than the products, since the system's ongoing change in time makes it impossible for it to reach an end state. Hence, concepts such as final state or ultimate attainment should be replaced by concepts of dynamic idiosyncratic development.
It is for those reasons that the data presented in this article come from an umbrella longitudinal project which looks into the phonological development of English-L2 learners individually. The participants, 13 Brazilian undergraduate students of English Language Teaching (ELT), have been recorded every semester since they were admitted to college, and will keep being recorded every semester until they graduate. The data presented here zoom in on the vowel production of the second and third recordings, collected right before and right after students took the mandatory English Segmental Phonology course in their third term.
The main goal of the umbrella project is to investigate the individual routes of English phonological development by these Brazilian undergraduate students. Within this general goal, this article has the more specific objective of analyzing the individual creation of new vocalic categories, in terms of spectral quality (F1-F2), after learners have received explicit metalinguistic instruction on English segmental phonology. Thus, the main goal of this article is to discuss the influence of metalinguistic knowledge of segmental phonology on the production of English vowels by Brazilian undergraduate students of English.

Method
The data come from 13 Brazilian undergraduate students majoring in English Language Teaching (ELT) at a federal university in the state of Ceará, Brazil. As part of the umbrella longitudinal project under which this study lies, the students have been recorded every semester since their admission to college and will be recorded every semester until they graduate. This article presents data from the second and third recordings, which were done before and after their third term, when they take a mandatory course on English Segmental Phonology.
The phonology course is 64 hours long, lasting one entire school semester, and is taught in English. It is a technical and metalinguistic course, but with a secondary goal of helping the non-native English speaking student teachers to improve their pronunciation of English vowels and consonants. The course begins with the basics of articulatory phonetics (8 hours) and the principles of the International Phonetic Alphabet, IPA (4 hours). The remainder is equally divided into the study of English consonants and English vowels. Students learn how to transcribe words using the IPA and to read transcribed words. They learn how to classify consonants according to place of articulation, manner of articulation and voicing; and vowels according to tongue height, tongue advancement, lip position and muscle tension (tense vs. lax). They also study the relations between orthography and phonology, and, towards the end of the course, reflect on how to teach the sounds of English consonants and vowels to Brazilians.
The undergraduate students were recorded individually, in a silent room, reading words inserted in the carrier sentence "I said token this time". The corpus was composed of three words for each target vowel. The words were controlled for phonological context: all of them were monosyllabic with a CVC context, where both Cs were voiceless plosives. This control was meant to prevent acoustic bias from the neighboring segments, and it also made it easier to identify, segment and label the target vowels in PRAAT (Boersma & Weenink, 2011), the software used to conduct the acoustic analyses. The words are presented in Table 1.
The carrier sentence "I said token this time" (Watkins & Rauber, 2010) controls for the number of syllables before and after the target word, thus preventing intonational bias from the beginning and end of sentences read as a list. The sentences were presented in a slide presentation, with each sentence on an individual slide. Each word was repeated four times in random order, generating 12 tokens per vowel per participant, that is, 72 tokens per participant and a total of 936 vowels per semester. In the end, a total of 1,872 vowels were identified and analyzed.
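The token counts above follow directly from the corpus design. As a quick arithmetic check, the short Python sketch below recomputes them; the variable names are illustrative only and are not part of the study's materials.

```python
# Corpus design as described in the text.
WORDS_PER_VOWEL = 3   # three monosyllabic CVC words per target vowel
REPETITIONS = 4       # each word was read four times
TARGET_VOWELS = 6     # fleece, kit, dress, trap, goose, foot
PARTICIPANTS = 13
RECORDINGS = 2        # recordings 2 and 3

tokens_per_vowel = WORDS_PER_VOWEL * REPETITIONS
tokens_per_participant = tokens_per_vowel * TARGET_VOWELS
tokens_per_recording = tokens_per_participant * PARTICIPANTS
tokens_total = tokens_per_recording * RECORDINGS

# Prints: 12 72 936 1872
print(tokens_per_vowel, tokens_per_participant, tokens_per_recording, tokens_total)
```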
The recordings were done with a supercardioid Shure 150B lapel microphone connected to a Zoom 4HnSP recorder. The audio was captured in mono, with a sampling rate of 44 kHz, and saved in WAV format.
The vowels were segmented in PRAAT (Boersma & Weenink, 2011). The points considered as the beginning and end of each vowel were the first and last valleys in the periodic pulse in the waveform which had considerable amplitude, resembled the vocalic period, and presented stable formants in the spectrogram.
One of the most common methods used to extract formants is Linear Predictive Coding (LPC), a predictive algorithm that decomposes the acoustic signal and estimates the resonances generated in the vocal tract. However, automatic LPC analysis has been criticized (e.g., Vallabha & Tuller, 2002; Wempe & Boersma, 2003) because it may introduce systematic errors in the formant extraction depending on the parameters set beforehand by the researcher. The problem is that, with automatic LPC analysis, the researcher needs to define, before the analysis, the order of the LPC (i.e., the quantity of formants to be found) and the maximum (ceiling) frequency in which to look, which is usually set at 5 kHz for men and 5.5 kHz for women. Nevertheless, different men and women might have different frequency ceilings, which, if not set accordingly, might lead the LPC into identifying peaks that do not exist and overlooking peaks that do.
A solution to this problem is to double-check the adjustment of the LPC to the FFT spectrum (obtained by the Fast Fourier Transform algorithm) vowel by vowel. Even though this method is more time-consuming, it allows the researcher to adjust, when necessary, the ceiling frequency or the order of the LPC for specific speakers. This is what the scripts used to extract F1 and F2 in this study do (Arantes, 2010, 2011).
After extraction, the F1 and F2 values were used to create vowel space plots using the package phonR (McCloy, 2016) for the software R (R Core Team, 2016). The same package was used to normalize the formant values using the Lobanov method, which creates a z-score for F1 and F2. This was done in order to later calculate the Euclidean Distances between the vowels without the bias of F2 values, whose raw values are much larger and increase in much larger increments than those of F1. Finally, t-tests were conducted with the F1 and F2 values of the target pairs of vowels.
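To make the normalization step concrete, the following Python sketch shows the z-score idea behind Lobanov normalization for a single speaker. It is only an illustration, not the phonR routine used in the study: the formant values are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption.

```python
from statistics import mean, stdev

def lobanov(values):
    """Return z-scores of one speaker's values for a single formant."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - m) / s for v in values]

# Hypothetical F1/F2 values in Hz for one speaker (not data from the study).
f1_hz = [300.0, 400.0, 550.0, 650.0, 320.0, 420.0]
f2_hz = [2300.0, 2000.0, 1900.0, 1750.0, 900.0, 1100.0]

# Each formant is normalized separately, over all of the speaker's tokens.
f1_z = lobanov(f1_hz)
f2_z = lobanov(f2_hz)
```

After this step both formants are centered on zero with unit standard deviation, so a difference of one unit means the same thing on the F1 and F2 axes.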

Results
The first step in the data analysis was to visually inspect the individual vowel spaces, comparing the distributions of speakers' vowels in the second and third recordings. Since the plots present, for each vowel, all the tokens produced by the speaker, the point of intersection between the F1 and F2 means, and an ellipse of one standard deviation, it was relatively easy to visually identify vowels that overlapped and vowels that were produced separately.
In this first visual inspection of plots, when two vowels had half or more of their one-standard-deviation ellipses overlapping, they were considered overlapping vowels (i.e., only one vocalic category for both); when less than half of the ellipses overlapped or when they did not overlap at all, they were considered separate vowel categories. As will be shown later, two other more quantitative methods were used to ratify this somewhat qualitative classification.
As an example, Figure 1 features the plots of two learners in the second recording (before the phonology classes): speaker A had two separate categories for the vowels [i ɪ] but overlapping vowels for the other two pairs, whereas speaker D had overlapping vowels for all three pairs. To avoid the risk of losing information when using phonetic symbols in different computer programs (Excel, Notepad, PRAAT and R), Wells's (1982) keywords for the English language were used instead. Therefore, where fleece, kit, dress, trap, goose and foot appear in the plots throughout the article, please read [i ɪ ɛ æ u ʊ], respectively.
As can be seen, speaker A had separate vowel categories for [i ɪ] before he took the phonology course. In total, seven speakers (A, B, F, G, K, L and N) had separate categories for the high front vowels before taking the phonology course. In relation to the pair [u ʊ], three learners (G, I and J) had separate vowel categories before the phonology course. Notice that speaker G is also listed as having the [i ɪ] contrast in this recording, making him the only learner with two contrasts before the phonology course. His vowel space can be seen in Figure 2.
In the recording done before the phonology course, no learner produced the [ɛ æ] contrast. The results presented up to this point already reveal a hierarchy of difficulty for Brazilian learners of English concerning these three pairs of vowels, with [ɛ æ] being the hardest and [i ɪ] the least difficult. This result is similar to those found in Lima Jr (2015), Barboza (2008) and Rauber (2006).
After the course on English Phonology, the learners were recorded again and their vowel spaces were created and visually analyzed in the same manner, but this time comparing their productions before and after the lessons in order to look into the effects of the explicit instruction. Besides the individual vowel spaces, a third plot was created for each speaker, with the tokens and the mean points for all vowels color-coded by recording. An example of this plot, from speaker A's productions, can be seen in Figure 3, along with the corresponding separate vowel spaces. As can be seen, this new plot with data from the two recordings contains Lobanov-normalized vowels. This was done in order to decrease possible effects of differences in the same speaker's elocution across recordings made six months apart, as well as to have both F1 and F2 on the same scale for further calculation of Euclidean Distances without the bias of F2 values, which inherently increase in much larger increments than those of F1.
It can also be seen in Figure 3 that this particular learner (speaker A) created new categories for both [u ʊ] and [ɛ æ] while taking the course on segmental phonology. Since he already had the [i ɪ] contrast before the course, he ended the semester with contrasts in all three pairs. The third plot clearly shows how close his foot-goose and dress-trap means were in recording 2 (before the phonology course) and how distant they became in recording 3 (after the course).
Besides speaker A, two more speakers created new categories for [u ʊ] (E and N), and two more were now able to produce the [ɛ æ] contrast (F and N). Also, two learners created new [i ɪ] categories (I and J). Table 2 presents a summary of the presence of vowel contrasts in recording 2 (before the phonology course) and the creation of new vowel contrasts in recording 3 (after the lessons on phonology). The presence of a contrast is signaled with the word YES, and the creation of new categories in the third recording, besides having the word YES, is also highlighted. There was a total of 10 contrasts in recording 2, and another 8 were created after the phonology course. This allowed two learners (A and N) to end their third term with six well-defined categories for these six English vowels that are challenging for Brazilians. In the third recording, the hierarchy of difficulty found in the previous recording was maintained, with 9 students producing the [i ɪ] contrast, 5 producing the [u ʊ] contrast, and 3 producing [ɛ æ] separately.
On the negative side, 8 students made absolutely no progress from one recording to the other, of whom 5 (D, E, I, M and O) had all their pairs of vowels overlapping in both recordings. Also, one learner (speaker I) made a [u ʊ] contrast in the second recording (before the phonology lessons), which disappeared in the third recording.
Two mathematical instruments were used in order to ratify the conclusions reached so far. The first one was the measurement of the Euclidean Distances between the mean points of contrasting vowels for each speaker. The Euclidean Distance is a measure of dissimilarity used to compare two or more items given a number of quantifiable characteristics. When the number of quantifiable characteristics is two, it can be used to measure the distance between two points in a Cartesian coordinate system, which is the case of the F1-F2 graph. Its formula is derived from the Pythagorean theorem, and, taking the [ɛ æ] contrast as an example, it was calculated as:
$$d_{ɛ,æ} = \sqrt{(F1_{ɛ} - F1_{æ})^2 + (F2_{ɛ} - F2_{æ})^2}$$
where F1 and F2 are the speaker's mean values for each vowel. As mentioned in the method section, the formant values for each speaker were previously Lobanov-normalized so that the Euclidean Distances did not include the bias of the F2 values, which are inherently larger and increase in larger increments than those of F1.
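The distance between two vowel means described above can be sketched in a few lines of Python. The function name and the mean values below are hypothetical and serve only to illustrate the calculation on normalized (F1, F2) points.

```python
import math

def euclidean_distance(mean_a, mean_b):
    """Distance between two (F1, F2) mean points, both Lobanov-normalized."""
    return math.hypot(mean_a[0] - mean_b[0], mean_a[1] - mean_b[1])

# Hypothetical normalized (F1, F2) means, not values from the study.
dress_mean = (0.40, 0.60)
trap_mean = (0.70, 0.20)

distance = euclidean_distance(dress_mean, trap_mean)
print(round(distance, 2))
```

Because both axes are z-scores, the resulting distance weighs differences in F1 and F2 equally, which is the reason the normalization precedes this step.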
Figure 4 presents speaker A's vowel spaces for both recordings, with a visual representation of the concept of Euclidean Distances on a Cartesian coordinate system. The idea was to see if the distances between the contrasting vowels [i ɪ], [u ʊ] and [ɛ æ] increased from one recording to the next. As can be seen, speaker A maintained the exact same distance between [i ɪ], which was his largest distance, in both recordings (1.14). His [u ʊ] distance increased from 0.12 to 0.97 in the recording after the phonology lessons, and his [ɛ æ] distance increased from 0.10 to 0.59.
The following table (Table 3) presents the Euclidean Distances for all vowel contrasts for all speakers in both recordings.

[Table 3. Columns: Speaker, Recording, Vowel Contrast]

In the preceding table, all vowel contrasts present in recording 2 and kept in recording 3 were highlighted in yellow, and all the contrasts created only in recording 3 were highlighted in red. Notice that all contrasts, whether present and kept or newly created, have Euclidean Distances of at least 0.5. Also, in the contrasts created in recording 3, the difference between the Euclidean Distances of the two recordings is at least 0.30.
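As an illustration of these two numeric regularities, the sketch below applies them to the distances reported in the text for speaker A. The function and threshold names are hypothetical. The thresholds describe what was observed in the table rather than the criterion actually used, which was the visual inspection of ellipse overlap, so a distance above 0.5 alone does not guarantee a contrast.

```python
CONTRAST_MIN = 0.5   # every contrast in the table has at least this distance
GAIN_MIN = 0.30      # newly created contrasts grew by at least this much

# Speaker A's Euclidean Distances from the text: (recording 2, recording 3).
speaker_a = {
    "fleece-kit": (1.14, 1.14),
    "goose-foot": (0.12, 0.97),
    "dress-trap": (0.10, 0.59),
}

def describe(before, after):
    """Label a pair using only the two numeric regularities noted above."""
    if before >= CONTRAST_MIN and after >= CONTRAST_MIN:
        return "present and kept"
    if after >= CONTRAST_MIN and after - before >= GAIN_MIN:
        return "newly created"
    return "no contrast by these thresholds"

labels = {pair: describe(b, a) for pair, (b, a) in speaker_a.items()}
for pair, label in labels.items():
    print(pair, label)
```

For speaker A the labels agree with the visual classification reported in the text: the fleece-kit contrast was kept, and the goose-foot and dress-trap contrasts were newly created.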
In Table 3, a few distances higher than 0.5 were not considered contrastive because, even though the mean points are slightly apart, the standard deviations are very high, causing the ellipses to overlap, which shows that there is no phonological contrast. For instance, this was the case of speaker L's productions of [u ʊ] in recording 3, whose vowel space is presented in Figure 5. Notice that foot and goose are somewhat apart, but with overlapping ellipses.
The results from the t-tests (Table 4) reveal that all the pairs of vowels considered contrastive in the visual inspection of the vowel spaces and in the calculation of Euclidean Distances had a significant p-value (with alpha at 5%), showing that the two contrasting vowels were also statistically different. The only exception was the [i ɪ] contrast of speaker F in recording 3, whose p-value (F1) is 0.89. This was the only case in which the t-test conducted with F2 values was useful to explain a contrasting pair of vowels. Looking at this speaker's vowel plot (Figure 6), it is clear that, in the third recording, he separated the two vowels much more in terms of tongue advancement than tongue height. The p-value of the t-test conducted with the F2 values was 0.00.
Referring back to the results of the t-tests in Table 4, one can notice that there are a few significant p-values for pairs of vowels that were not considered contrastive. This is the same situation as that of the Euclidean Distances higher than 0.5 not considered contrastive; that is, it happened when the means of the target vowels were somewhat separate, which was picked up by the statistical test, but the standard deviations were so high that the ellipses in the vowel spaces overlapped by at least 50%. An example of such a situation is the production of [i ɪ] in recording 2 and of [u ʊ] in recording 3 by speaker D, both with p-values of 0.00 yet clearly overlapping in the vowel spaces (Figure 7).
Using the information from the visual inspection of individual vowel spaces together with the Euclidean Distances and the results from the t-tests, it was possible to verify the contrasts present in recording 2 (before the phonology course) and kept in recording 3 (after the phonology course) as well as those created only in recording 3, possibly due to the effects of receiving explicit metalinguistic instruction on English Segmental Phonology, as will be further discussed in the following section.
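The text does not state which variant of the t-test was used. Purely as an illustration, and assuming an unequal-variance (Welch) test on the F1 values of the two vowels in a pair, the test statistic could be computed as in the Python sketch below. The formant values are hypothetical, and turning the statistic into a p-value would additionally require the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of each mean
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical F1 values in Hz, 12 tokens per vowel as in the corpus design.
fleece_f1 = [290.0, 300.0, 310.0, 295.0, 305.0, 300.0,
             298.0, 302.0, 307.0, 293.0, 301.0, 299.0]
kit_f1 = [390.0, 410.0, 400.0, 395.0, 405.0, 402.0,
          398.0, 407.0, 393.0, 401.0, 399.0, 400.0]

t_stat, df = welch_t(fleece_f1, kit_f1)
print(round(t_stat, 2), round(df, 1))
```

As the results above show, a large statistic only indicates that the two means differ; when the standard deviations are high, the ellipses can still overlap, which is why the t-tests were read together with the vowel plots.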

Discussion
The results first showed that a few learners already had some vowel contrasts before taking the course on English Segmental Phonology. This was expected for two main reasons. First, Brazilian undergraduates of English Language Teaching (ELT) are admitted to college without taking an English proficiency test. This means that their proficiency levels vary a lot, especially in initial terms, which accounts for the presence of some vowel contrasts in the speech of the more advanced learners. However, not even the more advanced learners produced the contrasts for all three pairs of vowels. Before taking the phonology course, of the 13 students recorded, 8 had contrasting pairs of vowels, of whom only one had contrasts for two pairs. The second reason for such expectation is that there is a lot of individual variability among learners of English in a non-native English speaking country, such as Brazil. English is a mandatory subject in the four years of Middle School and in the three years of High School. However, English instruction in regular school emphasizes writing skills, and students develop different levels of interest in developing their oral skills on their own. Also, many private schools in Brazil offer English classes in the five years of Elementary School, and many teenagers take English classes in extracurricular language institutes, where there is usually more emphasis on oral communication. All these differences in amount of instruction before college education would already account for such variability, not to mention that different teenagers get exposed to English in different amounts through media (music, movies, TV series), vlogs, YouTube channels, video games, computer programs, podcasts, etc.
From a Dynamic Systems Theory (DST) perspective, the fact that each student in class is at a different developmental stage is taken for granted. Even if they take placement tests before starting a course, different students will never be at the exact same point in L2 development. Teachers need to acknowledge, and keep reminding themselves, that each language student is a dynamic system undergoing a process of language learning which is also a dynamic system. Each system is made up of so many elements, whose interaction among themselves and with the environment makes the performance in the L2 emerge, that it is impossible to expect all students to be at the same initial stage at the beginning of a course.
In addition, due to the non-linear relation between cause and effect in a dynamic system, language instructors should not expect learners to react the same way to teaching interventions. In the case of this study, it was expected that each student would react [...]
Likewise, one might think that all speakers registered with contrasting vowels in the results section produced the contrasts equally well. However, some learners produced contrasting vowels at the threshold of the criteria established, whereas others produced vowels that were truly separated, with the ellipses far from touching each other. As a matter of fact, the data present variation even within the same speaker. Speakers F, G and K, for instance, are all marked with separate categories for [i] and [ɪ] in the second and third recordings. While this is true, this simple categorization overlooks the fact that their contrasts in the third recording were much larger than in the previous one. This is shown in Figure 9, which compares the [i ɪ] contrasts of speaker G as an example. As can be seen, even though he has [i ɪ] contrasts in both recordings, the one in the latter is much clearer, showing that he has made progress with vowel categories that were already somewhat resolved. A data analysis that looks exclusively at group means and/or exclusively at classifications is unable to register and account for the nuances of the individual routes of L2 development, which can be very informative concerning the process of language acquisition.

Conclusion
The main goal of this article was to discuss the influence of explicit metalinguistic instruction on segmental phonology on the production of six English vowels by Brazilian undergraduate students of English Language Teaching. This goal was achieved by analyzing the individual creation of new vocalic categories, in terms of spectral quality (F1-F2), after learners had taken an undergraduate-level course on English Segmental Phonology. With the recording done after [...]

Figure 1: Vowel spaces with and without contrast for the vowels [i ɪ]

Figure 2: Speaker G's vowel space before the phonology course

Figure 3: Plots comparing speaker A's recordings 2 and 3 (before and after the phonology course)

Figure 4: Speaker A's vowel spaces with Euclidean Distances for contrasting vowels

Figure 5: Speaker L's vowel space in recording 3, illustrating a slight distance between [u] and [ʊ], but with large ellipses (one standard deviation) that overlap

Figure 6: Speaker F's vowel space in the third recording, showing that his [i ɪ] contrast was much greater in F2 than in F1

Figure 7: Speaker D's vowel spaces for recordings 2 and 3, illustrating that t-tests with formant values do not always correspond to distinctive vowels

Figure 8: Speaker K's vowel spaces for recordings 2 and 3, showing a slight, yet not satisfactory, movement in the [ɛ æ] contrast

Figure 9: Speaker G's vowel spaces for recordings 2 and 3, showing that the latter has a greater distance between [i] and [ɪ]

Table 1: Corpus for data collection for acoustic analysis

Table 2: Vowel contrasts in recordings 2 and 3, before and after the phonology course

Table 4: P-values for t-tests conducted with F1 values of contrasting pairs of vowels