EFFECTS OF PERCEPTUAL TRAINING ON THE IDENTIFICATION AND PRODUCTION OF WORD-INITIAL VOICELESS STOPS BY ARGENTINEAN LEARNERS OF ENGLISH

In this study, we investigate the effectiveness of perceptual training, administered to Argentinean learners, in the perception and production of word-initial voiceless stops in English. Twenty-four participants were divided into 3 groups: (i) Group 1, which participated in 3 training sessions; (ii) Group 2, which, besides performing the same training tasks, was explicitly informed about the target item; (iii) Group 3 (control). All participants took part in a pre-test, a post-test and a delayed post-test. In each of these tests, they performed a consonant identification task and a read-aloud task. Our results show a significant increase in identification for both experimental groups. As for production, Group 2 exhibited a significant increase in /p/ and /t/ after training. These results are indicative of the effectiveness of perceptual training tasks in helping learners focus on Voice Onset Time.

In order to evaluate the role of these practices, factors such as the first language, the target language, the learners' proficiency level and the phonetic aspect under investigation, among many others, should be considered.
Bearing this in mind, in this study we investigate the role of perceptual training in the acquisition of aspirated initial stops by Argentinean learners of English. English has a two-way voicing distinction for stops in word-initial position. Voice Onset Time (VOT) is the main acoustic cue employed by speakers of English when distinguishing /p, t, k/ from /b, d, g/. This distinction is clear as, in word-initial position in English, voiced plosives are generally produced with short (or zero) VOT, whereas voiceless /p, t, k/ exhibit voicing lag or positive VOT (aspiration). These patterns, however, are not the same ones found in Argentinean Spanish. Even though Spanish also exhibits a two-way distinction for voicing, the VOT patterns through which this distinction is instantiated are different from those found in English, as aspirated plosives are not found in this language. Indeed, according to the literature on Argentinean Spanish (Lisker & Abramson, 1964; Abramson & Lisker, 1973; RAE, 2011), voiced plosives exhibit pre-voicing (or negative VOT), whereas voiceless plosives are characterized by a short lag or zero VOT.
Given the characterization above, as we consider Argentinean learners of English, the acquisition of the two-way voicing distinction in English would imply a modification in the VOT patterns found in these learners' L1 (Yavas & Wildermuth, 2006; Alves & Luchini, 2016; Tobin et al., 2017), leading these learners to produce aspirated voiceless initial stops. However, recent studies carried out by our research group, with both Brazilian (Alves & Motta, 2014; Alves & Zimmer, 2015; Schwartzhaupt et al., 2015) and Argentinean learners (Alves & Luchini, 2016) of English, have suggested that acquiring word-initial voiceless stops is an even more complex process. We have shown that, unlike native speakers of English, who follow VOT as their main cue in the distinction between voiceless and voiced stops in word-initial position, Argentinean and Brazilian learners do not seem to attend to VOT as the sole cue in voicing distinctions.
Therefore, it might be the case that, despite its recognized importance, the acoustic cue of negative VOT is not the only phonetic aspect which accounts for voicing distinctions in Argentinean Spanish, as it is possible that other acoustic cues are being primarily employed in the perception and production of voicing distinctions. Similar cases have been found in Canadian French (Sundara, 2005), Korean (Oh, 2011) and Japanese (Kong et al., 2012). In these languages, additional cues, such as burst intensity and F0 in the following vowel, take the lead as the main acoustic correlates employed by speakers in order to distinguish plosive segments in perception and production. VOT, in these language systems, plays the role of an additional cue, which cannot function by itself in distinguishing the voicing of consonants, unlike what occurs in English.
The data presented in Alves & Luchini (2016) confirm the claim above. In that study, the perception of three different VOT patterns was investigated among intermediate and advanced Argentinean learners of English: negative VOT (found variably in English /b/, /d/, /g/, cf. Lisker & Abramson, 1964; Simon, 2010), positive VOT (found in English /p/, /t/, /k/, cf. Lisker & Abramson, 1964; Cho & Ladefoged, 1999; Simon, 2010) and zero VOT, which may be found variably in English /b, d, g/ (cf. Lisker & Abramson, 1964; Simon, 2010) and categorically in Spanish /p, t, k/ (cf. Lisker & Abramson, 1964; Abramson & Lisker, 1973; RAE, 2011). We also included a manipulated pattern, which was built by taking tokens of aspirated /p, t, k/ and removing their long-lag VOT completely, so that these new stimuli presented the VOT pattern of a voiced consonant in English, but at the same time preserved all of the other acoustic cues (such as burst intensity and F0) that are found in voiceless stops in this language. Results from Alves & Luchini (2016) demonstrated that learners showed ceiling effects in the identification of negative and positive VOT patterns. However, even though natural zero VOT was already identified as voiced, consonants with artificial zero VOT were still identified as voiceless, suggesting that learners attended to something else besides VOT in the identification of the L2 voicing patterns. It is also relevant to mention that, in a previous study (Schwartzhaupt et al., 2015), the same identification test had been applied to monolingual speakers of English, who showed high rates in the identification of both zero VOT patterns (natural or manipulated) as voiceless.
The results above might have direct implications in the fields of second language acquisition and teaching. With regard to L1 systems in which positive VOT might not be taken as the main cue in voicing distinctions, such as Argentinean Spanish (Alves & Luchini, 2016) and also Brazilian Portuguese (Alves & Motta, 2014; Alves & Zimmer, 2015; Schwartzhaupt et al., 2015), the acquisition of the two-way voicing system of L2 English will imply that, firstly, learners focus their attention on positive VOT, so as to learn the new pattern which occurs in English (aspiration). The acquisition of English aspiration by learners of these L1 systems, therefore, would imply a double task: before learning how to produce the L2 VOT pattern itself, students have to learn how to "listen to" this cue, which does not play such an important role in their first language. The importance of this new "tuning in" is quite clear when we consider the consequences of this lack of focus on positive VOT not only in perception, but also in production, especially if we assume a perceptual model such as the Speech Learning Model (Flege, 1995), which connects the processes of sound perception and production. If L2 learners of English do not focus on positive VOT, but rather attend to those other sources of information that are present in the acoustic signal, they are very likely not to have perception problems regarding the identification and discrimination of English initial /p/, /t/, /k/ and /b/, /d/, /g/; indeed, these other acoustic cues which are being primarily considered may lead them to a correct identification either way (voiceless consonants /p/, /t/, /k/, for example, present higher burst intensity and F0 values than /b/, /d/, /g/ in English, as well as in those languages in which VOT is not the main cue).
The fact that the two-way voicing distinction in English may be perceived appropriately, regardless of the acoustic cue which is being focused on, might at first allow us to conclude that it would not be necessary for learners to focus on positive VOT. However, should we consider the possibility that positive VOT is not considered in perception, there is a strong possibility that learners are not going to make use of this cue in production and, consequently, will not find it necessary to aspirate voiceless plosives in English, as the voicing distinction might be maintained through other cues. This non-aspiration in learners' production might have consequences in intelligibility (cf. Schwartzhaupt, 2015), given the fact that speakers of English follow positive VOT (aspiration) to distinguish voiceless from voiced plosives, as our studies have suggested (Schwartzhaupt et al., 2015). It is therefore necessary to lead learners to focus on positive VOT, as the intelligibility of their oral productions might be affected if they do not.
Perceptual training tasks have been an important aid in the teaching of second language sounds, and current research has shown their positive effects in both perception and production (Nobre-Oliveira, 2007; Reis & Nobre-Oliveira, 2008; Aliaga-Garcia, 2010; Rato, 2013; Carlet, 2017). When planning training sessions, both researchers and teachers must consider not only the target language, but also the learners' first language system. We therefore enquire if, in the case of learners whose L1 systems tend not to attend to VOT as their main acoustic cue, perceptual training and feedback on aspiration might be effective. Since, in this study, perceptual training has the role of exposing learners to a cue that tends to be unattended, it is also important to investigate the effect of associating awareness raising through explicit instruction (cf. N. Ellis, 2005; Andringa & Rebuschat, 2015) with perceptual training. Therefore, in the present study, we investigate whether informing students about the target item they should focus on might make training more effective. Following Guion & Pederson (2007) and Pederson & Guion-Anderson (2010), we also investigate whether learners who are explicitly told to direct their attention to VOT present better results in their perception and production.
Starting from these assumptions, in this study we focus on the role of high variability perceptual training (with or without explicit awareness raising) on the perception and production of aspiration by learners from the city of Mar del Plata (province of Buenos Aires), Argentina. Twenty-four participants were divided into three groups: (i) an experimental group, which took part in 3 training sessions (40 min. each); (ii) another experimental group, which, besides participating in the three training sessions, was informed about the L2 aspect to be focused on; (iii) a control group. The stimuli in the training sessions consisted of data produced by six different speakers of American English, and included two of the four VOT patterns whose identification had been previously studied in Alves & Luchini (2016): positive VOT (voiceless stops in English) and artificial/manipulated zero VOT (aspirated plosives whose VOT had been cut off). With this hybrid pattern, we aimed to train learners to identify these consonants as voiced, by concentrating on VOT as their main acoustic cue. All participants sat for (i) a pre-test; (ii) a post-test (three days after the last training session); and (iii) a delayed post-test (one month later), in which identification and production tasks were administered. With this methodology, we were able to investigate the generalization effects of perceptual training to production, as well as the possible long-term effects of this laboratory practice.

Participants
Twenty-four students took part in the study, 17 women and 7 men. Participants were randomly divided into three groups of 8 students. Group 1 participated in the training sessions but was not told about the phonetic aspect to focus on. Group 2 participants, besides taking part in the training sessions, were asked to focus on aspiration and were taught that initial voiceless stops in English are aspirated (these instructions were repeated at the beginning of each of the three training sessions). Group 3 served as control.
Participants were all in their final year of high school, and at the time of the investigation were attending 5 hours of English classes per week. They were taking a preparation course for the TOEFL exam. Before taking part in the experiment, all participants took the Oxford Online Placement Test, which indicated that all of them presented a C1 or a C2 level of proficiency in English, according to the Common European Framework.

Perceptual training sessions
The training sessions consisted of the administration of an identification task with immediate feedback, built and administered on TP Software (Rauber et al., 2013), and repeated in each session. The stimuli had been produced by six different native speakers of American English (3 men and 3 women). The task presented 18 audio files. The lexical items used in the training sessions were 'pee', 'tip' and 'kit'. Following Yavas and Wildermuth (2006) and Schwartzhaupt (2012), we used stimuli followed by a high vowel, since this environment fosters higher levels of aspiration and its perception. There were six different audio files for each one of these lexical items, each of which was produced by a different speaker. From these 6 stimuli, 3 had their aspiration cut off, so that we could build the artificial zero VOT pattern (a hybrid consonant, as already described). Each one of these 18 stimuli (9 with zero VOT and 9 with positive VOT) was repeated 20 times in a random order, which led to 360 tokens heard in each session. Pauses were allowed after every 90 tokens.
In the training sessions, which consisted of an identification task, learners had to choose the initial consonant of the word they had just heard, as seen in Figure 1. When answers were not correct, learners were informed of the correct answer immediately, and were forced to listen to the stimulus again before pressing the correct button, as shown in Figure 3. [...] (and if its place of articulation was correct). By doing so, we expected to train learners to pay attention to positive VOT, as the presence/absence of aspiration was decisive to their answers. Each training session lasted around 30 minutes. The training tasks were administered at the school lab, and students heard the stimuli with earphones. As already mentioned, at the beginning of each session, participants who belonged to Group 2 were asked to base their identification on the presence/absence of aspiration, and were taught that initial /p/, /t/ and /k/ are aspirated in English.
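The composition of a training session described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (3 words x 6 speakers = 18 stimuli, 20 repetitions, pauses every 90 tokens); it is not the TP implementation. The speaker labels, the random seed and the choice of which three speakers carry the manipulated pattern are assumptions, since the text does not specify them.

```python
import random

WORDS = ["pee", "tip", "kit"]                      # lexical items used in training
SPEAKERS = [f"speaker{i}" for i in range(1, 7)]    # hypothetical labels for the six talkers
REPETITIONS = 20                                   # each stimulus is heard 20 times
BREAK_EVERY = 90                                   # a pause is allowed after every 90 tokens


def build_stimuli():
    """Return the 18 stimuli as (word, speaker, VOT pattern) tuples."""
    stimuli = []
    for word in WORDS:
        for index, speaker in enumerate(SPEAKERS):
            # Assumption: the first three talkers per word carry the manipulated
            # pattern; the paper only states that 3 of the 6 files were manipulated.
            pattern = "artificial_zero" if index < 3 else "positive"
            stimuli.append((word, speaker, pattern))
    return stimuli


def build_session(seed=0):
    """Return one session as a list of blocks separated by optional pauses."""
    rng = random.Random(seed)
    trials = build_stimuli() * REPETITIONS
    rng.shuffle(trials)
    return [trials[i:i + BREAK_EVERY] for i in range(0, len(trials), BREAK_EVERY)]


blocks = build_session()
print(len(build_stimuli()), sum(len(block) for block in blocks), len(blocks))
```

Under these assumptions, the sketch yields 18 stimuli, 360 tokens and four blocks of 90 tokens per session, matching the figures reported above.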

Data collection instruments - Pre and Post-Tests
As mentioned, participants sat for a pre-test (which took place two days before the beginning of the training sessions), a post-test (which took place three days after the last training session) and a delayed post-test (which took place one month after the first post-test). In all three data collection sessions, learners performed an identification and a production task.
In the Identification Test administered in the pre-test and in the two post-tests, learners were presented with individual word stimuli and were invited to click on a button indicating the initial consonant of the word they heard (/p/, /b/, /t/, /d/, /k/ or /g/). No immediate feedback was provided. At the beginning of the test, three trial runs were provided. After the trial runs, stimuli with the four VOT patterns (negative VOT, natural zero, artificial zero and positive VOT) were included and presented in a random order. In the task, which comprised 48 stimulus words to be identified, each one of the four VOT patterns was presented in 12 tokens (4 for each place of articulation, each being the same word produced by a different speaker, as in [b]it, [d]ick and [g]ill for the negative VOT pattern, for example). Tests were taken at the language lab.
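The design of the identification test can be made explicit with a short Python sketch. It enumerates the 4 VOT patterns x 3 places of articulation x 4 speakers and records the answer counted as correct in the target language (only positive VOT signals a voiceless stop). The dictionary keys and numeric speaker labels are hypothetical names introduced for illustration.

```python
from itertools import product

VOT_PATTERNS = ["negative", "natural_zero", "artificial_zero", "positive"]
PLACES = ["bilabial", "alveolar", "velar"]
SPEAKERS_PER_CELL = 4   # the same word produced by four different speakers


def expected_voicing(pattern):
    """English-like answer: only long-lag (positive) VOT counts as voiceless."""
    return "voiceless" if pattern == "positive" else "voiced"


def build_test_items():
    """Enumerate every pattern x place x speaker combination of the test."""
    return [
        {"pattern": pattern, "place": place, "speaker": speaker,
         "expected": expected_voicing(pattern)}
        for pattern, place, speaker in product(
            VOT_PATTERNS, PLACES, range(1, SPEAKERS_PER_CELL + 1))
    ]


items = build_test_items()
print(len(items))  # 48 stimulus words in total
```

Each pattern contributes 12 items (4 per place of articulation), which is why accuracy for a given pattern is computed over 12 responses per participant.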

Production Task
The production task was also the same one employed in Alves & Zimmer (2015) (with Brazilian learners of English). This test consisted of reading isolated words presented on individual slides of a .ppt file. The target words employed were 'peer', 'pit', 'pee', 'team', 'tick', 'tip', 'kit', 'keel' and 'kill', that is, three different lexical items for each place of articulation. Each target word was produced twice, which adds up to six tokens per consonant for each participant.
Participants took the test individually, in a silent room. The participants' production was recorded with a Philips SHM 3550 headset, on a DELL Inspiron laptop computer. Productions were recorded on Audacity 2.0. After collection, the data were analyzed acoustically on Praat version 5.4.21 (Boersma & Weenink, 2015). The statistical analyses were carried out in SPSS 18.

[...] correct answers for each one of the patterns investigated, as well as the results of the intra-group analysis that we carried out.

Table 1. Accuracy rates (percentage of accuracy in first line, average and standard deviation in second line and median in third line of each column) in the Identification tasks (Pre-test, Post-test and Delayed Post-test) and Friedman test results for the three groups.

The descriptive results in Table 1 serve as evidence for our claim (Alves & Luchini, 2016) that additional cues besides VOT are important in the voicing distinctions of English by Argentinean learners. If voicing status were based solely on VOT, both zero VOT and artificial zero VOT would have been identified as voiceless in the pre-test already. However, learners seem to prefer to identify the natural zero VOT pattern as voiced, but the manipulated pattern exhibiting a hybrid consonant as voiceless. This suggests that other cues might be at play in this decision.
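As a quick check of the token counts in the read-aloud task, the following Python sketch tallies readings per consonant. The mapping from words to consonants follows the word list above; the variable and function names are illustrative only.

```python
from collections import Counter

# Target words grouped by their initial voiceless stop.
TARGET_WORDS = {
    "peer": "p", "pit": "p", "pee": "p",
    "team": "t", "tick": "t", "tip": "t",
    "kit": "k", "keel": "k", "kill": "k",
}
READINGS_PER_WORD = 2   # each target word was produced twice


def tokens_per_consonant():
    """Count how many tokens of each consonant one participant produces per test."""
    counts = Counter()
    for consonant in TARGET_WORDS.values():
        counts[consonant] += READINGS_PER_WORD
    return counts


counts = tokens_per_consonant()
print(dict(counts), sum(counts.values()))
```

Each participant therefore contributes six VOT measurements per consonant, or 18 tokens in total, in each data collection session.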

Identification Task
We ran Friedman tests (intra-group analyses) in order to verify if there were significant differences among the correct responses in the pre-test, the post-test and the delayed post-test, considering each one of the groups of participants. As expected, no significant differences concerning negative VOT responses were found in any of the groups, since voiced stops in Argentinean Spanish are pre-voiced. We had also predicted that a significant difference would not be found for positive VOT, as previous studies (Alves & Luchini, 2016) had shown near-ceiling effects in the identification of this pattern as voiceless. Surprisingly, the significant difference found in Group 1 and the marginally significant difference (p=.053) shown in Group 2 indicated that there was still room for improvement, and training helped learners increase their accuracy rates.
Following our first hypothesis, we had predicted that training would prove effective in the identification of (natural) zero VOT and artificial zero VOT. In other words, training would help learners attend to the fact that, unlike what happens in their L1, zero VOT characterizes voiced, not voiceless stops, in the target language. In the same fashion, a significant difference was also hypothesized for artificial zero VOT, as we expected training to help learners focus on VOT as the main acoustic cue responsible for voicing distinctions in the target language. The results of the Friedman tests with Groups 1 and 2 confirm this hypothesis: in Group 1, the increase in the accuracy rates of zero VOT was highly significant, and a significant difference was also found in the perception of artificial zero VOT. The effects of training could also be noticed in Group 2, which exhibited a significant increase for zero VOT and a marginally significant difference (p=.053) for artificial zero VOT. Moreover, another source of evidence for the role of perceptual training can be found in the results of the Control Group: no significant differences were found in any of the VOT patterns tested.
In Table 2, we present the significance values of the post-hoc Wilcoxon tests (employing Bonferroni correction), which compare the pre-test and the immediate post-test, the post-test and the delayed post-test, as well as the pre-test and the delayed post-test.
----- not applicable (Friedman test results were not significant); n.s. not significant; * significant (p<.017)
For Group 1, results of the post-hoc test revealed significant differences between the pre and the post-test in the identification of positive VOT. As already mentioned, this had not been predicted, since learners were expected to present very high accuracy levels in the identification of this pattern right in the pre-test. Still regarding Group 1, significant differences were also found in the identification of zero VOT as voiced, as can be easily seen in the descriptive data shown in Table 1. These significant differences were found between the pre-test and each one of the two post-tests, but not between the two post-tests themselves. These results suggest that, at least for the zero VOT pattern, the results found in the immediate post-test were maintained in the delayed post-test. Finally, as for the perception of the manipulated VOT pattern by Group 1, significant differences were found between the pre-test and the delayed post-test only. For this VOT pattern, the descriptive accuracy rates tend to increase (but not significantly) from the pre-test to the post-test, and increase even more in the delayed post-test, indicating that the effects of training may even increase with time.
In Group 2, significant increases for zero VOT and artificial zero VOT were found between the pre and the first post-test. It is interesting to consider that significant results were not found between the pre and the delayed post-test in this group, which prevents us from fully confirming our third hypothesis on the long-term effects of training, as will be discussed later; despite this fact, the descriptive results in Table 1 show that the delayed post-test rates are still higher than those found in the pre-test, but not as high as those found in the immediate post-test. The finding of significant differences only between the pre and the first post-test seems to characterize an opposite pattern to that found in Group 1, in which we found a significant difference between the pre and the delayed post-test, but not between the pre and the first post-test. We may speculate that this difference might be the result of the type of training (with or without explicit instruction) received by each one of the groups. In the group that received instruction (Group 2), the difference in accuracy rates between the pre and the post-test seems to have been more abrupt right in the first post-test, indicating that the provision of instruction might contribute to immediate effects. In turn, Group 1, which was not instructed on what to pay attention to, needed some more time (and, maybe, a larger amount of input) to "discover" what aspect should be focused on. Although additional studies are undoubtedly necessary for this puzzle to be solved, the possibility that the addition of instruction to training sessions might contribute to more significant differences in a shorter period of time must not be disregarded.
We also ran inter-group tests, in order to verify significant differences between the three groups of participants in each one of the tests. In Table 3, we report the results of the three Kruskal-Wallis tests. As for zero VOT, both experimental groups (1 and 2) outperformed the Control Group in both post-tests. As for the identification of artificial zero VOT, only Group 2 outperformed the Control Group statistically in the first post-test, but both Groups 1 and 2 outperformed the Control Group in the delayed post-test. This may be understood if we consider the descriptive data shown in Table 1, which indicate that, although there was an improvement in the descriptive accuracy rates of artificial zero VOT in Group 1 between the pre and the post-test, accuracy values are even higher for Group 1 in the delayed post-test. Once again, we may speculate that, with no explicit instruction, it might take longer to "discover" the acoustic cue learners should focus on in the input they received.
Finally, it is also important to highlight that Table 4 shows no significant differences between the results of Group 1 and Group 2 in any of the data collection sessions. Besides reinforcing the effects of perceptual training, these results seem to suggest that both forms of training (with or without instruction provided) might be effective in developing perception.
Summing up, the results of the statistical tests tend to confirm our first hypothesis, which predicted positive effects of training for both experimental groups in the perception of zero VOT and artificial zero VOT. Indeed, training also helped learners perfect their perception of positive VOT. The results seem to suggest that perceptual training (whether accompanied by instruction on aspiration or not) helps learners focus on VOT as a decisive cue, leading them to listen to the presence/absence of aspiration as a key factor to determine voicing status.
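For readers unfamiliar with the intra-group analysis, the Friedman statistic compares the three data collections by ranking each participant's scores across sessions. The Python sketch below is a minimal illustration, not the SPSS procedure used in the study: it omits the correction for ties, and the example scores are invented solely to show the mechanics.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square (no tie correction) for rows of repeated measures.

    scores: one list per participant, with one value per session.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for column, rank in enumerate(average_ranks(row)):
            rank_sums[column] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical accuracy counts (out of 12) for 8 participants in the
# pre-test, post-test and delayed post-test; NOT data from the study.
example = [[3, 9, 10]] * 8
print(friedman_statistic(example))  # 16.0, the maximum for n = 8, k = 3
```

When every participant shows the same ordering across the three sessions, the statistic reaches its maximum of n(k - 1); when sessions are indistinguishable, it falls to zero.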
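The p<.017 criterion applied to the post-hoc Wilcoxon tests follows from the Bonferroni correction over the three pairwise session comparisons. The Python sketch below reproduces that arithmetic; the family-wise alpha of .05 is an assumption inferred from the reported threshold.

```python
from itertools import combinations

SESSIONS = ["pre-test", "post-test", "delayed post-test"]
FAMILY_ALPHA = 0.05   # assumed family-wise significance level


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison alpha under the Bonferroni correction."""
    return alpha / n_comparisons


pairs = list(combinations(SESSIONS, 2))
threshold = bonferroni_threshold(FAMILY_ALPHA, len(pairs))
for first, second in pairs:
    print(f"{first} vs. {second}: significant if p < {threshold:.3f}")
```

With three comparisons, .05 / 3 rounds to .017, which is the value given in the note to Table 2.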

Production Results
In our second hypothesis, we had predicted that the effects of perceptual training could be generalized to production. In Table 5, we present the mean VOT values of the three groups, as well as their standard deviation and median values. The results of the Friedman tests for each of the groups are also shown.
Note. Md = median; standard deviations are presented in brackets; *? p<.10 (marginally significant), * p<.05, ** p<.01
As we had previously done in the perceptual test results, we ran intra-group analyses to verify if there were significant differences between the three tests, considering each group separately. Although the descriptive data reveal some improvement after training in the production values presented by Group 1, only marginally significant differences were found in the production of /p/ (p=.093) and /k/ (p=.072). Significant differences (p<.001) were found for /p/ and /t/ in Group 2. As for this group, a marginally significant difference was found for /k/ (p=.093). Surprisingly, the Control Group also showed a marginally significant difference for /t/, with p=.093 (almost reaching the .10 threshold).
In Table 6, we present the results of the post-hoc Wilcoxon tests (Bonferroni correction):

----- not applicable (Friedman test results were not significant); n.s. not significant; * p<.017, ** p<.01
This table indicates a significant difference between the pre-test and the two post-tests in the productions of /p/ and /t/ by Group 2. Even though the results of the production test are not as clear as those found in the perception test, as the production data do not fully confirm our second hypothesis, the results presented in Table 6 detail some important aspects that must be taken into consideration. Firstly, as for the production of /p/ and /t/ by Group 2, significant differences were found between not only the pre-test and the first post-test, but also the pre-test and the delayed post-test. Secondly, as we concentrate on the results for the production of /p/ and /k/ by Group 1, or /t/ by the Control Group (whose differences had been only marginally significant), we find no significant differences in the post-hocs. In other words, the only significant differences which showed post-hoc effects were the ones related to Group 2.
It is also worth mentioning that, even though few significant differences were shown in Table 5, the descriptive data presented in that very same table indicate some increase in VOT values between the pre-test and post-test results, especially for Group 1 (see, for example, the results for /k/ produced by this group). Despite this descriptive difference, statistical differences were not found. One possible explanation for this fact might be the low number of participants in each group, which can be considered a limitation of the present study. Future replications of this study, with a larger number of participants in each group, might yield significant differences.
Still concerning the intra-group analysis, it has to be considered that no significant differences between the two post-tests were found in any of the groups or consonants. The lack of significant differences between the results of the two post-tests was also noticeable in Table 2, which described the results obtained in the perception test. This might also be regarded as an indicator of the long-term effects of the training sessions.
In what follows, we present the inter-group analysis. Table 7 presents the results of the Kruskal-Wallis tests, which correspond to each one of the three data collections. In Table 8, we present the results of the post-hoc Mann-Whitney tests.
----- not applicable (Friedman test results were not significant); n.s. not significant; ** p<.01
The Kruskal-Wallis tests showed significant differences for Group 2 only, in the production of the bilabial stop /p/. The post-hoc tests show a significant difference between the two experimental groups in the first post-test, which can be confirmed by a visual inspection of the descriptive data presented in Table 5. Whereas Group 2 presented a significant increase between the pre and the first post-test, the first group did not seem to show an increase in the VOT values for this consonant. The results outlined in Tables 7 and 8 confirm the intra-group analysis, and do not allow us to confirm our second hypothesis fully. Indeed, significant differences were noticeable in Group 2 only.
While we must consider the possibility that the small number of participants might have played a role in these non-significant differences, it is also important to consider some speculative reasons why a significant increase was found only in Group 2, but not in Group 1. In fact, although both groups showed significant intra-group differences with regard to perception, the production results show a significant improvement in only one of the groups, whose participants had been instructed on what to focus on in the training sessions. Given these results, we cannot disregard the possibility that explicit instruction might have had a role in this significant difference. As the production test allows for a high level of monitoring, the explicit knowledge provided on the phenomenon to be focused on might be used in monitored production. In other words, it might be the case that this significant difference is not the direct result of perceptual improvement, but of the use of explicit knowledge in monitored production. Additional studies, with a larger number of participants and some production test designs that allow for less monitored production, might be relevant in providing a more definite answer to the possibility raised here.

Final considerations
As we analyze the perception and production results by the groups in the three tests (pre-test, immediate post-test and delayed post-test), the hypotheses proposed in the Introduction of this paper must be revisited.Hypothesis 1 predicted that perceptual training, with or without explicit instruction, would lead to an improvement in the identiication of zero VOT and artiicial zero VOT as voiced.his hypothesis was conirmed, as both experimental groups showed signiicant diferences in these two patterns.Perceptual training was also relevant in the identiication of positive VOT as voiceless, helping learners reach ceiling efects in the correct identiication of this VOT pattern.
As for the second hypothesis, which predicted that learners would be able to generalize this growth to production, this could not be fully corroborated.Indeed, only marginally signiicant diferences (with no posthoc signiicant diferences) were found in Group 1.In the intra-group analysis, Group 2 presented a signiicant increase concerning the production of /p/ and /t/, so we cannot disregard the possibility that instruction played a more decisive role in these results.In this sense, instruction might have proved useful in allowing learners to monitor themselves and achieve higher VOT results, even when they are not developmentally ready to do so.Further studies investigating the role of instruction isolated from perception training might also be useful, as they might show that students receiving instruction might present better production levels even before an increase in perception, challenging the canonical perception-production developmental order (a possibility raised in Flege, 1995).It might be the case, therefore, that this increase in production might be the relection of conscious monitoring, and might not be relected in more natural speech settings.
Finally, our third hypothesis predicted that the improvements found in both perception and production would be maintained one month after the last training session. Once again, this hypothesis was only partially corroborated. As for the perception of both zero VOT and artificial zero VOT, our intra-group analysis showed no significant differences between the pre-test and the delayed post-test in Group 2 (which received instruction), despite the significant difference found between the pre-test and the immediate post-test. Despite this fact, it is true that the descriptive rates found in their delayed post-test are still much higher than those found in their pre-test. As for the accuracy rates for natural zero VOT in Group 1, significant differences are found between the pre-test and each of the two post-tests, which would allow us to corroborate this hypothesis; however, with regard to the artificial zero pattern, a significant difference is found between the pre-test and the delayed post-test only. All of these perceptual results lead us to speculate that the combination of explicit instruction and perceptual training might lead to immediate changes in the learners' perceptual rates; these changes might be so abrupt that such high rates are not maintained one month later. In turn, it might be the case that learners who receive no instruction need a longer period of time in order to 'tune in' to the right cue. As for the production results, the intra-group analysis indicated that the significant increase in the production of /p/ and /t/ by Group 2 was also maintained in the long term. All these factors considered, it is undeniable that, even in those cases in which no significant differences between the pre-test and the delayed post-test were found, the descriptive values found in the delayed post-test were still closer to those found in the immediate post-test than to those found in the pre-test, which allows us to suggest some positive (descriptive) effects of the training in the post-test. In line with this, significant differences between the immediate and the delayed post-test were never found in perception or production, suggesting that the effects of training might still be felt one month later.
The present study undeniably has a considerable number of limitations, most of which have already been pointed out throughout this article. Firstly, the number of participants might have contributed to the absence of significant differences in the production test. Secondly, the number of training sessions (only three) might not have been enough to foster generalization to production. Indeed, this small number of sessions is a result of the time constraints faced with the group of learners investigated, and is a consequence of problems that are frequently faced by experimental studies which deal with classroom realities. In this study, we aimed at minimizing such a limitation with the provision of awareness raising to Group 2, which was expected to accelerate the processing of the target item being trained. Finally, it might be the case that our delayed post-test should have taken place later than one month after training. This would have allowed us to say whether the apparent perceptual improvement found in the delayed post-test in Group 1 (training only) would be maintained after a longer period of time. A later delayed post-test would also have helped us say whether the improvements in production found in Group 2, which were considered to be the result of a more monitored production, would be maintained for a longer period. We have to reinforce, once again, that this short period of time between the two post-tests was a result of the time constraints imposed by the classroom environment in which our research study took place.
These limitations open new avenues for further investigations and research questions. With regard to perception, further studies on the effects of place of articulation in the perception of zero VOT and artificial zero VOT might be of great importance. As for production, further analyses of the generalization to novel items also prove relevant.18 Finally, the effects of explicit instruction combined with perceptual training need additional research. It is also important to investigate the role of these two classroom interventions individually; this will allow us to verify whether the effects of training are fostered by instruction, or whether instruction by itself might be relevant, regardless of any perceptual practice. In this sense, variables such as the number of training sessions in perceptual studies, as well as the kind of awareness-raising task provided (with a more or less metalinguistic/communicative tone), are also important aspects to be considered and investigated.
In conclusion, the results presented in this paper indicate beneficial effects of perceptual training in foreign language classrooms, even in situations in which time constraints might represent an impediment to a higher number of training sessions. The provision of instruction in addition to perceptual training might not only contribute to an increase in perception, but also foster production. Considering the results of the study, we may say that perceptual training not only helped improve the perception of a given acoustic cue that proved difficult for learners, but also guided learners to focus on a new cue which, in their first language, does not play a decisive role.

3. As we acknowledge the fact that spectral and timing cues interact perceptually as they are integrated in the perception of stops (Dmitrieva et al., 2015; Francis et al., 2008; Kingston et al., 2008), one might ask why we have isolated the VOT cue in our training and testing experiments. As explained above, given the fact that learners attend to other cues besides positive VOT in perception, they have no difficulty in discriminating and identifying voiced and voiceless initial stops in English (Alves & Motta, 2013; Alves & Zimmer, 2015; Alves & Luchini, 2016). Although no perceptual problems are found, when it comes to production, learners also use these other cues and do not attend to positive VOT. This lack of word-initial aspiration leads to identification and intelligibility problems among native speakers of English (Schwartzhaupt, 2015; Schwartzhaupt et al., 2015). Therefore, in line with Abramson & Whalen (2017), by focusing on VOT alone and by providing a manipulated pattern which "forces" learners to focus on the presence of positive VOT, we expect learners to focus on positive VOT in perception; as a consequence, this should lead to higher VOT values in the production of word-initial voiceless stops.
4. In the identification pre- and post-tests, we also investigated the perception of negative VOT and positive VOT in English. However, given the ceiling effects found in Alves & Luchini (2016), we did not include these two patterns in this hypothesis, as we expected high accuracy levels in perception in the pre-test already.
5. For further information on the Oxford Online Placement Test, see Purpura (2007) and Pollitt (2007).
7. These speakers were the same ones whose stimuli were used in previous studies, such as Alves & Motta (2014), Alves & Zimmer (2015) and Schwartzhaupt et al. (2015). They are the same speakers whose stimuli were used in the identification pre- and post-tests (even though the identification task in the pre- and post-tests was carried out with other target words).
8. The low number of lexical items is due to the fact that, in the stimuli obtained from the six speakers, tokens of word-initial /b/, /d/, /ɡ/ with zero VOT were not frequently produced, as negative and zero VOT may occur variably in word-initial voiced stops in English. These were the lexical items most frequently produced with zero VOT.
9. The same speakers whose stimuli were presented in the training task.
10. The lexical items in the identification task in the pre- and post-tests are different from those used in the training sessions. Therefore, should there be an improvement in the accuracy rates in the identification test, this indicates the learners' ability to generalize their perceptual ability to different lexical items.
11. Of the three lexical items that represent each place of articulation, one had been used in the training task (pee, tip, kit), another had been employed in the perceptual pre- and post-tests (pit, tip, kill) and one was a novel lexical item (peer, team, keel). With this design, we aim at investigating whether there are higher VOT values in those lexical items with which learners have already been trained.
For delimitation purposes, we leave this verification for a future study.
13. As already mentioned, for stimuli starting with positive VOT, answers identifying the consonants as voiceless (/p/, /t/, /k/) were considered to be correct. For stimuli starting with the other three patterns (negative VOT, zero VOT and artificial zero VOT), answers identifying the consonants as voiced (/b/, /d/, /ɡ/) were considered to be correct. Mistakes concerning place of articulation (for example, when aspirated /p/ was perceived as /t/, although the voicing of the initial consonant was identified correctly) were not computed as correct answers.
14. In this table, perception results for all places of articulation are averaged together, since we found no place of articulation effects on perception.
15. As already shown in Alves & Luchini (2016), the perception of negative VOT and positive VOT by Argentinean learners tends to exhibit ceiling effects. This is justified as negative VOT occurs in word-initial voiced stops in Spanish, and learners tend to focus on other acoustic cues (such as F0 and burst intensity), instead of aspiration, to identify aspirated stops as voiceless. As stated in our fourth footnote, this is the reason why no hypotheses were proposed for these two patterns. These results reinforce the need for a perceptual training approach focusing solely on the presence/absence of aspiration.
16. In this study, we ran non-parametric tests, as the Kolmogorov-Smirnov and Shapiro-Wilk normality tests indicated that the dependent variables tested did not show a normal distribution.
17. Unlike the data shown in Table 1 (perception), in this table each place of articulation is presented separately, since differences regarding place of articulation can be clearly shown in production. Although data on word-initial voiced stops were also collected, these data are not presented in this paper, as the students tended to produce pre-voiced consonants in all of their productions (cf. Simon & Leuschner, 2010). As pre-voiced stops occur variably in word-initial position in English, we interpret that the production of negative VOT by learners does not affect intelligibility and, therefore, they need not acquire the zero VOT pattern in word-initial /b, d, ɡ/. This also justifies why our training sessions focused on the presence or absence of positive VOT only.
18. As mentioned in the Method, our production test allowed for the investigation of the effect of both trained and novel words. This investigation corresponds to the next step in our analysis.
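
As an illustration of the scoring criterion described in note 13, the following is a minimal Python sketch. It is not the procedure or software actually used in the study; the function names, the pattern labels and the representation of a trial as a (place, VOT pattern, response) triple are assumptions made here for illustration only. A response counts as correct only if it matches both the expected voicing category for the VOT pattern and the place of articulation of the stimulus.

```python
# Hypothetical sketch of the scoring rule in note 13 (names are illustrative).
# Voiceless and voiced stops indexed by place of articulation.
VOICELESS = {"bilabial": "p", "alveolar": "t", "velar": "k"}
VOICED = {"bilabial": "b", "alveolar": "d", "velar": "g"}

# The four VOT patterns presented in the identification task.
PATTERNS = {"negative", "zero", "artificial_zero", "positive"}


def expected_response(place, vot_pattern):
    """Return the consonant counted as the correct answer for a stimulus."""
    if vot_pattern not in PATTERNS:
        raise ValueError(f"unknown VOT pattern: {vot_pattern}")
    # Positive VOT should be heard as voiceless; the other three as voiced.
    table = VOICELESS if vot_pattern == "positive" else VOICED
    return table[place]


def is_correct(place, vot_pattern, response):
    """Correct only if both voicing and place of articulation match."""
    return response == expected_response(place, vot_pattern)


def accuracy(trials):
    """Percentage of correct answers over (place, vot_pattern, response) triples."""
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(is_correct(p, v, r) for p, v, r in trials)
    return 100.0 * hits / len(trials)
```

Under this sketch, an aspirated bilabial stimulus answered as /t/ is scored as incorrect even though its voicing was identified correctly, which mirrors the treatment of place of articulation mistakes in note 13.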

Figure 3. Training sessions: identification test on TP (negative feedback)
The results of the identification task are presented in Table 1. In this table, we present the percentage of

Table 5. Production test results (mean, in ms, in the first line, standard deviation in the second line and median in the third line of each column) and Friedman test results.17