DSpace at EWHA: A Contrastive Corpus-Based Analysis of Lexical Bundles between English L1 and English L2 Writers in Medical Journal Abstracts

Title: A Contrastive Corpus-Based Analysis of Lexical Bundles between English L1 and English L2 Writers in Medical Journal Abstracts

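Other Titles: A Contrastive Analysis of Lexical Bundles in a Corpus of English Medical Journal Abstracts: Lexical Bundle Use by Native English-Speaking Authors and Korean Authors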

Authors: 김은수

Issue Date: 2020

Department/Major: Graduate School, Department of English Education

Publisher: Ewha Womans University Graduate School

Degree: Doctor

Advisors: 이은주

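Abstract: This study analyzed the lexical bundles used in the English abstracts of medical journal articles. To compare bundle use between native speaker of English (NSE) writers and Korean non-native speaker of English (NNSE) writers, the bundles were classified and examined by structural and functional features. The study also analyzed the bundles used to report the essential items associated with the rhetorical functions of medical journal abstracts. To build one corpus of abstracts written by NSE writers and one written by Korean writers, six leading international medical journals (The New England Journal of Medicine, The Journal of the American Medical Association, The Lancet, The British Medical Journal, Annals of Internal Medicine, The American Journal of Medicine) and eight leading Korean medical journals (Yonsei Medical Journal, Journal of Korean Medical Science, Korean Journal of Internal Medicine, Korean Journal of Anesthesiology, Cancer Research and Treatment, Gut and Liver, Clinical and Molecular Hepatology, Korean Journal of Gastroenterology) were selected. From each set of journals, 431 English abstracts were collected, yielding an NSE corpus and an NNSE corpus of approximately 100,000 words each. The abstracts were restricted to those of journal articles reporting clinical trials or randomized controlled trials on humans and published between 2008 and 2019. Three- to nine-word lexical bundles were extracted from each corpus with AntConc 3.5.7 (Anthony, 2018). A word sequence counted as a lexical bundle only if it occurred at least five times and appeared in five different texts. The bundles finally selected were classified by structural and functional features. Two coders also classified the bundles according to the essential items associated with the rhetorical functions of the abstracts, and intercoder reliability was measured after classification. Log-likelihood was applied to the types and tokens of the classified bundles to test for significant differences between the two corpora. The results are as follows. NNSE writers used more ‘VP-based’ bundles, whereas NSE writers used more ‘NP-based’ bundles. This result is consistent with earlier findings that non-native writers, more than native writers, use clausal bundles in preference to phrasal bundles. Specifically, NNSE writers mainly reported research objectives with ‘to-clause’ bundles (e.g., the purpose of this study was to) and research methods with ‘Passive Verb’ bundles (e.g., were randomly assigned, were divided into). When reporting results, they frequently used ‘Be + (Adj/NP)’ (e.g., there were no significant differences) to report differences between groups, and they used ‘that-clause’ bundles (e.g., findings suggest that, it is suggested that) to discuss findings.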
In contrast, NSE writers described research methods in more detail and reported findings with statistical indicators, making heavy use of ‘NP + other post modifiers’ (e.g., hazard ratio for), ‘Other NP’ (e.g., the primary outcome, intention to treat, a computer generated, primary care clinics), and ‘Active Verb’ (e.g., we randomly assigned, included all patients who, secondary outcomes included). These results show that, in medical journal abstracts, NSE writers concentrate their bundles on methods and results, whereas NNSE writers tend to repeat formulaic bundles in objectives, results, and discussion. In functional terms, NNSE writers used more ‘Text-oriented’ bundles, whereas NSE writers used more ‘Research-oriented’ bundles. Specifically, NNSE writers reported research objectives with ‘Text-reference/structuring’ bundles (e.g., the aim of this study was to, this study aimed to) and findings with ‘Compare/contrast’ bundles (e.g., there were no significant differences). NSE writers, by contrast, detailed research methods with ‘Procedure’ bundles (e.g., the primary outcome, were masked to, computer generated random) and reported findings with ‘Statistics’ bundles (e.g., adjusted odds ratio). NNSE writers also used ‘Text framing’ (e.g., with respect to) and ‘Epistemic’ (e.g., appears to be) bundles more often than NSE writers, which suggests that NNSE writers understand and follow the conventions of soft science disciplines better than those of hard science disciplines. NSE writers used ‘Cause/effect’ bundles (e.g., the risk of) more often than NNSE writers when motivating the study or discussing the present findings. In addition, unexpected outcomes that arise during or after treatment should be reported among the results, but such reporting was relatively rare in the NNSE corpus, so ‘Cause/effect’ bundles (e.g., adverse events occurred) seldom appeared there. This suggests that NSE writers report relatively more content than NNSE writers in the methods and results sections of abstracts. Analysis of the bundles used to report the essential items associated with the rhetorical functions of medical journal abstracts showed that NNSE writers used bundles markedly often when reporting ‘Objectives’ and ‘Main outcome’, whereas NSE writers used relatively more bundles for ‘Setting’, ‘Main outcome measures’, ‘Effect size & Precision’, and ‘Harms’ than for other items. When reporting ‘Objectives’, NNSE writers preferred a to-clause structure that begins with ‘the aim/purpose of this study was to’ and ends as a complete sentence, whereas NSE writers preferred to begin with ‘To determine whether’ and end concisely as a phrase.
When reporting ‘Main outcome measures’, NSE writers often used signals that rank the measured outcomes by importance (e.g., the primary outcome, the primary end point, secondary outcomes). NNSE writers, by contrast, mentioned the outcomes to be measured without such signals. Because NNSE writers tended to mention only p-values (e.g., between the two groups p) when reporting ‘Main outcome’, few bundles were found in their corpus for ‘Effect size & Precision’ (e.g., in the placebo group hazard ratio). NSE writers reported effect sizes and confidence intervals relatively often. In particular, NSE writers used more bundles than NNSE writers for ‘Setting’ (e.g., primary care clinics) and ‘Harms’ (e.g., adverse events occurred in), which shows that NSE writers report more essential items, and use more formulaic bundles, than NNSE writers in the methods and results sections of abstracts.
This study compares the use of lexical bundles by native speaker of English (NSE) writers and Korean non-native speaker of English (NNSE) writers, investigating the bundles in terms of structure, function, and the reporting of essential items connected to the moves of medical journal abstracts. Two corpora of approximately 100,000 words each were constructed: an NSE corpus and an NNSE corpus. The NSE corpus comprised 431 medical journal abstracts from six leading journals (The New England Journal of Medicine, The Journal of the American Medical Association, The Lancet, The British Medical Journal, Annals of Internal Medicine, and The American Journal of Medicine). The NNSE corpus consisted of 431 medical journal abstracts from eight leading journals published in Korea (Yonsei Medical Journal, Journal of Korean Medical Science, Korean Journal of Internal Medicine, Korean Journal of Anesthesiology, Cancer Research and Treatment, Gut and Liver, Clinical and Molecular Hepatology, and Korean Journal of Gastroenterology). The abstracts were restricted to reports of clinical trials or randomized controlled trials on humans published between 2008 and 2019.
AntConc 3.5.7 (Anthony, 2018) was used to extract three- to nine-word lexical bundles that occurred at least five times and in five different texts. The bundles that met these inclusion criteria were classified structurally and functionally within each subsection of the abstracts. Intercoder reliability was measured with Cohen’s kappa. The same routine was applied to the categorization of lexical bundles connected to essential items in the moves of medical journal abstracts. To test for statistically significant differences between the corpora in terms of structures, functions, and reporting of essential items, log-likelihood values were calculated. The findings are as follows. NNSE writers used more ‘VP-based’ bundles, whereas NSE writers used more ‘NP-based’ bundles. ‘VP-based’ bundles in the NNSE corpus were used to describe research objectives, methodology, and results. ‘NP-based’ bundles in the NSE corpus, on the other hand, frequently described details of methods and reported findings. This finding suggests that NNSE writers are more likely than NSE writers to favor clausal bundles over phrasal bundles. Specifically, ‘to-clause’ and ‘Passive Verb’ were the prominent features in the NNSE corpus, used to signal research objectives and to describe the random assignment of patients (e.g., the aim of this study was to, this study aimed to, were enrolled in, were randomly assigned to receive). ‘Be + (Adj/NP)’ and ‘that-clause’ were also distinctive: the former frequently reported group differences (e.g., there were no significant differences), and the latter signaled discussion of findings (e.g., findings suggest that).
In contrast, ‘NP + other post modifiers’ (e.g., hazard ratio for, mean difference in), ‘Other NP’ (e.g., the primary outcome, intention to treat, a computer generated, primary care clinics), and ‘Active Verb’ (e.g., we randomly assigned, secondary outcomes included, included all patients who) were the distinctive features in the NSE corpus. These structures were closely associated with research methodology and with findings reported through statistical indicators. The finding suggests that NSE writers are more likely than NNSE writers to detail research methods and findings with statistical markers in medical journal abstracts. Concerning function, NNSE writers used more ‘Text-oriented’ bundles, whereas NSE writers used more ‘Research-oriented’ bundles. ‘Text-oriented’ bundles in the NNSE corpus were mainly used to indicate research objectives and, when reporting findings, group differences. In contrast, ‘Research-oriented’ bundles in the NSE corpus were primarily used to describe research procedure and findings with statistical significance. This pattern suggests that NNSE writers are more familiar with conventions in soft science disciplines than with those in hard science disciplines. Specifically, ‘Text-reference/structuring’ (e.g., the aim of this study was to), ‘Compare/contrast’ (e.g., there were no significant differences), ‘Quantification’ (e.g., significant increases in), ‘Text framing’ (e.g., in terms of), and ‘Epistemic’ (e.g., may not be) were the distinctive features in the NNSE corpus. On the other hand, ‘Procedure’ (e.g., the primary outcome), ‘Statistics’ (e.g., adjusted odds ratio), and ‘Cause/effect’ (e.g., adverse events occurred) were the prominent features in the NSE corpus. The greater use of ‘Text-reference/structuring’, ‘Text framing’, and ‘Epistemic’ bundles by NNSE writers than by their counterparts is what points to their familiarity with soft science conventions.
Compared with NNSE writers, NSE writers tended to focus more on research procedure and causal relations and to report findings with effect sizes and 95% confidence intervals. As for lexical bundles connected to essential items in the moves of medical journal abstracts, NNSE writers mainly used them to report ‘Objectives’ and ‘Main outcome’, whereas NSE writers mainly used them to report ‘Setting’, ‘Main outcome measures’, ‘Effect size & Precision’, and ‘Harms’. When reporting ‘Objectives’, NNSE writers used structuring signals and a ‘to-clause’ (e.g., the aim of this study was to, this study aimed to, this study was performed to) built on the keywords aim, purpose, this, study, and to, whereas NSE writers started directly with a ‘to-clause’ (To determine whether) without the structuring signals. The finding suggests that NSE writers prefer a phrase to a complete sentence as a more economical way to state research aims. When reporting ‘Main outcome measures’, NNSE writers more often used ‘endpoint’ or ‘end point’, while NSE writers more often used ‘end point’ and ‘outcome’. NSE writers also tended, more than NNSE writers, to flag pre-specified results with signals (e.g., the primary end point/outcome). When reporting main outcomes, NSE writers reported effect size and its precision (e.g., 95% confidence interval) more often than NNSE writers did. Lexical bundles including only p-values were frequently observed in the NNSE corpus (e.g., between the two groups p), whereas lexical bundles including effect size were more prominent in the NSE corpus (e.g., in the placebo group hazard ratio). The finding suggests that NNSE writers are less likely than NSE writers to report both effect size and its precision. The poor reporting of ‘Harms’ (e.g., adverse events occurred in) and ‘Setting’ (e.g., primary care practices) by the NNSE writers shows that NSE writers are more likely to report unexpected results, number of centers, locations, periods of study, and the level of care in medical journal abstracts.
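
The extraction and comparison steps described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually run in AntConc: the function names `extract_bundles` and `log_likelihood` and the tokenizer are hypothetical, "five different texts" is read as "at least five", and the log-likelihood is the two-cell form widely used for corpus frequency comparison, since the abstract does not say which formula was applied.

```python
import math
import re
from collections import Counter, defaultdict


def extract_bundles(texts, min_n=3, max_n=9, min_freq=5, min_texts=5):
    """Return {bundle: (frequency, number_of_texts)} for every 3- to 9-word
    sequence that meets both the frequency and the text-range threshold.
    The tokenizer is an assumption and will not match AntConc exactly."""
    freq = Counter()
    text_range = defaultdict(set)
    for text_id, text in enumerate(texts):
        # Lowercase word tokens; internal hyphens and apostrophes are kept.
        tokens = re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                bundle = " ".join(tokens[i:i + n])
                freq[bundle] += 1
                text_range[bundle].add(text_id)
    return {
        bundle: (count, len(text_range[bundle]))
        for bundle, count in freq.items()
        if count >= min_freq and len(text_range[bundle]) >= min_texts
    }


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one item observed freq_a times in a corpus of
    size_a words and freq_b times in a corpus of size_b words."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With two corpora of roughly 100,000 words each, a call such as `log_likelihood(freq_nnse, 100_000, freq_nse, 100_000)` gives the statistic for one bundle or bundle category. Values above 3.84 correspond to p < 0.05 at one degree of freedom, which is a conventional cutoff and not one stated in the abstract.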