Clinical Research and Evidence-Based Pediatric Surgery

  • Dennis K. M. Ip
  • Kenneth KY Wong
  • Paul Kwong Hang Tam
Living reference work entry


Evidence-based medicine (EBM) is the process of acquiring the best available research evidence and applying it to inform best practice for a defined clinical problem. The widespread popularization of the concept of EBM since its introduction more than three decades ago has resulted in a paradigm shift in biomedicine from a largely experience- and opinion-based practice toward one based increasingly on objective scientific evidence.

Properly designed and implemented clinical research represents the best way to provide high-quality scientific evidence for informing the practice of EBM. Among different study designs, the prospective randomized controlled trial (RCT) is regarded as the gold standard of clinical research and gives the highest level of evidence (Class I evidence). For specialties that are progressing faster in the practice of EBM, the rapid accumulation of research findings also highlights the importance of critical appraisal skills for assessing the quality of available evidence. In many surgical settings including pediatric surgery, however, important barriers that may hinder the proper design and implementation of RCTs are still common. A better understanding of the concepts of clinical research and EBM would thus equip researchers in these settings to produce better scientific evidence and practitioners to incorporate the best available evidence into their clinical practice.


Clinical research; Evidence-based medicine; Randomized controlled trial (RCT); Study design; Critical appraisal; Hierarchy of evidence


The concept of evidence-based medicine (EBM) has evolved since the 1980s and has gradually become an essential element in many specialties of clinical practice. The aim of practicing EBM is to support important clinical decision-making and policy recommendations in an objective and fully justifiable manner, based on the best and most up-to-date research findings. Over the years, clinical research and evidence-based medicine have developed hand in hand within many clinical specialties, as EBM needs to be informed by the best available research evidence, which in turn shapes further directions of research.

As best illustrated by drug trials, clinical research nowadays can involve rigorous experimental study designs with large sample sizes, complicated logistics, and highly technical statistical analysis. Most of these, however, are unfamiliar to pediatric surgeons working at the busy clinical frontline. This is shown by the predominance of observational and retrospective studies, descriptive case reports or case series, and the lack of prospective trials in the scientific literature of the specialty. While these observational studies do serve to broaden the collective experience of the pediatric surgery community, they are inherently prone to bias and generally do not provide strong supportive evidence for defining best practice. The aim of this chapter is to introduce the concepts of evidence-based medicine and clinical research and their relevance in the setting of the specialty.

Common Types of Study Design

Before embarking on a detailed discussion on evidence-based medicine and clinical research, a basic understanding of the different kinds of study design and the hierarchy of evidence is essential. Medical research generally aims to establish the relationship between a cause and an effect of interest in a particular setting. The cause, generically referred to either as the “exposure” or “explanatory/predictor variable,” may include different risk factors or therapeutic interventions, and the effect, generally referred to as the “outcome” or “response variable,” may include status like disease incidence or other clinical consequences. Different study designs employ different approaches for ascertaining these two components, which may consequently be subjected to different methodological concerns.

Study designs commonly encountered in medical research include case reports, cross-sectional studies, case-control studies, cohort studies, and randomized controlled trials (RCTs).

Case reports describe individual patients with unique or unexpected clinical findings. These can include unusual modes of disease presentation or clinical course, unreported side effects or unusual treatment responses for a known disease, or the presentation, diagnosis, and/or management of new and emerging diseases. Although case reports are anecdotal and rank low in the evidence hierarchy, as they generally provide less rigorous evidence for establishing general principles of disease causation or treatment effectiveness, those reporting carefully observed original and unexpected findings can contribute significantly to medical knowledge or even serve as the first line of evidence for a revolutionary medical advance. By current standards of law and ethics, most journals regard the patient’s explicit written consent as mandatory for any article that contains personal medical information about a living individual, particularly if this could identify the patient.

Cross-sectional, case-control, and cohort studies are all observational studies, in the sense that the exposure or intervention of interest is chosen by the subjects themselves rather than being assigned by the researcher. One major problem of observational studies is the potential for bias, which can affect the validity of the conclusions drawn. Cross-sectional studies generally aim to be descriptive, while the latter two are analytical and involve a comparison between groups of patients (intervention and control groups).

Case-control studies are always retrospective in nature, with the study subjects identified by their outcome status (e.g., having or not having a particular disease or clinical consequence). After data on exposure are ascertained, the groups are compared. To ensure comparability of the cases and controls and avoid selection bias, they should be drawn from exactly the same population. As ascertainment of exposure in a case-control study commonly depends on subjective recall by participants, differential recall between the comparison groups, as affected by the outcome status (having or not having the disease), may be a significant source of bias (recall bias). A case-control study allows for the simultaneous investigation of multiple exposures or risk factors and is especially suitable for studying rare outcomes (e.g., rare diseases), for which it may take too long to recruit a sufficient sample size in a prospective cohort.

Cohort studies are longitudinal studies involving the comparison of two or more groups of participants (cohorts) characterized by their different exposure status (e.g., smokers versus non-smokers or individuals who had received different treatment options) and assessed for their respective outcome of interest (e.g., incidence of disease or other clinical outcomes). This can be done prospectively by following up the cohorts over time or retrospectively by identifying them from existing clinical data or medical records. In a clinical setting these cohorts may include patients receiving different therapeutic options for the same condition. As the intervention is not randomized, bias can be introduced by a combination of factors such as patient or clinician preferences, referral pattern, or institutional policy. In consequence, the observed difference in outcomes may be attributable to known or unknown confounding factors rather than the treatment itself. Cohort studies are particularly suitable for the study of rare exposures.

The randomized controlled trial (RCT) is generally considered the gold standard that gives the “best evidence” in medical research. RCTs are commonly used to compare the clinical efficacy of a new therapeutic intervention with an existing one or, in the case of a disease without any existing effective treatment, with a placebo. For an RCT to be scientifically and ethically justified, there should be enough underlying evidence from early studies (phases 1 and 2) to support the potential effectiveness and safety of the intervention in humans, while clinical equipoise still holds, with genuine doubt about the comparative effectiveness of the interventions to be assigned to the comparison groups. When properly designed and conducted with a large enough sample size, an RCT can minimize the potential impact of bias and confounding on the study validity.

One key feature of an RCT is randomization, which allocates study participants to the intervention and comparison groups, to receive one or other of the alternative treatments under study, based on pure chance. This is commonly done by the use of a randomization table or a computer-generated randomization code. Its aim is to balance any known or unknown prognostic variables between the groups and to ensure they are comparable to start with, thus minimizing the possibility for the observed outcome to be biased by confounding factors (allocation bias). Besides randomization, the randomization sequence should also be concealed from the clinicians and researchers responsible for the enrollment process, to avoid bias introduced by conscious or unconscious selection of participants or alteration of the order in which they are enrolled into different groups (allocation concealment). This can be done either simply with the use of opaque envelopes or more elegantly using a remote call center for assigning patients during the enrollment process. Allocation concealment is essential even if the intervention itself can be blinded, and it is distinct from the actual process of blinding, which happens after randomization. Blinding refers to the situation where study participants, researchers, and data collectors are not aware of which group each participant has been allocated to, so as to ensure that the comparison groups are handled similarly in the conduct of the study. Blinding helps to avoid bias introduced by differential perception or expectation, either by the patient or the researcher.
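As a minimal sketch of what a computer-generated randomization code might look like, the Python example below produces an allocation sequence in permuted blocks. Permuted blocks are only one common scheme, chosen here for illustration; the function name, arm labels, block size, and seed are all assumptions and are not taken from any particular trial.

```python
import random

def permuted_block_sequence(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Return an allocation list built from randomly shuffled, balanced blocks.

    Hypothetical helper for illustration: each block contains every arm
    equally often, so group sizes stay balanced as enrollment proceeds.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)          # seeded generator so the list is reproducible
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * per_arm   # e.g. ['A', 'B', 'A', 'B'] for block_size 4
        rng.shuffle(block)             # chance alone decides the order within a block
        sequence.extend(block)
    return sequence[:n_participants]

allocation = permuted_block_sequence(12, seed=2024)
print(allocation)
```

For allocation concealment, a list of this kind would be generated and held by someone not involved in enrollment, so that the recruiting clinician cannot foresee the next assignment.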

Systematic review, which is not clinical research per se, is a literature review employing a systematic approach of identifying, appraising, selecting, and synthesizing all high-quality research evidence relevant to a particular focused research question from several studies employing a comparable study design. The purpose is to sum up the best available research evidence on a specific question. Bias in the review process is minimized by the use of transparent, predefined, and reproducible methods for each of the steps involved. Whenever appropriate, results from different studies can be pooled by a statistical technique called meta-analysis to give a more precise estimate of the overall effect size based on a larger sample size or to address controversies due to contrasting results from different studies.
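To illustrate how pooling can yield a more precise estimate, the following sketch applies fixed-effect inverse-variance weighting, one of several pooling methods used in meta-analysis. The choice of method, the function name, and the three study results (log odds ratios with standard errors) are hypothetical and serve only as an example.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of study-level effect estimates.

    Each study is weighted by 1 / SE^2, so larger and more precise studies
    contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci_95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci_95

# Hypothetical log odds ratios and standard errors from three studies
log_or = [-0.40, -0.15, -0.30]
se = [0.30, 0.25, 0.20]

pooled, pooled_se, ci = pool_fixed_effect(log_or, se)
print(f"pooled OR = {math.exp(pooled):.2f}, "
      f"95% CI {math.exp(ci[0]):.2f} to {math.exp(ci[1]):.2f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which pooling gives a more precise estimate. A fixed-effect model assumes the studies estimate one common effect; when studies differ substantially, a random-effects model is usually considered instead.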

Besides knowing the common study designs, it is also important to understand that not all “evidence” is equal. Research employing different study designs can generally be ranked in a hierarchy of evidence according to the potential validity of their findings as related to the strengths of the different study designs (Fig. 1). Generally, the strength of evidence can be categorized into three major classes. Class I evidence refers to that from prospective RCTs; Class II evidence includes that from prospective observational studies, cohort, prevalence, and case-control studies; and Class III evidence includes that from retrospective clinical series, databases/registries, case reports, and expert opinion. Prospective RCTs represent the gold standard of clinical trials and are considered more suitable for hypothesis testing. Systematic reviews of RCTs contribute the most valid scientific evidence for informing EBM. On the other hand, cohort studies, case-control studies, and case series and reports are more prone to bias and are considered best suited to hypothesis generation. While this hierarchy is not absolute and other issues such as the quality of individual research also need to be considered, it does provide a guide to the strength of the available evidence.
Fig. 1

The hierarchy of evidence

Concept of Evidence-based Medicine (EBM)

Having initially emerged as the discipline of clinical epidemiology, EBM aims to inform clinical practice in a responsible and accountable manner by an explicit integration of the best available evidence, the doctor’s clinical expertise and experience, and the patient’s wishes in clinical decision-making. One of the most widely adopted definitions of evidence-based medicine was given by David Sackett et al. (1996). Sackett, who founded the first department of clinical epidemiology at McMaster University in Canada and pioneered the concept of EBM, defined it as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients or the delivery of health services.” Current best evidence is up-to-date information from relevant, valid research. This applies to all relevant decisions at different levels of healthcare, including those concerning the potential for harm from exposure to particular agents, the accuracy of diagnostic tests, and the predictive power of prognostic factors (Cochrane 1972).

The 5As of an EBM Cycle

As medicine is an ever-changing field, keeping up with the most current clinical evidence in a busy practice can be daunting. Understanding the approach and familiarization with the skills involved would be a crucial step in the successful practice of EBM in any specialty. The five sequential key steps (commonly referred to as the “5As” cycle) in the practice of EBM are as follows (Fig. 2):
  1. Assess – The first step should start with a thorough clinical assessment of the patient and the problem to determine the pertinent issues, which may include a differential diagnosis, treatment decisions, or prognosis.

  2. Ask – Formulating a clear and answerable clinical question for the patient’s problem.

  3. Acquire – Searching for appropriate and relevant evidence from the literature.

  4. Appraise – Critically appraising the information for its validity and usefulness.

  5. Apply – Applying the new knowledge in the clinical management of patients.

Fig. 2

The 5As cycle in EBM practice

Asking an Answerable Clinical Question in the “PICO” Format

The ability to formulate a clinical question in the “PICO” (patient, intervention, comparison, outcomes) format is a fundamental skill in the EBM process (Robson 2016). The PICO question is an answerable question focused on a clinical problem, formulated around the specific problem identified from the patient history and clinical examination. Depending on the context, this can be a question regarding the outcome difference between two contrasting scenarios, as defined, for instance, by different treatment approaches, exposure to different risk factors, or having different prognostic factors. Generally a PICO question is defined by four basic components, namely the patient, the intervention, the comparison, and the outcome, as detailed in Table 1.
Table 1

The four components of a PICO question

P = patient or problem: This part describes salient characteristics of the patient that may be important in defining the current problem. This may include the primary problem, disease and coexisting conditions, and sex, age, and race of a patient that might be relevant to the diagnosis or treatment of a disease

I = intervention/indicator (e.g., certain exposures or prognostic factors): This includes the main factors which are being considered, which may be some intervention, prognostic factor, or exposure as defined by the context of the problem

C = comparison/control: This refers to the main alternative to compare with the intervention, which may be an intervention, exposure, or some prognostic factors. Some clinical questions may not need a specific comparison group

O = outcomes: The main outcome measure of current interest, which may be a treatment or disease outcome

For example, when considering the best operative approach for necrotizing enterocolitis in children, one possible PICO question is: “For children suffering from necrotizing enterocolitis with perforation (the patient), does laparotomy (the intervention), when compared with peritoneal drainage (the comparison), confer any additional survival benefit at 90 days after the surgery (the outcome)?” (Moss et al. 2001).

Searching for and Selecting the Best Available Evidence for a PICO Question

Searching for the best evidence to answer a specific clinical question is not a pure numbers game. In contrast to a broad and very inclusive search, the aim here is to nail down and obtain the best evidence that is specific to a particular clinical question.

One of the commonest tools for EBM literature search is the PubMed/MEDLINE database. A basic search can begin with the use of Boolean operators (e.g., AND/OR) to link up key elements of the PICO question as the query search terms. The use of MeSH terms (Medical Subject Headings) can help to increase the precision of the search result by running the search in the MeSH Database. Search results can be saved, emailed, or exported into common citation management software programs such as EndNote, Reference Manager, and ProCite.
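The way Boolean operators link up the PICO elements can be sketched as a small string-building helper: synonyms within one component are joined with OR, and the components are joined with AND. The function name and the synonym lists are illustrative assumptions; the helper only builds a query string and does not contact any database.

```python
def build_pico_query(patient, intervention, comparison=None, outcome=None):
    """Combine PICO term lists into one Boolean search string.

    Hypothetical helper: terms within a component are ORed together,
    components are ANDed, and multi-word terms are quoted as phrases.
    """
    def or_group(terms):
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"

    components = [patient, intervention, comparison, outcome]
    # Components left empty (e.g. no comparison group) are simply skipped
    return " AND ".join(or_group(c) for c in components if c)

query = build_pico_query(
    patient=["necrotizing enterocolitis", "NEC"],
    intervention=["laparotomy"],
    comparison=["peritoneal drainage"],
    outcome=["survival", "mortality"],
)
print(query)
# ("necrotizing enterocolitis" OR NEC) AND (laparotomy) AND ("peritoneal drainage") AND (survival OR mortality)
```

The resulting string can be pasted into a database search box; adding more synonyms broadens a component, while adding more components narrows the overall search.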

A search on any clinical problem, even when targeted to be very specific, can easily generate hundreds of returns, and one has to read through the title, the abstract, and sometimes the entire article to select the best potential evidence for answering the specific clinical question. A quick assessment of the objectives and rationale of the study is needed to make sure it is in line with the PICO question, to avoid wasting time and effort on something irrelevant. During this selection process, preference should generally be given to newer reports and those ranked higher in the evidence hierarchy (i.e., RCTs or meta-analyses), so as to get the most up-to-date and valid evidence as far as possible. Clinical Queries is another feature where evidence-based literature can be quickly located, by searching specifically for “systematic review” or “meta-analysis” or by specifying the “therapy” category and searching by “clinical study category” for RCTs. The Cochrane Library is a collection of six databases where reports of systematic reviews for a large variety of clinical problems can be found.

Critical Appraisal of the Identified Evidence

Once the best potential evidence for a PICO question is identified, the manuscript needs to be critically evaluated with a systematic framework. The main aim is to assess the quality of the presented evidence in terms of its validity, its potential clinical significance, and its applicability to the particular patient. Figure 3 shows a flowchart of the key steps in a critical appraisal.
Fig. 3

Key steps in the critical appraisal process in the practice of EBM

Interpreting the Internal Validity of a Study Result by Assessing the Role of Chance, Bias, and Confounding

Internal validity of a study refers to whether it is designed and conducted to measure what it is intended to measure and to produce a “truthful” result. Consideration of any study finding will only be meaningful after the study validity can be reasonably established, as the result from an invalid study can hardly be trusted or relied upon for making a clinical decision. Rather than being explicitly stated somewhere in the report, internal validity can only be assessed by a detailed examination of the study methodology and analysis approach, looking specifically for hints or evidence that the study result may be affected by bias, confounding, or chance. While chance findings are caused by random variation, bias and confounding represent systematic errors that may jeopardize the internal validity of a study by leading to false conclusions regarding the true relationship between the intervention and outcome (e.g., a false-positive or false-negative conclusion regarding the performance of a new surgical approach compared with a traditional approach).

Look Out for Possibility of Bias in Relation to Different Study Designs

Bias refers to systematic error caused by the design or conduct of a study that may distort the inference on the true relationship between the intervention and outcome. The two common categories of bias are selection bias, which relates to how study subjects are selected, and information bias, which results from error in data measurement or variable assessment.

As mentioned previously, observational study designs are generally more vulnerable to the problem of bias, with different study designs more prone to their specific and inherent types of bias. As rigorous attention to study design and conduct is required for eliminating or minimizing their potential impact on study validity, evidence for these elements needs to be specifically examined during the appraisal to see how well they have been handled to prevent potential bias (see Table 4 in the following section).

Randomized controlled trials and systematic reviews represent the preferred source of valid evidence in EBM. Their appraisal will be considered in greater detail here. Major methodological elements that may affect the quality of an RCT include randomization, allocation concealment, blinding, and follow-up of patients. During the appraisal, questions that need to be asked include:
  1. Was the randomization done properly by an appropriate method?

  2. Was allocation concealment performed properly and effectively?

  3. Was the study intended to involve blinding? If so, who was the study trying to blind, and how successful was the blinding?

    As previously explained, the process of randomization, together with allocation concealment and blinding, can help balance out any known or unknown confounding, to ensure the two comparison groups are comparable to start with and to avoid the introduction of bias throughout the study.

  4. Were the study groups really comparable to start with?

    This can be assessed by examining the baseline characteristics of both groups, which are usually displayed in a table in the manuscript. This gives a good indication of how successful the randomization and allocation concealment processes were.

  5. How complete was the follow-up?

    Ideally all participants in each group should complete the whole study, and good studies should be able to follow up at least 80% of their patients for the primary outcome of interest. Dropout of patients in general may affect the precision of the study. Differential loss to follow-up between comparison groups may lead to an overestimation of the effectiveness of the intervention and invalidate the study conclusion. A sensitivity analysis that assumes the worst-case outcome for the missing patients gives a more conservative analysis.

  6. Were patients analyzed by intention-to-treat analysis?

    This refers to the approach where patients are analyzed according to the groups to which they were randomized, regardless of whether they received or adhered to the allocated intervention. This approach helps to preserve the effect of randomization against the problems of non-compliance, crossover, and dropout and gives a pragmatic and more conservative estimate of the benefit of a treatment option in real life rather than the potential benefit in patients who receive treatment exactly as planned.
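The effect of incomplete follow-up described under question 5 can be made concrete with a small sketch. It compares the risk difference among patients who completed follow-up with a worst-case scenario in which every patient lost from the new-treatment arm is assumed to have had the adverse outcome and every patient lost from the control arm is assumed not to have. The function name, the dictionary layout, and all counts are hypothetical.

```python
def followup_sensitivity(new_arm, control_arm):
    """Compare complete-case and worst-case risk differences (new minus control).

    Each arm is a dict with 'randomized', 'followed', and 'events', where
    'events' counts adverse outcomes among the patients followed up.
    """
    def lost(arm):
        return arm["randomized"] - arm["followed"]

    # Complete-case analysis: only patients with known outcomes are counted
    complete_case = (new_arm["events"] / new_arm["followed"]
                     - control_arm["events"] / control_arm["followed"])

    # Worst case for the new treatment: all patients lost from the new arm had
    # the adverse outcome, and all patients lost from the control arm did not
    worst_case = ((new_arm["events"] + lost(new_arm)) / new_arm["randomized"]
                  - control_arm["events"] / control_arm["randomized"])

    followup = {name: arm["followed"] / arm["randomized"]
                for name, arm in (("new", new_arm), ("control", control_arm))}
    return complete_case, worst_case, followup

# Hypothetical trial: complication counts in each arm
new_arm = {"randomized": 100, "followed": 85, "events": 10}
control_arm = {"randomized": 100, "followed": 95, "events": 20}

cc, wc, fu = followup_sensitivity(new_arm, control_arm)
print(f"follow-up: new {fu['new']:.0%}, control {fu['control']:.0%}")
print(f"complete-case risk difference: {cc:+.3f}")
print(f"worst-case risk difference:    {wc:+.3f}")
```

In this made-up example the new treatment appears to lower the complication rate among those followed up, but the apparent benefit reverses under the worst-case assumption, which is why differential loss to follow-up deserves close attention during appraisal.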


For a systematic review, the main issues to assess are whether all potentially relevant literature was identified and considered, whether studies were selected and included in a systematic and unbiased manner to avoid the risk of selection bias, and whether the original studies were themselves of reasonable quality. With meta-analysis, comparability of these studies in terms of study design and settings, inclusion and exclusion criteria, and end-point measures all need to be assessed.

Ideally the review should be focused on a specific clinical question relevant to the PICO question rather than being a very broad review, which is less likely to provide a useful answer to the clinical problem. Completeness of the search process can be assessed from details regarding the search strategies and the databases covered. This should preferably also include consideration of cited references at the end of articles and other non-published sources. The selection and assessment process should be checked for any hint of selection bias. These steps need to be guided by reasonable and objective criteria and to be carried out by more than one researcher. An assessment of the methodological quality and validity of the selected primary studies also needs to be included, as this has a direct impact on the validity of the review findings.

Have Potential Confounding Factors been Addressed?

Sometimes referred to as the third major class of bias, confounders are variables that are associated with both the intervention and the outcome (e.g., death or recovery from a disease) but are not themselves part of the causal pathway between the two. Uneven distribution of confounders between comparison groups, especially common in observational studies, may cause a spurious association between the intervention and the outcome of interest and invalidate the result of a study. Confounding can be controlled at the study design level by randomization, restriction of study entry to individuals with certain confounding characteristics, or matching of individuals or groups to equalize the distribution of confounders. At the analysis level, it may be controlled by stratified analysis according to the confounders or by statistical adjustment using multivariate analysis. A well-performed randomization in an RCT is by far the best approach to eliminate the problem of both known and unknown confounding by balancing out their occurrence between study groups.
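How stratified analysis can expose a spurious association may be illustrated with a small numerical sketch. It compares a crude odds ratio with a Mantel-Haenszel odds ratio pooled across strata of a confounder. Mantel-Haenszel pooling is one common stratified method, chosen here as an assumption; the function names and the counts (a new procedure given mostly to patients with mild disease) are entirely hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical data: exposure = new procedure, outcome = complication,
# strata = disease severity (the confounder)
strata = [
    (10, 90, 1, 9),    # mild disease: 10% complications with either procedure
    (5, 5, 50, 50),    # severe disease: 50% complications with either procedure
]

# Crude table: add up each cell over the strata, ignoring severity
crude = odds_ratio(*(sum(cell) for cell in zip(*strata)))
adjusted = mantel_haenszel_or(strata)

print(f"crude OR = {crude:.2f}")
print(f"severity-adjusted OR = {adjusted:.2f}")
```

Within each severity stratum the procedure makes no difference, yet the crude analysis suggests a strong protective effect simply because the new procedure was used mainly in mild cases. The stratified estimate removes this distortion for the measured confounder, whereas only randomization can be expected to balance unknown confounders as well.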

Examining the Role of Chance and Statistical Significance of the Result

Any reported difference between the comparison groups, such as a lower complication rate with a new versus a traditional operative approach, may reflect a true improvement or arise purely by chance. Because a clinical study draws conclusions about a population from a sample, it may err either by finding an effect or relationship when there is in fact none (a false-positive result, or Type I error, α) or by failing to demonstrate an effect or relationship that really exists (a false-negative result, or Type II error, β), simply because of random variation. The contribution of random variation to a reported study result can be assessed by its statistical significance, which is essentially a process of hypothesis testing regarding the compatibility of the observed results with the null hypothesis (which states that there is no real difference between the comparison groups). This is usually reported with either a p-value or a confidence interval. A p-value ≤ 0.05 (5%) has customarily been taken as a strong argument against chance and is referred to as a statistically significant result. Technically, this means that if the null hypothesis were true, a difference at least as large as the one observed would arise by chance no more than 1 time in 20, so rejecting the null hypothesis at this threshold carries at most a 5% risk of a Type I error, that is, of concluding that a real effect or relationship exists when it does not. On the other hand, a p-value larger than 0.05 means that the role of chance cannot be confidently excluded, at least with the current power of the study as determined by its sample size. Many journals now prefer the reporting of a confidence interval, which gives an estimated range of values likely to include the true effect size at a specified confidence level, rather than the p-value alone. A 95% confidence interval that excludes the null value (0 for an absolute difference or 1 for a relative difference) corresponds to a p-value < 0.05 and reflects statistical significance. The width of the interval also conveys a rough idea about the sample size, as a larger sample size will generally give a more precise estimate with a tighter confidence interval, while a wide interval may reflect an inadequate sample with poor power.
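The correspondence between a p-value and a 95% confidence interval can be illustrated with a simple calculation for the difference between two complication rates. The sketch below uses a normal (Wald) approximation, which is a simplifying assumption that suits reasonably large samples; the function name and the counts are hypothetical.

```python
import math

def risk_difference_test(events_new, n_new, events_old, n_old):
    """Risk difference (new minus old) with a 95% CI and a two-sided p-value.

    Uses a normal (Wald) approximation; this is a sketch for illustration,
    not a substitute for an exact or continuity-corrected method.
    """
    p_new, p_old = events_new / n_new, events_old / n_old
    diff = p_new - p_old
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    ci_95 = (diff - 1.96 * se, diff + 1.96 * se)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail area of the normal curve
    return diff, ci_95, p_value

# Hypothetical complication counts: new approach 10/100, traditional approach 25/100
diff, ci, p = risk_difference_test(10, 100, 25, 100)
print(f"risk difference = {diff:+.2f}, 95% CI {ci[0]:+.3f} to {ci[1]:+.3f}, p = {p:.4f}")

# A smaller difference with the same sample size: 10/100 versus 12/100
diff2, ci2, p2 = risk_difference_test(10, 100, 12, 100)
print(f"risk difference = {diff2:+.2f}, 95% CI {ci2[0]:+.3f} to {ci2[1]:+.3f}, p = {p2:.4f}")
```

In the first comparison the interval excludes 0 and the p-value is below 0.05; in the second the interval spans 0 and the p-value is well above 0.05, so the role of chance cannot be excluded with this sample size.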

Statistical significance should, however, be interpreted with flexibility and consideration of the context rather than being used as a rigid cutoff. For instance, a very large reported effect size with a confidence interval that only marginally includes the null value, such as an odds ratio of 14.3 (95% CI 0.96–28), or a mean increase in surgical complication rate of 85% (95% CI −0.32% to 154%), should signal the need for a more powerful study with a larger sample size to properly demonstrate a potentially important finding rather than being simply disregarded as statistically insignificant.

Examining the Clinical Significance of the Result

As a statistical property, a very small difference may still attain statistical significance when the sample size is large enough. This should not be confused with its clinical significance, which refers to the practical value of such a difference in clinical practice (e.g., a 0.2% difference in survival rate with a new surgical approach in comparison with an existing one). To a considerable degree, what constitutes a minimal clinically important difference (MCID) involves a subjective judgment made by the clinician, which needs to be balanced against any potential harm and determined in the context of each scenario. For instance, how much improvement in function or reduction in complication rate is good enough to justify the new treatment approach? How much additional survival would be considered worthwhile (weeks/months/years) in the face of a new treatment?
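The distinction between statistical and clinical significance can be seen by holding a 0.2% difference in survival fixed while the sample size grows. The sketch below again relies on a normal approximation; the survival rates, the sample sizes, and the MCID of 2% are hypothetical values chosen only for illustration.

```python
import math

def two_sided_p(rate_new, rate_old, n_per_arm):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    se = math.sqrt(rate_new * (1 - rate_new) / n_per_arm
                   + rate_old * (1 - rate_old) / n_per_arm)
    z = (rate_new - rate_old) / se
    return math.erfc(abs(z) / math.sqrt(2))

survival_new, survival_old = 0.902, 0.900   # a 0.2% absolute difference
mcid = 0.02                                 # hypothetical minimal clinically important difference

for n in (1_000, 100_000, 1_000_000):
    p = two_sided_p(survival_new, survival_old, n)
    statistically_significant = p < 0.05
    clinically_important = (survival_new - survival_old) >= mcid
    print(f"n per arm = {n:>9,}: p = {p:.2g}, "
          f"statistically significant = {statistically_significant}, "
          f"meets MCID = {clinically_important}")
```

With a large enough trial the same 0.2% difference becomes statistically significant, yet it never approaches the assumed MCID, so the judgment about whether it matters remains a clinical one.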

Causality Assessment of the Result by Bradford Hill’s Criteria

In the setting of a surgical clinical trial, especially one with an RCT design, the causal relationship between the intervention (e.g., a new surgical procedure) and the outcome (e.g., a reduction in mortality or complication) is usually straightforward to establish. An inference regarding whether the result reflects only an association or points toward causality may be more difficult in the setting of an epidemiological study (e.g., to look for risk factors for a surgical disease), especially one adopting an observational and nonexperimental approach. A common approach, as proposed by Sir Bradford Hill (1965), involves a consideration of the criteria listed in Table 2. Generally, the more criteria are met, the stronger the argument for causality. Some of the criteria, such as biological plausibility, may not always be satisfied for a truly novel finding, as limited by the state of current knowledge. Many authorities consider an appropriate temporality, with the exposure occurring prior to the outcome, to be one of the most essential criteria to establish in most scenarios.
Table 2

Bradford Hill’s criteria for causality assessment

Strength of association: Although a small association does not preclude a causal effect, causality is considered more likely with a larger reported association/effect size

Dose response: Also called “biological gradient,” meaning that a greater level of exposure leads to a greater incidence of the outcome

Temporality: Occurrence of the cause (the exposure/intervention) before the outcome

Consistency: The observation of consistent findings by different studies in different places with different samples

Biological plausibility: The availability of a plausible mechanism to explain the observed relationship between cause and effect

Coherence: Coherence between all studies adopting different study approaches, e.g., epidemiological, laboratory, and experimental studies

Specificity: The specific association between a factor and a particular outcome but not other outcomes, although this may not always be the case

Is the Conclusion Applicable to the Patient in the Clinical Question?

After establishing the internal validity of a study, the applicability of its results to the clinical problem in question also needs to be considered. As clinical studies are usually performed on a small sample of patients defined by very specific inclusion and exclusion criteria, the results may not always be generalizable to all patients encountered in the clinical setting. External validity, or generalizability, refers to the extent to which the results of a study can be applied to other situations and to other people in the wider population. Assessing external validity requires a careful examination of the setting in which the study was conducted and of the inclusion and exclusion criteria for subject enrollment. The question to ask is whether these criteria, such as ethnicity, disease stage, or comorbidity, may have restricted the relevance of the results to other patients encountered in clinical practice. For the particular patient in the PICO question, this can be assessed by asking how likely it is that this patient would have been enrolled in the study had he or she presented in the recruitment setting.

Clinical Research in Pediatric Surgery

Clinical research is defined by the National Institutes of Health as research conducted with human subjects and includes patient-oriented research, epidemiologic and behavioral studies, outcomes research, and health services research. It seeks to extend potentially promising basic scientific understanding from laboratory studies using animals or cell lines into research involving human subjects, through different phases and according to established protocols. One main aim of clinical research in pediatric surgery is either to validate existing clinical practices or to investigate new treatment approaches in an evidence-based manner. While a detailed discussion of the theory and conduct of clinical research would fill a whole book and is beyond the scope of this chapter, some important issues for consideration when planning clinical research in pediatric surgery are presented here.

Constructing a Research Question and Designing the Study

The ability to ask a proper research question is the first important step toward a successful research project. A good research question should center on an issue that the researcher is genuinely curious about and that represents an important gap in current knowledge of the area. This research gap needs to be identified and shaped by an exploratory review of the current literature in the field, covering the current state of knowledge, controversies, important unanswered questions, and areas as yet unexplored surrounding the particular problem.

Similar to the PICO question, the research question should be constructed in a clear, focused, concise, and arguable format. Using the same example quoted previously for the PICO question, clear identification of the target intervention under investigation (laparotomy), the intervention or placebo to be compared with (peritoneal drainage), the target patient population (children suffering from necrotizing enterocolitis with perforation), and the outcome of interest (survival benefit at 90 days) helps to highlight the objective of the study (Moss et al. 2001). Besides helping the researcher to choose a suitable study design and plan the appropriate logistics, a well-developed research question also helps the researcher to focus on the central theme of the specific argument and avoid an “all-about” study.

Planning the Logistics for Clinical Research

The choice of the most suitable study design depends on a number of factors, including the specific aim of the study and practical constraints such as data availability and funding. While observational designs are commonly adopted for epidemiological and behavioral studies, they are less commonly used in a surgical setting. As the evaluation of a new therapeutic approach is a common aim of clinical research in a surgical setting, an RCT would naturally be the best experimental approach for the purpose. Although clinical trials of pharmacological products under development are typically conducted in four sequential “phases,” each serving a different and specific purpose, a phase 3 trial using a proper RCT design is the only stage commonly encountered in a surgical setting. For a novel operative approach, justification for conducting an RCT is usually available in the form of early successful case series.

Many of the components of an RCT that are important for ensuring study validity may be logistically difficult, if not impossible, in a surgical setting and present problems not seen in pharmaceutical trials. While randomizing patients to receive either a new operative approach or an existing one for a particular disease may still be feasible, comparing a novel operative approach with placebo may be ethically difficult to justify, as the new approach may be the only hope for the patient even though it is still an experimental procedure lacking solid evidentiary support. Preconceived bias among patients or investigators is recognized as a major hurdle in conducting an RCT that examines a surgical technique.

While it would never be possible to blind the surgeon, blinding of the patient using a sham operation as the placebo arm is theoretically possible but rarely ethically justifiable because of the invasiveness and potential harm of the procedure. In situations where it is also logistically or ethically difficult to blind the patient, blinding of the data collector and interpreter remains a valuable option for minimizing potential bias. For instance, a very simple method such as putting an identical set of postoperative dressings on all possible wound sites has been used in previous RCTs to blind the postoperative observers to the type of operation that the patients had received (Chan et al. 2005). Likewise, even though blinding may be difficult or imperfect, it is always advisable to conceal the randomization sequence from colleagues responsible for patient enrollment to minimize the potential impact of selection bias.

Another common problem in a surgical setting, especially for less prevalent conditions, is the difficulty of recruiting a sample large enough to attain sufficient power for demonstrating an outcome difference of small or even moderate size. A collaborative multicenter trial, conducted at more than one medical center or clinic, may be a possible approach to including a larger number of participants within a shorter study period. This may also enhance the generalizability of the findings, as a wider range of population groups from different geographic locations is included. For an efficacy outcome that may vary significantly between population groups as a result of different genetic, environmental, or cultural backgrounds, a very large sample from a geographically dispersed trial would normally be needed. However, quality assurance to ensure proper conduct and strict protocol compliance for processes such as admission, treatment, and follow-up in multicenter trials is not a trivial challenge and generally requires an experienced and highly developed coordinating center.

Data Acquisition and Analysis

In a clinical research setting, the aim generally is to compare groups on relevant clinical characteristics and outcomes, which may be measured either by continuous variables that can be summarized by means or by nominal or ordinal variables that can be summarized by proportions. While a detailed account of statistical analysis methods is beyond the scope of this chapter, a suitable statistical test for a given research question can generally be selected according to the type of data in question, that is, whether the explanatory and outcome variables are categorical, ordinal, or continuous, as outlined in Table 3.
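Table 3

Selection of statistical tests for different data types

Explanatory/predictor variable: 2 categories
  • Outcome/response variable categorical (two categories): Chi-squared test
  • Outcome/response variable continuous: t-test, Mann-Whitney U test

Explanatory/predictor variable: ≥3 categories
  • Outcome/response variable categorical (two categories): Fisher’s exact test, McNemar’s test
  • Outcome/response variable continuous: ANOVA, Kruskal-Wallis test

Explanatory/predictor variable: continuous
  • Outcome/response variable categorical (two categories): Logistic regression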
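As one concrete instance of the table, the sketch below works through the chi-squared test for a two-category explanatory variable and a two-category outcome. It is a minimal, standard-library-only illustration without continuity correction; the counts are hypothetical and the function name is an assumption made for this example.

```python
import math

def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for the table
        [[a, b],
         [c, d]]
    where rows are the two comparison groups and columns the two outcome categories."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 is the square of a standard normal deviate
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: complication / no complication in two treatment groups
chi2, p = chi_squared_2x2(12, 88, 25, 75)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

In practice a statistical package would normally be used, and an exact test may be preferable when expected cell counts are small.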



Sample Size Estimation

Besides being properly designed and conducted, a study needs a sufficient sample size to attain adequate statistical power, which is the probability of detecting a significant difference between the comparison groups when there truly is one. When no statistically significant difference is detected between two comparison groups, there may either be truly no difference (true negative) or there may in fact be a difference that the study was unable to detect (false negative). A conclusion of no difference can only be drawn when the study has sufficient power to detect such a difference with statistical significance.
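To make the notion of power concrete, the following sketch approximates the power of a two-group comparison of means using a normal approximation. The formula choice, the function name, and the example numbers (a true difference of 5 units with an SD of 10) are illustrative assumptions, not values taken from this chapter.

```python
from statistics import NormalDist

def approx_power(delta: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test for a difference in means."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5           # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)            # two-sided critical value
    # Probability of exceeding the critical value in the direction of the true effect
    return z.cdf(abs(delta) / se - z_crit)

for n in (20, 64, 100):
    print(n, round(approx_power(delta=5, sd=10, n_per_group=n), 2))
```

Under these assumed inputs, the smallest group size falls well short of the customary 80% power, so a non-significant result from such a study could not be read as evidence of no difference.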

One of the commonest questions encountered by most investigators is how to determine the power and thus the required sample size. Consultation with a statistician in an early phase of the study planning would be tremendously helpful and is always advisable.

Generally, the required sample size is determined by five parameters: the effect size, the estimated measurement variability, the desired statistical power, the significance criterion, and the planned analysis approach (a one- or two-tailed statistical analysis). When estimating the sample size, the investigator needs to decide on the first two parameters; the power and the significance criterion are then customarily set at 80% and 0.05, respectively.

Effect size – This refers to the smallest difference in outcome between comparison groups that the study is intended to detect. While a moderate sample size may be good enough for detecting a sizable difference, a progressively larger sample size is required to detect a smaller difference. Results from a pilot study or relevant previous literature may give some guidance regarding the effect size to be expected. Clinical experience and judgment may also help in deciding the minimal improvement that the new intervention must achieve to be justified (e.g., a 10% improvement in survival or a 20% reduction in complication rate).

Measurement variability – This refers to the estimated variability of the outcome parameter within each comparison group and is commonly expressed as a measure of dispersion, the standard deviation (SD). Generally, a larger sample size is required to demonstrate a given effect size for a measurement with greater variability. Again, data from a pilot or previous study in a similar population may be useful for its estimation. When no such data are available, a range of values may be assumed based on subjective experience to assess the likely impact on the estimated sample size. For an outcome expressed as a proportion, the SD can be estimated statistically from the proportion itself.

Statistical power (1-β) – This refers to the probability that the study will detect the specified effect size if there truly is a difference between the comparison groups. While a higher power is always desirable, the obvious trade-off is the resource implication of the larger sample size needed. In an RCT, a power of at least 0.80 is customarily required, meaning at most a 20% chance of committing a Type II error, that is, failing to demonstrate a true effect or relationship.

Significance criterion (α) – This is the chosen cutoff below which the p-value associated with a result is considered statistically significant; a more stringent criterion (a smaller α) requires a larger sample size. It is customarily set at 0.05 (5%) to give a reasonably strong argument against chance (i.e., a chance of 1 in 20 of committing a Type I error).

One- or two-tailed statistical analysis – While a two-tailed analysis tests for a difference between the comparison groups in either direction, a one-tailed analysis tests for a difference in only one specified direction (larger or smaller). When the scenario is truly appropriate, a one-tailed analysis generally requires a smaller sample size than a two-tailed analysis, as the sample size that gives a significance criterion of α in a one-tailed design corresponds to one of 2α in a two-tailed design.
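For a comparison of two means with equal group sizes, the five parameters above are commonly combined through the normal-approximation formula below. This is a standard textbook approximation offered as a sketch, not a derivation from this chapter; the notation is introduced here, with δ denoting the effect size, σ the SD, and z_q the q-th quantile of the standard normal distribution.

```latex
n_{\text{per group}} \approx
  \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
  \quad\text{(two-tailed)},
\qquad
n_{\text{per group}} \approx
  \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
  \quad\text{(one-tailed)}.
```

With α = 0.05 and a power of 0.80, the quantiles are approximately z_0.975 = 1.96, z_0.95 = 1.645, and z_0.80 = 0.842, so under this approximation a one-tailed design needs about (1.645 + 0.842)² / (1.96 + 0.842)² ≈ 0.79 of the two-tailed sample size. For a proportion p, the variance σ² is replaced by p(1 − p).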

Once these five parameters have been decided, they can be used to estimate the required sample size. Online tools are commonly available for researchers to quickly estimate the sample size needed in common situations involving comparison of means, proportions, or other parameters with an assumed parametric distribution. One user-friendly example is the Java Applets for Power and Sample Size by R. V. Lenth, available online, from which the simple calculator “piface.jar” may also be downloaded to run on a local computer with the Java Runtime Environment (JRE) installed.
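Where an online tool is not at hand, the same kind of estimate can be sketched in a few lines of Python. The example below is a minimal illustration using normal-approximation formulas for comparing two means or two proportions with equal group sizes. It is not the method implemented in Lenth's applets, and the function names and example inputs are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def _z_sum(alpha: float, power: float, two_tailed: bool) -> float:
    """Sum of the critical value for alpha and the quantile for the desired power."""
    z = NormalDist()
    tail_alpha = alpha / 2 if two_tailed else alpha
    return z.inv_cdf(1 - tail_alpha) + z.inv_cdf(power)

def n_per_group_means(delta, sd, alpha=0.05, power=0.80, two_tailed=True) -> int:
    """Patients per group to detect a difference in means of `delta` given SD `sd`."""
    return math.ceil(2 * (sd * _z_sum(alpha, power, two_tailed) / delta) ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, two_tailed=True) -> int:
    """Patients per group to detect a difference between proportions p1 and p2."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # variance of a proportion is p * (1 - p)
    return math.ceil(variance_sum * (_z_sum(alpha, power, two_tailed) / (p1 - p2)) ** 2)

# Hypothetical example: survival improving from 70% to 80%
print(n_per_group_proportions(0.70, 0.80))
# Hypothetical example: mean difference of 5 units with an SD of 10
print(n_per_group_means(delta=5, sd=10))
print(n_per_group_means(delta=5, sd=10, two_tailed=False))
```

Such a sketch is useful for exploring how the estimate responds to different assumed effect sizes and SDs, but it does not replace consultation with a statistician.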

An additional allowance should be made for likely attrition through loss to follow-up after enrollment, so that the estimated power can still be attained by the final sample size after some participants drop out. It is also important to understand that an intention-to-treat analysis may reduce the estimated power, because subjects with suboptimal compliance are counted in their assigned treatment group. As it is not uncommon for researchers to overestimate their ability to recruit, it is important to be realistic when planning the study and to consider all potential logistical hurdles properly.
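The attrition allowance can be sketched as a simple inflation of the estimated sample size. The snippet below assumes that participants lost to follow-up contribute no outcome data, so the number to enroll is the required number divided by the expected retention; the function name and the 15% dropout figure are hypothetical.

```python
import math

def inflate_for_attrition(n_required: int, dropout_rate: float) -> int:
    """Number to enroll so that about `n_required` remain after expected loss to follow-up."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))

# Hypothetical: 63 patients per group needed for analysis, 15% expected loss to follow-up
print(inflate_for_attrition(63, 0.15))
```

Dividing by the retention rate, not multiplying by one plus the dropout rate, is what keeps the expected number of completers at the target.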

Preparing for a Grant Application

Different forms of funding are available to support scientific research in different settings. Common sources include governments, NGOs, scientific and educational organizations, the business sector, and charitable funds. As they vary widely in funding amount and target research projects, the aim and scope of each funding scheme should be examined at the outset to ensure that the nontrivial effort of applying is directed at the right target. Specific eligibility criteria for each funding source also need to be met before an application will be considered. Compliance with procedural, scheduling, and formatting requirements, although sometimes tedious, is effort well spent to avoid the greater effort of preparing the whole application being wasted.

Although the details required by different applications vary, funding sources generally try to fund proposals that aim to solve an important scientific question by employing a methodologically valid and ethically justified study method and that will be conducted by a group with a suitable track record to ensure delivery. To enhance the chance of success, the preparation of a grant application should therefore not be regarded as a passive process of providing information about the study and the investigator, but should be used actively as a platform to highlight these four issues explicitly and impress the reviewers.

Firstly, a succinct summary of the current level of knowledge regarding the specific area of study should be presented as the background. This needs to be centered on the specific question to be investigated, with the aim of properly highlighting the research gap so as to justify the conduct of the current study. The precise objectives need to be clearly stated, preferably in the form of a testable hypothesis.

A description of the study plan and methods is usually needed. The details to be included depend on the study design, but the main aim is to show the reviewers that due consideration and attention are being paid to minimizing the potential impact of any bias and confounding on the study validity. For instance, for an RCT proposal, it is not enough to mention that participants will be randomized and blinded; details of these procedures, including how the randomization will be done, how the allocation sequence will be concealed, and how blinding will be ensured, are needed to properly demonstrate the study validity.
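As an illustration of the level of detail reviewers look for, the allocation sequence for a two-arm trial could be generated by permuted-block randomization, as in the minimal sketch below. The block size of 4, the arm labels, and the fixed seed are assumptions for this example; in practice the list would be generated and held by someone independent of enrollment so that the sequence stays concealed.

```python
import random

def permuted_block_sequence(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Allocation list built from randomly shuffled blocks that are balanced across arms."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)                     # dedicated generator, reproducible with a seed
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)                        # random order within each balanced block
        sequence.extend(block)
    return sequence[:n_participants]

allocation = permuted_block_sequence(12, seed=2024)
print(allocation)
```

Blocking keeps the arms balanced throughout recruitment, which matters for the small samples typical of surgical trials; a fixed block size is easier to guess, so proposals often describe varying block sizes as well.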

In order to demonstrate that the study is ethically justifiable, a detailed assessment of the potential benefits and harms or risks, including any physical, psychological, social, or legal risks, to the participants or the study workers should be included. A justification of why these risks to the participants are reasonable in relation to the anticipated benefits of the study findings should also be given; this is especially important if the study involves people who are particularly vulnerable when participating in a study, such as fetuses and children, pregnant women, cognitively impaired individuals, or prisoners. Details of any procedures to monitor, minimize, or manage potential harm or to protect against these risks should also be given. A proper sample size estimation should always be included, as it is both ethically and financially unjustifiable to conduct a study with too few patients, which is thus underpowered to detect what it is supposed to look for, or with too many patients, which unnecessarily exposes more patients than required to the risk and inconvenience of the study.

Details regarding the experience of the investigators and the availability of appropriate resources, infrastructure, equipment, and collaborating partners for the proposed study should then be provided to show that the investigator and his/her team are the best candidates to conduct the study and thus to be entrusted with the funding.

A number of practical constraints such as data availability and funding may affect decisions regarding the study design and logistics. Once these are decided, the investigator should come up with a reasonable budget to convince the reviewers that every dollar requested is justified and will be sensibly and carefully spent. Having said that, it is generally not advisable to opt for a less valid approach just for the sake of budget cutting, as most grant reviewers would probably prefer to fund an expensive but well-designed RCT that can give a definitive answer to an important question rather than a cheaper study compromised in design and validity.

Communicating the Research Findings in an Evidence-Based Format

The scientific paper is the main channel through which research findings are communicated to healthcare professionals. While the exact details that need to be reported vary with the type of study design and the targeted journal, three basic issues always need to be properly addressed: Why is the research important? How was the research carried out? And what was found?

Structurally, four basic parts are always needed: introduction, methods, results, and discussion, each with an important aim that has to be explicitly highlighted. The introduction should be short and clearly tell the reader why the study was undertaken. A background literature review should state what is already known in the subject area, a problem statement should highlight the research gap, that is, what is still unknown and needs to be learned, and a purpose statement should clearly explain what the current study is intended to achieve.

The methods section should contain enough information on how the study was designed and carried out and how the data were collected and analyzed. Original methods should be described in sufficient detail, and established methods should be properly referenced. Comparison groups and outcomes should be clearly described and defined. Relevant special steps taken to minimize the potential impact of bias and confounding should be highlighted. Sample size estimation and statistical approaches should be suitably presented to justify the power of the planned study. The main aim is to describe what was done and how it was done (but not what was found) in a logical and chronological manner, so as to convince readers of the validity of the study results.

The results of data analysis should then be clearly presented. Tables, figures, and illustrations can be used where appropriate to present the detailed results and to establish the statistical validity of the conclusions. Whenever they are used, each should carry an appropriate descriptive heading and legend, with abbreviations and symbols clearly defined, so that it can stand alone. The main aim of this section is to highlight, and guide the readers to, the important points that are relevant to the study purpose.

The discussion section should interpret the results and state the main conclusions of the study. Comparing the findings with previous similar work helps to highlight the specific contribution achieved and how scientific understanding has progressed as a result of the findings. A brief appraisal of the potential limitations and remaining gaps, with a discussion of potential areas for improvement or for future research, should also be included.

A number of reporting guidelines have been developed by expert consensus to facilitate proper reporting of research findings, including one on uniform requirements in general and a number of others for reporting studies employing different common study designs (Table 4). Most medical journals nowadays require authors to comply with these established reporting guidelines when submitting manuscripts.
Table 4

Common guidelines for health research reporting

Uniform Requirements for Manuscripts Submitted to Biomedical Journals – Prepared by the International Committee of Medical Journal Editors (ICMJE)

CONSORT – Consolidated Standards of Reporting Trials

STARD – Standards for Reporting Studies of Diagnostic Accuracy. Bossuyt et al. (2003). Clin Chem. 49(1):7–18

STROBE – Strengthening the Reporting of Observational Studies in Epidemiology

PRISMA – Preferred Reporting Items for Systematic Reviews and Meta-Analyses

MOOSE – Meta-analysis of Observational Studies in Epidemiology. Stroup DF et al. (2000). JAMA 283(15):2008–2012

Besides the reporting of all relevant details as directed by these guidelines, the need for key study findings to be reported succinctly, so as to reach and update busy clinicians, has also increasingly been recognized. One example is the one-page abridged format in the print journal of the BMJ (called BMJ pico), in which research papers are published in an evidence-based manner to make research results more inviting and useful to readers (BMJ 2008). By emphasizing clear and succinct reporting of the four main components of a study (the patient/problem, intervention, comparison, and outcomes), this approach has once again highlighted the importance of framing and understanding a research problem in the PICO format for the practice of EBM, the conduct of clinical research, and the reporting of research results.

Conclusion and Future Directions

As for any other clinical specialty, there is little doubt that clinical research and evidence-based practice are the two most important elements for the continuous advancement of pediatric surgery. The generation of the best scientific evidence by properly conducted clinical research and the use of this evidence to inform the best clinical practice are reciprocally interlinked and mutually dependent. The four PICO components of a study (the patient/problem, intervention, comparison, and outcomes) constitute the core considerations that need to be properly addressed in all sequential activities in the cycle, including designing a good RCT, identifying the most appropriate research evidence in EBM, succinctly reporting the research findings, and drafting an effective grant application.

Given that RCTs for surgical problems are much more difficult to design and implement and may not always be feasible, effort should also be made to improve the quality of nonrandomized comparative studies and other studies, in terms of both their internal and external validity. While Class I evidence from well-conducted prospective RCTs may not be available for many surgical conditions, it is important to reiterate that the practice of EBM in pediatric surgery must take into consideration information from all three sources: evidence from clinical research, the clinical expertise and skills of the individual surgeon, and the patient’s and society’s choice.



References

  1. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, De Vet HC, Lijmer JG. Standards for reporting of diagnostic accuracy. The STARD statement for reporting of diagnostic accuracy: explanation and elaboration. Clin Chem. 2003;49(1):7–18.
  2. Chan KL, Hui WC, Tam PKH. Prospective, randomized, single-center, single-blind comparison of laparoscopic vs open repair of pediatric inguinal hernia. Surg Endosc. 2005;19:927–32. doi:10.1007/s00464-004-8224-3.
  3. Cochrane AL. Effectiveness and efficiency: random reflections on health services. London: Nuffield Provincial Hospitals Trust; 1972. Reprinted in 1989 in association with the BMJ. Reprinted in 1999 for Nuffield Trust by the Royal Society of Medicine Press, London, ISBN 1-85315-394-X.
  4. Hill AB. The environment and disease: association or causation? Proc R Soc Med. 1965;58:295–300.
  5. Lenth RV. Some practical guidelines for effective sample size determination. Am Stat. 2001;55:187–93.
  6. Moss RL, Dimmitt RA, Henry MC, Geraghty N, Efron B. A meta-analysis of peritoneal drainage versus laparotomy for perforated necrotizing enterocolitis. J Pediatr Surg. 2001;36(8):1210–3.
  7. New format for BMJ research articles in print. BMJ. 2008;337:a2946.
  8. Robson B. Studies in using a universal exchange and inference language for evidence based medicine. Semi-automated learning and reasoning for PICO methodology, systematic review, and environmental epidemiology. Comput Biol Med. 2016;79:299–323.
  9. Sackett DL, Rosenberg WM, Gray JA, et al. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312(7023):71–2.
  10. Stroup DF, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA. 2000;283(15):2008–12.

Copyright information

© Springer-Verlag GmbH Germany 2016

Authors and Affiliations

  • Dennis K. M. Ip (1)
  • Kenneth KY Wong (2)
  • Paul Kwong Hang Tam (3)
  1. School of Public Health, The University of Hong Kong, Hong Kong, China
  2. Department of Surgery, The University of Hong Kong, Hong Kong, China
  3. Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
