REVIEW ARTICLE
Year: 2012, Volume: 23, Issue: 5, Page: 660-664

Concepts in sample size determination

Umadevi K Rao
Department of Oral and Maxillofacial Pathology, Ragas Dental College and Hospital, 2/102 East Coast Road, Uthandi, Chennai, India

Investigators involved in clinical, epidemiological or translational research aim to publish their results so that the findings can be extrapolated to the population. The process begins with deciding the topic to be studied, the subjects and the type of study design. At this stage the researcher must determine how many subjects the proposed study requires. The number of individuals to be included, i.e., the sample size, is therefore an important consideration in the design of many clinical studies. In an analytical study, sample size determination should be based on the expected difference in outcome between the groups studied, on the accepted P value for statistical significance and on the statistical power required to test the hypothesis. The accepted risk of type I error, or alpha value, which by convention is set at 0.05 in biomedical research, defines the cutoff at which the P value obtained in the study is judged significant or not. Power in clinical research is the likelihood of finding a statistically significant result when a true difference exists, and is typically set to at least 80%. This matters because even the most rigorously executed study may fail to answer the research question if the sample size is too small. Conversely, a study with too large a sample will be difficult to conduct and will waste time and resources. Thus, the goal of sample size planning is to estimate an appropriate number of subjects for a given study design. This article describes the concepts involved in estimating the sample size.
Importance of Sample Size

Consider the study titled "Bonded versus banded first molar attachments: A randomized controlled clinical trial", published in the Journal of Orthodontics, 2007. [2] Its purpose was to compare the clinical failure rates of bonded molar tubes with those of cemented bands during fixed appliance therapy. The study concluded that first molar tubes bonded with relyA bond composite showed a significantly higher first-time failure rate (33.7%) than bands cemented with intact Glass Ionomer Cement (GIC) (18.8%), a difference of nearly 15%. The main finding was that the failure rate of bonded molar tubes was significantly higher, almost twice that seen for bands, and that the survival time of the bonded tubes was almost half that of the bands.

This study is an example of an analytical or comparative research study, in which the proportions of some characteristic are measured in two or more comparison groups. Alternatively, the variable of interest, that is, the parameter to be studied, can be a mean that is compared between two or more groups. The authors designed the study to demonstrate a minimum difference of 15% in first-molar attachment failure rates between bonded molar tubes and cemented molar bands. They chose 15% because, in their opinion, this would be a meaningful difference in the clinical scenario. Without an adequate sample size, the investigators could not have demonstrated this 15% difference by statistical analysis.

Sample size estimation gives information about the feasibility of the research design and the scope of the variables that can be included. It should be carried out early in the design phase of a study, when major changes are still possible.
Apart from being necessary for carrying out a meaningful study, sample size determination is an essential part of a study protocol submitted to ethical committees, research funding bodies and peer-reviewed journals. [3]

Research question

All studies should start with a research question that addresses what the investigator would like to know. The research question is the objective of the study, the uncertainty that the investigator wants to resolve. It often begins as a general concern that must be narrowed down to a concrete, researchable issue. Good research questions can arise from the published medical literature, where gaps in knowledge are often highlighted. They may also come from applying new concepts or methods to old issues and from ideas that emerge from teaching.

Choosing the study subjects

In clinical research it is often impossible to study the entire population of interest, so a subset, referred to as a sample, is studied instead. There are many sampling methods, but in true probability sampling such as simple random sampling, every member of the population has an equal probability of being included in the sample. Sampling enables the researcher to draw inferences about a large population by examining a sample at an affordable cost in time and effort. As an initial step in designing a study, the researcher must conceptualize the target population. This is achieved by formulating a specific set of inclusion criteria (which establish the demographic and clinical characteristics of subjects who are suited to answering the research question) and exclusion criteria (which eliminate subjects who are not appropriate for the study).

Variables

A variable is a characteristic that varies from one study subject to another. While designing a study, it is important to decide which variables will be measured.
The validity of a study depends on how well the variables chosen for the study represent the phenomenon of interest. For instance, how well does a fasting blood sugar level or salivary sugar level represent the control of diabetes? Does unstimulated salivary flow rate define xerostomia, or does the extent of mouth opening help in the clinical staging of oral submucous fibrosis?

The two types of variables are continuous variables and categorical variables. Continuous variables have quantified intervals on an infinite scale of values, e.g., salivary flow rate, age. A scale that has a finite number of intervals is termed discrete, e.g., number of cigarettes smoked per day, parity. Discrete variables that are ordered (i.e., arranged in sequence from few to many) and that have a considerably large number of possible values resemble continuous variables for practical purposes of measurement and analysis. [4]

Categorical variables are those that are not suitable for quantification and are measured by classifying subjects into categories. Categorical variables with two possible values (e.g., dead or alive) are termed dichotomous or binary. Categorical variables with more than two categories can be either nominal variables (categories that are not ordered, e.g., blood groups: type A blood is neither more nor less than type B or O) or ordinal variables (categories that have an intrinsic order, e.g., severe, moderate and mild dysplasia).

In considering the association between two variables, the one that precedes (or is presumed on biologic grounds to be antecedent) is called the predictor or independent variable; the other is called the outcome, dependent or response variable. For example, if a study is designed to determine the efficacy of probiotics in reducing oral candida, the predictor variable is probiotics and the outcome variable is oral candida.
Study design

An important step in designing research is deciding whether the researcher will take a passive role, observing the events taking place in the study subjects, as in an observational study [Table 1], or will apply an intervention to the study subjects and examine its effects, as in an experimental study, e.g., a clinical trial.

{Table 1}

P value

In any research study, the data collected are analyzed using appropriate statistical tests, which yield the P value. The P value is the probability, if the null hypothesis is true, of obtaining by chance a test statistic as large as or larger than the one observed in the study. The null hypothesis, stated at the beginning of the study, is rejected in favor of its alternative if the obtained P value is less than α, the predetermined level of statistical significance. If the obtained P value is greater than α, the null hypothesis is not rejected.

Parameters that Determine the Sample Size

Hypothesis

The research hypothesis summarizes the elements of the study: the sample, the sample size, the design, the predictor and the outcome variables. The primary reason for stating the hypothesis is to establish the basis for tests of statistical significance. A hypothesis is essential for comparative studies such as the example above, i.e., studying which type of molar attachment has the lower clinical failure rate. A hypothesis can be either null or alternative. [5] In our example, the null hypothesis would be that there is no difference in long-term failure rates between bonded tubes and cemented bands as first molar attachments, and the alternative hypothesis would be that there is a difference in the clinical failure rate. The alternative hypothesis cannot be tested directly; it is accepted by default if the test of statistical significance rejects the null hypothesis. The alternative hypothesis can be one-sided or two-sided.
A one-sided alternative hypothesis specifies the direction of the effect (bonded tube first molar attachments will have a higher long-term failure rate), whereas a two-sided alternative states only that there will be a difference, which can go in either direction. When two-sided statistical tests are used, the P value includes the probabilities of committing a type I error in each of the two directions, which is about twice as great as in one direction only (a one-sided P value of 0.05 is usually the same as a two-sided P value of 0.1). Smaller sample sizes are required to test a one-sided hypothesis, and power is lost on switching to a two-sided hypothesis with the same sample size.

After stating the hypothesis, we proceed with the study and perform the appropriate statistical test. Based on the level of statistical significance α, which is decided before the study, we either fail to reject the null hypothesis or reject it and accept the alternative hypothesis.

Effect size: Minimum expected difference

The likelihood that a well-designed study will detect an association between a predictor and an outcome variable depends on the actual magnitude of the association in the target population. If the association is large, it will be easy to detect in the sample. If it is small, it will be difficult to detect and a large sample size will be required. The effect size is the smallest difference between comparison groups that the researcher would like the study to demonstrate. Selecting an appropriate effect size is the most difficult aspect of sample size planning. [3] The investigator should first try to find data from prior studies in the related area to make an informed guess about a reasonable effect size. Alternatively, one can choose the smallest effect size that, in one's opinion, would be clinically meaningful. When data are not available, it may be necessary to do a pilot study.
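How strongly the required sample depends on the chosen difference can be seen in a short calculation. The following Python sketch is an illustration only and is not the calculation performed by the authors of the trial discussed earlier. It assumes the widely used normal-approximation formula for comparing two independent proportions with equal group sizes, without continuity correction or allowance for dropouts. The function name `n_per_group_two_proportions` is hypothetical, and the observed failure rates of 33.7% and 18.8% are reused purely as illustrative inputs.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Approximate number of subjects per group needed to detect p1 versus p2.

    Hypothetical helper (assumption, not from the article): normal
    approximation, equal group sizes, unpooled variance, no continuity
    correction and no adjustment for dropouts.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("the expected difference must be non-zero")
    std_normal = NormalDist()  # standard normal distribution
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)  # critical value for the type I error
    z_beta = std_normal.inv_cdf(power)      # critical value for power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Illustrative inputs only: the control rate is fixed at 18.8% and the
# difference to be detected shrinks from about 15% to about 5%.
for p_bonded in (0.337, 0.288, 0.238):
    n = n_per_group_two_proportions(p_bonded, 0.188)
    print(f"difference {p_bonded - 0.188:.3f}: about {n} subjects per group")
```

Under these assumptions the difference enters the formula squared, so halving the difference to be detected roughly quadruples the number of subjects needed in each group.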
The discussion of tradeoffs between sample size and effect size requires both the technical skills of the statistician and the scientific knowledge of the researcher.

α, β and power

When the researchers have completed the study, they use statistical tests to try to reject the null hypothesis in favor of its alternative, much in the same way that a prosecuting attorney tries to convince a jury to reject innocence in favor of guilt. Depending on whether the null hypothesis is true or false in the target population, and assuming that the study is free of bias, four situations are possible.

The jury will award the punishment if the guilt is proved.
The jury will free the innocent if not proved guilty.
The innocent may be proved guilty (type I error).
The guilty may be proved innocent (type II error).

Similarly in research, in two of these four situations the findings in the sample and the reality in the population are concordant, and the researcher's inference will be correct. In the other two situations, either a type I or a type II error has been made, and the inference will be incorrect. Thus, after stating the hypothesis and while planning the study, the researcher should establish in advance the probability of making a type I error (rejecting the null hypothesis when it is actually true; this probability is also called the level of statistical significance, or α) and the probability of making a type II error (failing to reject the null hypothesis when it is actually false; β). [6] A type I error is generally considered more serious than a type II error. The significance criterion (α) is therefore normally set at 0.05, [7] and as the criterion becomes more stringent (a smaller α), the sample size necessary to detect the minimum difference increases.

Power

The probability of committing a type II error is given by the beta (β) value, whereas the probability of avoiding such an error is termed the statistical power of the study. [8] Power is the quantity 1 - β.
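The meaning of α and β can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not a method taken from any cited study: it assumes a two-sided pooled z-test for two proportions, a hypothetical 133 subjects per group, and failure rates reused from the example trial. The function names, number of simulated trials and random seed are likewise assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejects_null(x1, x2, n, alpha=0.05):
    """Two-sided pooled z-test for two proportions, n subjects per group.

    x1 and x2 are the numbers of failures observed in each group.
    Returns True when the null hypothesis of equal proportions is rejected.
    """
    pooled = (x1 + x2) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False  # no variation observed, so no evidence of a difference
    z = (x1 / n - x2 / n) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(p1, p2, n, trials=2000, seed=1):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rejections = 0
    for _ in range(trials):
        x1 = sum(rng.random() < p1 for _ in range(n))
        x2 = sum(rng.random() < p2 for _ in range(n))
        rejections += rejects_null(x1, x2, n)
    return rejections / trials


# Null hypothesis true: every rejection is a type I error.
print("estimated type I error rate:", rejection_rate(0.188, 0.188, n=133))
# Null hypothesis false: every rejection is a correct detection.
print("estimated power:", rejection_rate(0.337, 0.188, n=133))
```

When the two true rates are equal, the fraction of rejections estimates α; when they differ, the fraction of non-rejections estimates β and the fraction of rejections estimates the power.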
It is the chance of observing an association of a given size or greater in a sample if one is actually present in the population. If β is set at 0.10, the researcher has decided to accept a 10% chance of missing an association of a given effect size. This represents a power of 0.90, that is, a 90% chance of finding an association of that size. As the required power increases, the sample size increases. In clinical research, the statistical power is customarily set to 0.80.

In any study design, once the research question is framed, the hypothesis specified and data collected on the variables of interest, a statistical test is applied to test the hypothesis. This involves determining whether or not there is a significant difference between the means or proportions observed in the comparison groups. The ability of the statistical test to detect differences between the study groups depends on several factors. These include the statistical power, the size of the difference specified earlier as clinically meaningful and the level of statistical significance. Statistical power is the probability that a statistical test will indicate a significant difference when one truly exists. That is, in a study comprising two groups of individuals, the power must be sufficient to enable detection of a statistically significant difference between the two groups.

Calculating the sample size

Three main factors, α, power and effect size, must be considered in calculating the appropriate sample size. To calculate the sample size, we must state the null and alternative hypotheses and select the appropriate statistical test, which in turn depends on the type of predictor and outcome variables used in the study. Choose a reasonable effect size, set α (normally 0.05) and β (normally 0.20, i.e., a power of 0.80), and use the appropriate formula given in statistical books, [9],[10] or use software programs such as EpiInfo [11] or nQuery.
[12] There are also online web pages that can be used for sample size estimation. Additional considerations in calculating the sample size for analytical studies include adjusting for potential dropouts, which can be calculated mathematically, stating whether the hypothesis is one-sided or two-sided and determining the ratio of cases to controls.

Sample size for descriptive studies

Descriptive studies (including studies of diagnostic tests) do not compare different groups, so the concepts of power and hypothesis testing do not apply. The results of such studies are presented as means and proportions. These studies commonly report confidence intervals, a range of values around the sample mean or proportion that is consistent with the data. A confidence interval is a measure of the precision of a sample estimate. Thus, when determining the sample size for a descriptive study, we must specify the desired level and width of the confidence interval. The sample size can then be calculated from the formulas.

Options to Minimize the Sample Size and Maximize the Power

The sample size arrived at should be a number with which the research can feasibly be carried out in all respects. If the calculated sample is too large to be practical, the following methods can be adopted.

Continuous variables can be used, as they may allow smaller sample sizes than dichotomous variables.
Paired measurements, one at baseline and another at the conclusion of the study, can be used for sample size calculation in experimental or cohort studies.
Precise variables allow a smaller sample size in both analytic and descriptive studies because they reduce variability. An outcome with large variability requires more subjects to detect a difference that truly exists.
A more common outcome should be used in calculating the sample size, as this increases the power of the study.
Unequal group sizes can be used, since studying additional individuals in one group can be beneficial.
It is usually easier to add individuals to the control group than to the case group.
Expanding the minimum expected difference will help to reduce the sample size. [13]

Summary

Determining the sample size is an important part of the design of both analytic and descriptive studies. The sample size is an estimate of the number of subjects required to detect an association of a given effect size and variability, at a specified likelihood of making type I (false-positive) and type II (false-negative) errors. The maximum likelihood of making a type I error is called α, and that of making a type II error, β. The quantity (1 - β) is power, the chance of observing an association of a given size or greater in a sample if one is actually present in the population. A study that concludes without significant results may simply have lacked adequate power. Appropriate sample size planning is mandatory both for studies that aim to establish a difference between groups and for those conducted to estimate a quantity. It is always appropriate to consult a statistician, as microbiologic surveys, studies of medical tests, surveys with differential sampling probabilities and other special situations may require more complex techniques to arrive at an appropriate sample size.

Acknowledgement

I thank the management, Principal and staff of the Department of Oral and Maxillofacial Pathology of Ragas Dental College and Hospital, Chennai for their support. I extend my heartfelt gratitude to Prof. Greenspan J, Prof. Greenspan D and Prof. Shiboski CH, UCSF, California, USA for their mentorship.

References


