The a priori procedure (APP) for estimating median under skew normal settings with applications in economics and finance

Liqun Hu (New Mexico State University, Las Cruces, New Mexico, USA)

Tonghui Wang (Department of Mathematical Sciences, New Mexico State University, Las Cruces, New Mexico, USA)

David Trafimow (Department of Psychology, NMSU, Las Cruces, New Mexico, USA)

S.T. Boris Choy (Discipline of Business Analytics, The University of Sydney, Sydney, Australia)

Xiangfei Chen (Department of Mathematical Sciences, New Mexico State University, Las Cruces, New Mexico, USA)

Cong Wang (Mathematical and Statistical Sciences, University of Nebraska Omaha, Omaha, Nebraska, USA)

Tingting Tong (Department of Mathematical Sciences, New Mexico State University, Las Cruces, New Mexico, USA)

Asian Journal of Economics and Banking

ISSN: 2615-9821

Article publication date: 5 December 2023


Abstract

Purpose

The authors’ conclusions are based on mathematical derivations that are supported by computer simulations and three worked examples in applications of economics and finance. Finally, the authors provide a link to a computer program so that researchers can perform the analyses easily.

Design/methodology/approach

Based on a parameter estimation goal, the present work is concerned with determining the minimum sample size researchers should collect so their sample medians can be trusted as good estimates of corresponding population medians. The authors derive two solutions, using a normal approximation and an exact method.

Findings

The exact method provides more accurate answers than the normal approximation method. The authors show that the minimum sample size necessary for estimating the median using the exact method is substantially smaller than that using the normal approximation method. Therefore, researchers can use the exact method to obtain sample size savings.

Originality/value

In this paper, the a priori procedure is extended to estimating the population median under skew normal settings. The mathematical derivation of the exact method for using the sample median to estimate the population median, together with the supporting computer simulations, is new. A link to a free and user-friendly computer program is provided so researchers can make their own calculations.

Citation

Hu, L., Wang, T., Trafimow, D., Choy, S.T.B., Chen, X., Wang, C. and Tong, T. (2023), "The a priori procedure (APP) for estimating median under skew normal settings with applications in economics and finance", Asian Journal of Economics and Banking, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/AJEB-09-2023-0087

Publisher

Emerald Publishing Limited

License

Published in Asian Journal of Economics and Banking. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

The a priori procedure (APP) is designed as a predata procedure where the goal is to estimate the sample size needed for sample statistics to be good estimates of corresponding population parameters. The researcher specifies how close she wants the sample statistic of concern to be to the corresponding population parameter, which is the precision issue. And she specifies the desired probability that the sample statistic will be within the range to which the precision specification refers, which is the confidence issue. For example, suppose the population distribution is normal, and the researcher wishes to have a 0.95 probability of obtaining a sample mean within 0.10 standard deviations of the population mean. In that case, the necessary sample size to meet the confidence and precision specifications is 385 (see Trafimow, 2017; Trafimow et al., 2019).

Of course, researchers might not wish to assume normal distributions; for example, they could assume skew normal distributions. However, APPs exist there, too, with respect to locations (Trafimow et al., 2019), scales (Wang et al., 2019b) and shapes (Wang et al., 2019a). There have been additional advances too (e.g. Cao et al., 2021, 2022; Chen et al., 2021; Tong et al., 2022; Trafimow et al., 2020; Wang et al., 2021, 2022; Wei et al., 2020; Wilson et al., 2022). However, as the APP literature continues to expand, there is a simple issue that somehow has escaped investigation. Specifically, the APP literature has bypassed the humble median as a topic. And that is the present issue. Given researcher-provided specifications for precision and confidence, what sample size does the researcher need to collect so that the sample median estimates the population median within the limits of the specifications? For example, what sample size does a researcher need to collect to have a 0.95 probability of obtaining a sample median that is within 0.10 standard deviations of the population median? The subsequent section and an appendix provide the mathematical derivation. This is followed by computer simulations and three worked examples related to economics and finance that support the derivation. We also provide a link to a free and user-friendly computer program so researchers can make their own calculations.

2. Sampling distribution of the median under skew normal settings

Suppose that X is a continuous random variable with probability density function (PDF), f(x). The median, denoted by μ̃, is a value of X satisfying

$$P(X \le \tilde{\mu}) = \int_{-\infty}^{\tilde{\mu}} f(x)\,dx = \frac{1}{2} = \int_{\tilde{\mu}}^{\infty} f(x)\,dx = P(X \ge \tilde{\mu}).$$

Let X₁, X₂, …, X_n be a random sample from a population with PDF f(x) and Y_j be the jth order statistic of the sample, j = 1, 2, …, n, i.e.

$$\min\{X_1, X_2, \ldots, X_n\} = Y_1 \le Y_2 \le \cdots \le Y_n = \max\{X_1, X_2, \ldots, X_n\}.$$

The sample median X̃ is defined by

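$$\tilde{X} = \begin{cases} Y_{(n+1)/2}, & \text{if } n \text{ is odd},\\[4pt] \tfrac{1}{2}\left(Y_{n/2} + Y_{n/2+1}\right), & \text{if } n \text{ is even}. \end{cases}$$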

Consider a random sample of size n = 2m − 1 taken from the standard skew normal distribution SN(α) and let Z̃ be the sample median. According to order statistics, the PDF of Z̃ is given by

$$f_{\tilde{Z}}(z) = \frac{n!}{(m-1)!\,(n-m)!}\, f_Z(z)\,[F_Z(z)]^{m-1}\,[1-F_Z(z)]^{n-m} = \frac{(2m-1)!}{[(m-1)!]^2}\, f_Z(z)\,\{F_Z(z)\,[1-F_Z(z)]\}^{m-1}, \quad z \in \mathbb{R},$$

where f_Z(z) and F_Z(z) are the PDF and cumulative distribution function (CDF) of Z ∼ SN(α).
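The density above can be evaluated numerically. The following is a minimal sketch, not the authors' program: it assumes the standard skew normal density 2ϕ(z)Φ(αz) from Equation (1), obtains F_Z by a cumulative trapezoid rule on a fixed grid, and evaluates the factorial constant on the log scale to avoid overflow. The function names, grid limits and step count are illustrative choices.

```python
import math

def sn_pdf(z, alpha):
    """Standard skew normal density 2 * phi(z) * Phi(alpha * z)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi

def median_density(alpha, n, lo=-6.0, hi=6.0, steps=6000):
    """Return (grid, density) of the sample median for odd n = 2m - 1."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    m = (n + 1) // 2
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    pdf = [sn_pdf(z, alpha) for z in grid]
    # Cumulative trapezoid rule gives F_Z on the same grid.
    cdf = [0.0]
    for i in range(1, steps + 1):
        cdf.append(cdf[-1] + 0.5 * h * (pdf[i - 1] + pdf[i]))
    # log of (2m-1)! / [(m-1)!]^2, using Gamma(k + 1) = k!
    log_const = math.lgamma(2 * m) - 2.0 * math.lgamma(m)
    dens = []
    for f, F in zip(pdf, cdf):
        F = min(max(F, 0.0), 1.0)
        core = F * (1.0 - F)
        if f <= 0.0 or core <= 0.0:
            dens.append(0.0)
        else:
            dens.append(math.exp(log_const + math.log(f)
                                 + (m - 1) * math.log(core)))
    return grid, dens

def trapezoid(grid, values):
    """Trapezoid rule on an equally spaced grid."""
    h = grid[1] - grid[0]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

grid, dens = median_density(alpha=1.0, n=101)
print("total mass:", round(trapezoid(grid, dens), 4))
print("mode near :", round(grid[dens.index(max(dens))], 3))
```

The total mass should be close to 1, which is a quick sanity check that the constant and the exponents have been coded consistently with the formula.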

For a normal population, the sample mean X̄ is the most efficient estimator of the population mean μ, that is, the one with the smallest variance. The efficiency of the sample median X̃, measured by the ratio of the variance of X̄ to the variance of X̃, is 4m/(π(2m − 1)), where m = (n + 1)/2. For large n, the efficiency is approximately equal to 2/π.
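As a small numerical illustration of the efficiency expression, the sketch below simply evaluates 4m/(π(2m − 1)) for a few odd sample sizes and compares it with the limit 2/π. The function name is hypothetical.

```python
import math

def median_efficiency(n):
    """Efficiency of the sample median relative to the sample mean for a
    normal population, using 4m / (pi * (2m - 1)) with m = (n + 1) / 2."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    m = (n + 1) // 2
    return 4 * m / (math.pi * (2 * m - 1))

for n in (3, 11, 101, 1001):
    print(n, round(median_efficiency(n), 4))
print("limit 2/pi:", round(2 / math.pi, 4))
```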

In the literature, the sampling distribution of the sample median from any continuous population with PDF f(x) is known to be asymptotically normal with mean μ = μ̃ and variance σ² = 1/(4n[f(μ̃)]²) (see Chu, 1955). For small sample sizes, Rider (1960) discussed how well or how badly the variance of the asymptotic normal distribution represents the true variance.

In this paper, we propose an APP for estimating the population median μ̃ based on a random sample from a skew normal distribution. Skew normal distributions differ from normal distributions in three ways. The mean is replaced by the location, the standard deviation is replaced by the scale and there is the addition of a third parameter, the shape parameter. A random variable Z∈R is said to follow a standard skew normal distribution if its PDF is given by (see Azzalini, 1985):

$$f_Z(z) = 2\,\phi(z)\,\Phi(\alpha z), \quad z \in \mathbb{R}, \tag{1}$$

where α is the shape parameter that controls the skewness of the distribution, ϕ(⋅) and Φ(⋅) are the PDF and the CDF of the standard normal distribution, respectively. For simplicity, we write Z ∼ SN(α). The skew normal distribution is positively (negatively) skewed if α > 0 (α < 0).

This class of distributions shares many properties with the normal distribution, which it includes as the special case α = 0. A location-scale extension to the non-standard skew normal distribution is given below.

Definition 1.

Let Z ∼ SN(α). The random variable X = ξ + ωZ follows a skew normal distribution with location ξ∈R and scale ω > 0, and its PDF is given by

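$$f_X(x) = \frac{2}{\omega}\,\phi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right), \tag{2}$$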

denoted by X ∼ SN(ξ, ω², α).
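Definition 1 can be made concrete with a short sketch that evaluates the density in Equation (2) and draws random values of X = ξ + ωZ. The sampler relies on the well-known representation Z = δ|U₀| + √(1 − δ²)U₁ with U₀, U₁ independent standard normal and δ = α/√(1 + α²); this representation is an assumption of the sketch and is not taken from the paper. All function names are illustrative.

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sn_pdf(x, xi=0.0, omega=1.0, alpha=0.0):
    """Density of SN(xi, omega^2, alpha) as in Equation (2)."""
    z = (x - xi) / omega
    return 2.0 / omega * norm_pdf(z) * norm_cdf(alpha * z)

def sn_sample(n, xi=0.0, omega=1.0, alpha=0.0, rng=None):
    """Draw n values of X = xi + omega * Z with Z ~ SN(alpha).

    Assumed representation: Z = delta * |U0| + sqrt(1 - delta^2) * U1,
    with U0, U1 independent N(0, 1)."""
    rng = rng or random.Random()
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    root = math.sqrt(1.0 - delta * delta)
    return [xi + omega * (delta * abs(rng.gauss(0, 1)) + root * rng.gauss(0, 1))
            for _ in range(n)]

xs = sn_sample(20000, xi=1.0, omega=2.0, alpha=3.0, rng=random.Random(1))
delta = 3.0 / math.sqrt(10.0)
print("sample mean     :", round(sum(xs) / len(xs), 3))
print("theoretical mean:", round(1.0 + 2.0 * delta * math.sqrt(2.0 / math.pi), 3))
```

Comparing the sample mean with ξ + ωδ√(2/π) is a simple check that the sampler and the parameterization agree.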

Two remarks are addressed below.

Remark 1.

The main purpose of this paper is to use the APP to obtain the minimum sample size required for estimating the population median, μ̃, using the sample median, X̃. Therefore, without loss of generality, we assume that the sample size n = 2m − 1 is an odd number with a positive integer m.

Remark 2.

Let μ̃z and μ̃ be the population medians of Z ∼ SN(α) and X ∼ SN(ξ, ω², α), respectively. The linear relationship between X and Z gives μ̃ = ξ + ωμ̃z. Similarly, let X₁, X₂, …, X_n be a random sample from the SN(ξ, ω², α) distribution and let Y₁, Y₂, …, Y_n be the corresponding order statistics. Then X̃ = Y_m = ξ + ωZ̃, where Z̃ is the sample median of the random sample Z₁, Z₂, …, Z_n from the SN(α) distribution, with Z_i = (X_i − ξ)/ω, i = 1, 2, …, n. Thus it suffices to find the minimum sample size for estimating μ̃z using the sample median Z̃.

The density curves of Z̃ for various values of α and n are displayed in Figures 1 and 2, respectively. Figure 1 shows that, for n = 101 and a fixed α value, the distribution of Z̃ is symmetric. Moreover, the location of the distribution increases and the scale of the distribution decreases as α increases. For α = 1, Figure 2 shows that the distribution of Z̃ is symmetric with the same value for the location for different sample sizes. The scale of the distribution decreases as the sample size increases.

Note that the relationship between the median μ̃z and the skewness parameter α of the standard skew normal distribution is

$$\int_{-\infty}^{\tilde{\mu}_z} f_Z(z;\alpha)\,dz = \frac{1}{2} = \int_{\tilde{\mu}_z}^{\infty} f_Z(z;\alpha)\,dz.$$

As shown in Figure 3 for n = 101, μ̃z increases with α and μ̃z will converge to the median of the standard half normal distribution as α → ∞.
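The relationship between μ̃z and α has no closed form in general, but it is easy to solve numerically. The sketch below is one possible approach under stated assumptions: the CDF is obtained by composite Simpson integration of the density in Equation (1) starting from −8 (where the mass is negligible), and the median is found by bisection on [−1, 1]. Function names, the lower limit and the tolerances are illustrative.

```python
import math

def sn_pdf(z, alpha):
    """Standard skew normal density 2 * phi(z) * Phi(alpha * z)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi

def sn_cdf(z, alpha, lower=-8.0, steps=2000):
    """F_Z(z) by composite Simpson integration on [lower, z] (steps even)."""
    if z <= lower:
        return 0.0
    h = (z - lower) / steps
    total = sn_pdf(lower, alpha) + sn_pdf(z, alpha)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * sn_pdf(lower + i * h, alpha)
    return total * h / 3.0

def sn_median(alpha, tol=1e-8):
    """Solve F_Z(median) = 1/2 by bisection; the median lies in [-1, 1]."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, alpha) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for a in (0, 0.5, 1, 2, 5, 10, 50):
    print(a, round(sn_median(a), 4))
```

For large α the printed values should approach the median of the standard half normal distribution, in line with the limiting behaviour described above.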

3. The APP for estimating the population median

In this section, we use two methods (an exact method and a normal approximation) to find the minimum sample sizes for estimating the population median.

3.1 The APP using an exact method

Theorem 1.

Let Z̃ be the sample median of a random sample of size n = 2m − 1 from the standard skew normal population with skewness parameter α and μ̃z be the population median. Let c be the confidence level and f be the precision, which is specified such that

$$P\left(|\tilde{Z} - \tilde{\mu}_z| \le f\sigma\right) = c,$$

where σ is the standard deviation of the SN(α) distribution. Since the PDF of Z̃ is asymmetric, the above equation can be rewritten as

$$P\left(f_1 \le \frac{\tilde{Z} - \tilde{\mu}_z}{\sigma} \le f_2\right) = c, \tag{3}$$

where f₁ and f₂ are selected such that max{|f₁|, f₂} ≤ f. Under APP, the required sample size n can be obtained from

$$\int_{f_1}^{f_2} f_W(w)\,dw = c \tag{4}$$

subject to the length of the confidence interval ℓ = f₂ − f₁ being the minimum, where f_W(w) is the PDF of W=(Z̃−μ̃z)/σ and is given by

$$f_W(w) = \sigma\,\frac{(2m-1)!}{[(m-1)!]^2}\, f_Z(u)\,\{F_Z(u)\,[1-F_Z(u)]\}^{m-1}, \qquad u = \tilde{\mu}_z + \sigma w.$$

Proof: Note that the variance of Z ∼ SN(α) is σ² = 1 − 2δ²/π, where δ = α/√(1 + α²). For a given confidence level c and precision f, we set up Equation (4) and find the sample size n required such that the length of the confidence interval ℓ = f₂ − f₁ is minimized. Let W = (Z̃ − μ̃z)/σ. It can be shown that the length of the confidence interval ℓ is minimized when f_W(f₂) = f_W(f₁), so that the ratio f_W(f₂)/f_W(f₁) = 1 (the proof is given in Appendix A). Thus the required sample size n, together with f₁ and f₂ such that max{|f₁|, f₂} ≤ f, can be determined simultaneously. □
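The equal-density condition can be motivated by a standard Lagrange multiplier argument. The sketch below is one common route and is not claimed to be the argument used in Appendix A; it assumes f_W is continuous and positive at f₁ and f₂, and λ denotes the multiplier.

```latex
\begin{aligned}
&\text{minimize } \ell(f_1, f_2) = f_2 - f_1
  \quad \text{subject to} \quad \int_{f_1}^{f_2} f_W(w)\,dw = c, \\
&L(f_1, f_2, \lambda) = (f_2 - f_1)
  - \lambda \left( \int_{f_1}^{f_2} f_W(w)\,dw - c \right), \\
&\frac{\partial L}{\partial f_2} = 1 - \lambda f_W(f_2) = 0,
  \qquad
  \frac{\partial L}{\partial f_1} = -1 + \lambda f_W(f_1) = 0, \\
&\Longrightarrow \quad f_W(f_1) = f_W(f_2) = \frac{1}{\lambda}.
\end{aligned}
```

For a unimodal f_W this stationary point corresponds to the shortest interval with probability content c.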

Remark 3.

To illustrate the results in Theorem 1, the required minimum sample size n with max{|f₁|, f₂} ≤ f = 0.2 and the ratio f_W(f₂)/f_W(f₁) are listed in Table 1 for c = 0.95 and α = 0, 1, 2, 5, 10. The table shows numerically that the sample sizes obtained satisfy the conditions of Theorem 1: max{|f₁|, f₂} ≤ f holds and the ratio is close to 1.

Figure 4 shows the density curves of W for α = 1 and n = 51, 101 and 201. The curves are slightly skewed to the right (or nearly symmetric) and share the same location, while the scale of the distribution decreases as the sample size n increases.
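The probability in Equation (3) can also be computed without integrating f_W, using the standard order statistic fact that the median of n = 2m − 1 observations is at most z exactly when at least m observations are at most z. The sketch below applies this to the α = 0 case only, where μ̃z = 0, σ = 1 and F_Z is the standard normal CDF; it is a cross-check under those assumptions rather than the authors' algorithm, and the function names are illustrative.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def median_cdf(p, n):
    """P(sample median <= z) for odd n, where p = F_Z(z) lies in (0, 1).

    Equals P(Binomial(n, p) >= m) with m = (n + 1) / 2; the terms are
    evaluated on the log scale to avoid overflow in the binomial coefficient."""
    m = (n + 1) // 2
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(m, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total

def coverage_normal_case(n, f1, f2):
    """P(f1 <= W <= f2) when alpha = 0 (mu_z = 0 and sigma = 1)."""
    return median_cdf(norm_cdf(f2), n) - median_cdf(norm_cdf(f1), n)

# Row alpha = 0 of Table 1: n = 151, f1 = -0.2, f2 = 0.2
print(round(coverage_normal_case(151, -0.2, 0.2), 4))
```

The printed value can be compared with the target c = 0.95 for the first row of Table 1. For α ≠ 0 the same function `median_cdf` applies once F_Z, μ̃z and σ are supplied.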

3.2 The APP using the normal approximation

In the previous subsection, we provided an exact method to obtain the minimum sample size for the estimation of the population median under the skew normal setting. In this subsection, we propose a normal approximation method to simplify the mathematical calculations. We set up the APP for estimating the population median μ̃z with the sample median Z̃.

Theorem 2.

Suppose that Z₁, Z₂, …, Z_n form a random sample of size n = 2m − 1 from the standard skew normal distribution SN(α). Let c be the confidence level and f be the precision such that the error associated with the median estimator, Z̃, is fσ₁ with confidence c, i.e.

$$P\left(-f\sigma_1 \le \tilde{Z} - \tilde{\mu}_z \le f\sigma_1\right) = c, \tag{5}$$

where σ₁² = [8 f_Z(μ̃z)²]⁻¹ with n = 2 (n = 2 is the minimum sample size needed for the existence of median) and f_Z(z) is given in Equation (1). Then the required minimum sample size n is given by

$$n = 2\left(\frac{z_{(1-c)/2}}{f}\right)^2 - 1, \tag{6}$$

where z_{(1−c)/2} is the value of the standard normal random variable Z₀ such that P(Z₀ > z_{(1−c)/2}) = (1 − c)/2.

In practice, (z_{(1−c)/2}/f)² is rounded up to an integer m, the rank of the middle order statistic, so that the required sample size n = 2m − 1 is odd.

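Proof: Since Z̃ can be approximated by the normal distribution with mean μ̃z and variance σ² = [4n f_Z(μ̃z)²]⁻¹, the confidence interval of μ̃z with confidence c is given by

$$P\left(\tilde{Z} - z_{(1-c)/2}\,\sigma \le \tilde{\mu}_z \le \tilde{Z} + z_{(1-c)/2}\,\sigma\right) = c.$$

Note that σ₁ = σ√(n/2). Thus the above confidence interval is equivalent to

$$P\left(-z_{(1-c)/2}\,\sigma_1\sqrt{\frac{2}{n}} \le \tilde{Z} - \tilde{\mu}_z \le z_{(1-c)/2}\,\sigma_1\sqrt{\frac{2}{n}}\right) = c. \tag{7}$$

From Equations (5) and (7), we obtain f = z_{(1−c)/2}√(2/n), that is, n = 2(z_{(1−c)/2}/f)². Restricting to odd sample sizes n = 2m − 1, with m taken as (z_{(1−c)/2}/f)² rounded up to an integer, leads to Equation (6), and the desired result follows. □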

Remark 4.

Note that this normal approximation method applies to any continuous distribution with PDF f(x), and the minimum sample size n required does not depend on the skewness parameter α of the SN(α) distribution.

The required sample size n for the confidence levels c = 0.90 and c = 0.95 and various values of the precision f (for any α) is shown in Table 2. Note that the minimum sample size n needed with confidence level c = 0.95 and precision f = 0.1 is 769 = 2(385) − 1, where 385 is the sample size required for estimating the population mean using the sample mean with the same c and f. See Trafimow (2017) and Trafimow et al. (2019) for details.
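The normal approximation sample size is simple to compute. The sketch below assumes the rounding convention suggested by the example 769 = 2(385) − 1, namely that (z_{(1−c)/2}/f)² is rounded up to an integer m before forming n = 2m − 1; this convention is inferred from that example and is not spelled out elsewhere. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_normal_approx(c, f):
    """Minimum odd sample size n = 2m - 1 under the normal approximation,
    with m = ceil((z / f)^2) and z the upper (1 - c)/2 standard normal quantile."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - c) / 2.0)
    m = math.ceil((z / f) ** 2)
    return 2 * m - 1

print(n_normal_approx(0.95, 0.10))  # the text reports 769 for this case
```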

4. Simulation study

In this section, we perform a simulation study to evaluate the performance of the proposed APP. To compute the required minimum sample size n for estimating the population median μ̃z of a standard skew normal distribution using the exact method, we provide an online calculator at: https://apprealization.shinyapps.io/nformedian/.

The minimum sample sizes n for various values of f and α are given in Table 3 for c = 0.95 and Table 4 for c = 0.90. Our programs were created using R software, and they are available upon request. For each fixed combination of precision f, skewness α and confidence level c, 10,000 random samples of the corresponding minimum sample size (obtained in Tables 1 and 2) are generated from the standard skew normal distribution. Then 10,000 confidence intervals for μ̃z are constructed and the coverage rates (cr) are obtained, which are given in Tables 3 and 4, respectively. For c = 0.95 (0.90), all coverage rates are near 0.95 (0.90).
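A coverage-rate simulation of this kind can be sketched in a few lines. The version below is not the authors' R program; it is one plausible reading of the procedure, in which a replication counts as covered when f₁σ ≤ Z̃ − μ̃z ≤ f₂σ. It assumes the representation Z = δ|U₀| + √(1 − δ²)U₁ for generating SN(α) values, takes the population median as an input, and uses a smaller number of replications than the 10,000 in the text to keep the run short.

```python
import math
import random

def simulate_coverage(n, alpha, f1, f2, median_z, M=2000, seed=0):
    """Fraction of M simulated samples of odd size n from SN(alpha) whose
    sample median satisfies f1 * sigma <= Z_med - median_z <= f2 * sigma."""
    rng = random.Random(seed)
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    root = math.sqrt(1.0 - delta * delta)
    sigma = math.sqrt(1.0 - 2.0 * delta * delta / math.pi)  # sd of SN(alpha)
    lo, hi = median_z + f1 * sigma, median_z + f2 * sigma
    hits = 0
    for _ in range(M):
        sample = sorted(delta * abs(rng.gauss(0, 1)) + root * rng.gauss(0, 1)
                        for _ in range(n))
        z_med = sample[n // 2]  # middle order statistic when n is odd
        hits += lo <= z_med <= hi
    return hits / M

# alpha = 0 row of Table 1 (population median 0, sigma 1)
print(simulate_coverage(151, 0.0, -0.2, 0.2, median_z=0.0, M=2000, seed=1))
```

For α ≠ 0 the population median μ̃z has to be computed first, for example by solving F_Z(μ̃z) = 1/2 numerically.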

Tables 3 and 4 report, at confidence levels c = 0.95 and c = 0.90 respectively, the minimum sample sizes n needed for estimating the population median μ̃z using the sample median Z̃ for given α and precision f. The tables also report the coverage rate (cr) based on M = 10,000 random samples simulated from the standard skew normal population with the required sample size.

For α = 0, Figure 5 compares the minimum sample sizes n obtained from the exact and normal approximation methods as functions of the precision f. It shows that n decreases as f increases for both c = 0.95 and c = 0.90, and that the discrepancy in n between the two methods also shrinks.

Here, with the same minimum sample size n presented in Tables 3 and 4 for c = 0.90, we compare the coverage rate for estimating the parameter μ̃z using the exact method with that using the normal approximation; the results are shown in Table 5. Similarly, with the same minimum sample size n found in Table 1 for c = 0.95, we compare the coverage rates of the two methods; these comparisons are shown in Table 6.

In Tables 5 and 6, the coverage rates obtained with the exact method are larger than those obtained with the normal approximation, which indicates that the exact method is more efficient than the normal approximation method.

5. Real data analyses

In this section, we analyze three real data sets to assess the performance of our APP methods for estimating the median under skew normal settings.

Example 5.1.

This data set contains the salaries for San Francisco city employees in 2014 (in units of 10,000 dollars). The total number of observations is 29,773, and the median salary is 8.4680. This data set can be downloaded from: https://www.kaggle.com/datasets/kaggle/sf-salaries?resource=download

Using the method of moments, the fitted distribution is SN(3.8718, 6.9955², 7.3712). The histogram of salaries, together with the fitted kernel and skew normal density curves, is given in Figure 6.

Now, if we choose f = 0.10 and c = 0.95, the required minimum sample size is n = 635 (exact method). We then draw a random sample of size n = 635. The sample median is 8.4945 and the 95% confidence interval using the exact method is (8.0714, 8.9176). The estimated median of the fitted model is 8.5902. The 95% confidence interval constructed using the normal approximation is (8.0662, 8.9228). Note that the population median 8.4680 falls into both confidence intervals. The length of the 95% confidence interval using the exact method is 0.8462, which is shorter than that (0.8566) using the normal approximation. This result is consistent with the exact method being more efficient than the normal approximation method in the estimation of the population median.

Example 5.2.

This data set contains the market's opening price for wheat futures in 2000–2023 (futures are financial contracts obligating the buyer to purchase and the seller to sell a specified amount of a particular grain at a predetermined price on a future date). The total number of observations is 5,778, and the median is 5.0225 (in dollars per bushel). The data set can be downloaded from: https://www.kaggle.com/datasets/guillemservera/grains-and-cereals-futures.

Using the method of moments, the fitted skew normal distribution is SN(3.1606, 3.2044², 3.9325). The histogram of the market's opening price, together with the fitted kernel and skew normal density curves, is given in Figure 7.

Now, if we use f = 0.15 and c = 0.95, the required minimum sample size is n = 265 (exact method). We then draw a random sample of size n = 265. The sample median is 5.2150, and the 95% confidence interval using the exact method is (4.9255, 5.5045). The estimated median of the fitted model is 5.3210. The 95% confidence interval constructed using the normal approximation is (4.9102, 5.5197). Note that the population median 5.0225 falls into both confidence intervals. The length of the 95% confidence interval using the exact method is 0.5790, which is shorter than that (0.6095) using the normal approximation. This result is consistent with the exact method being more efficient than the normal approximation method in the estimation of the population median.

Example 5.3.

Consider the data set on power consumption in Mega Units (MU) in Kerala, India, from January 2019 till May 2020, given in https://www.kaggle.com/datasets/twinkle0705/state-wise-power-consumption-in-india/.

The total number of observations is 502, and the median is 71.45.

Using the method of moments, the fitted skew normal distribution is SN(65.9654, 9.2893², 1.5483). The histogram of power consumption, together with the fitted kernel and skew normal density curves, is given in Figure 8.

Now, if we use f = 0.20 and c = 0.95, the required minimum sample size is n = 147 (exact method). We then draw a random sample of size n = 147 from the whole data set and the sample median is 71.3000. The 95% confidence interval for the population median constructed using the exact method is (69.9333, 72.6667). The estimated median of the fitted model is 71.8059. The 95% confidence interval constructed using the normal approximation is (69.9212, 72.6788). Note that the population median 71.45 falls into both confidence intervals. The length of the 95% confidence interval using the exact method is 2.7333, which is shorter than that (2.7576) using the normal approximation. This result is consistent with the exact method being more efficient than the normal approximation method in the estimation of the population median.
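The "estimated median of the fitted model" in each example follows from the relationship μ̃ = ξ + ωμ̃z in Remark 2. The sketch below recomputes it for the three fitted distributions, solving F_Z(μ̃z) = 1/2 by Simpson integration and bisection. The numerical scheme and all function names are assumptions of this sketch; the fitted parameters are those quoted in Examples 5.1 to 5.3.

```python
import math

def sn_pdf(z, alpha):
    """Standard skew normal density 2 * phi(z) * Phi(alpha * z)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi

def sn_cdf(z, alpha, lower=-8.0, steps=2000):
    """F_Z(z) by composite Simpson integration on [lower, z] (steps even)."""
    if z <= lower:
        return 0.0
    h = (z - lower) / steps
    total = sn_pdf(lower, alpha) + sn_pdf(z, alpha)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * sn_pdf(lower + i * h, alpha)
    return total * h / 3.0

def sn_median(alpha, tol=1e-8):
    """Median of SN(alpha) by bisection on [-1, 1]."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, alpha) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fitted_median(xi, omega, alpha):
    """Median of SN(xi, omega^2, alpha) via mu = xi + omega * mu_z."""
    return xi + omega * sn_median(alpha)

fits = {
    "SF salaries (Example 5.1)": (3.8718, 6.9955, 7.3712),
    "wheat futures (Example 5.2)": (3.1606, 3.2044, 3.9325),
    "Kerala power (Example 5.3)": (65.9654, 9.2893, 1.5483),
}
for name, (xi, omega, alpha) in fits.items():
    print(name, round(fitted_median(xi, omega, alpha), 4))
```

The printed values can be compared with the fitted-model medians reported in the examples (8.5902, 5.3210 and 71.8059); small differences may arise from rounding of the fitted parameters.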

6. Conclusions, limitations and future research

Although the exact method for obtaining the required minimum sample size provides better estimates than the normal approximation method, the latter is easier for researchers to use. However, the link to the program we provided adequately addresses the difficulty of using the exact method.

Two findings are noteworthy. First, the exact method results in smaller minimum sample sizes necessary for using sample medians to estimate corresponding population medians. Thus, the exact method provides sample size savings relative to the normal approximation method. Second, unlike the population mean, the population median is not a parameter of the skew normal distribution. Consequently, the required minimum sample size to meet specifications for precision and confidence should be larger for the median than the mean and our findings confirm this. However, an unexpected finding pertains to the effect of skewness on the determination of the minimum sample size necessary to meet specifications for precision and confidence. For the location, it has been well-documented that as the shape parameter increases, the required minimum sample size necessary to meet specifications decreases. For example, in Trafimow et al. (2019), suppose the specifications for precision and confidence are 0.10 and 0.95 for a random sample taken from the standard skew normal population. For location estimation, the minimum sample sizes needed when the shape parameter is 0 (normality), 0.5, 1, 2 and 5 are 385, 158, 146, 140 and 138, respectively. Note the monotonically decreasing sample size trend; if we remain with locations, the larger the shape parameter, the smaller the minimum sample size required to meet specifications. In contrast, for median estimation, the analogous sample sizes are 605, 603, 597, 577 and 613. Thus, the median differs dramatically from the mean in that median estimation does not follow decreasing monotonicity.

Researchers have often been advised to use the median, as opposed to the mean, when there is significant skewness. However, the present work qualifies the recommendation. Achieving impressive precision and confidence when using the sample median to estimate the population median necessitates larger than typical sample sizes. Nor does increasing skewness provide sample size savings as when using the sample location to estimate the corresponding population location. This is not to say that researchers should never use the median, as there are times when the median is a useful statistic necessary for either theoretical or applied purposes. For skew normal population distributions, researchers should consider using the location instead of the median, if this is consistent with the researcher's theoretical or applied goals. If only the median can fit researcher goals, then the researcher should be prepared to either collect a large sample size or suffer a precision or confidence penalty.

For our future research, we will attempt to extend our APP for estimating parameters from skew normal distributions to other distributions, such as skew inverse Gaussian, log-skew normal, generalized gamma distributions, etc.

Figures

Figure 1

Density curves of Z̃ with α = 0, 0.5, 1, 5, and n = 101

Figure 2

Density curves of Z̃ with n = 51, 101, 201, and α = 5

Figure 3

The relationship between μ̃z and α in SN(α) for n = 101

Figure 4

Density curves of W with n = 51, 101, 201 and α = 5

Figure 5

Sample size n ranges along the vertical axis as a function of precision f along the horizontal axis

Figure 6

The histogram, kernel density (blue) and skew normal density curve of SN(3.8718, 6.9955², 7.3712) (red) of salaries for San Francisco city employees in 2014

Figure 7

The histogram, kernel density (blue) and skew normal density curve of SN(3.1606, 3.2044², 3.9325) (red) of market's opening price for wheat future in 2000–2023

Figure 8

The histogram, kernel density (blue) and skew normal density curve of SN(65.9654, 9.2893², 1.5483) (red) of power consumption in Kerala, India, from January 2019 till May 2020

Table 1

The ratio f_W(f₂)/f_W(f₁), f₁, f₂ and the required minimum sample size n for c = 0.95, f = 0.2 and α = 0, 1, 2, 5, 10

α	n	f₁	f₂	f_W( f₂)/f_W( f₁)
0	151	−0.2000	0.2000	1.0000
1	149	−0.1990	0.2000	0.9985
2	145	−0.1963	0.2000	0.9995
5	153	−0.1926	0.2000	1.0005
10	161	−0.1926	0.2000	0.9991