R Tutorials | Probability Distributions in R Statistics | #rstats

In statistics, data can follow different distributions depending on the probabilities of the underlying events. Statistical research therefore requires knowing how to work with probability distributions in R. In this series of our R tutorials we look at some of the most frequently used probability distributions in R.

Probability Distributions in R

What are Probability Distributions in Statistics?

Probability theory has produced a long list of probability distributions, and most of them are readily available in R. The question that normally arises is how to choose a suitable one from the large number of distributions that exist in statistics. The most important criteria for the choice of a probability distribution are the following:

the type of the variable - continuous or discrete
the symmetry of the data
the presence of upper or lower data limits
the likelihood of observing extreme values.

The aim of this series of R tutorials is not to describe how to choose the right distribution for your research. That question is quite a demanding one and would require one or more posts to cover the basic issues. Here we concentrate on showing how to work with the most frequently used probability distributions in R. We will, therefore, cover the binomial and normal probability distributions in R and their graphical representation.


Binomial Distribution in R

One of the most basic probability distributions for discrete variables in statistics is the binomial distribution. It describes the number of successes in a fixed number of independent trials, each of which has two possible outcomes and the same probability of success. The examples below use a probability of 0.5, as for a fair coin. The more trials there are, the more tightly the proportion of successes concentrates around that probability. Let’s draw the binomial distribution in R using the density function dbinom(x, size, prob):

x <- 0:30

plot(x, dbinom(x, 30, 0.5), type = "h")

Binomial Distribution in R
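To make clearer what dbinom() returns, the following minimal sketch compares it with the textbook binomial formula written out by hand. The names n, p, k and manual_pmf are illustrative assumptions and do not come from the tutorial's script.

```r
# Binomial probabilities written out by hand.
# n, p, k and manual_pmf are illustrative names, not part of the original script.
n <- 30
p <- 0.5
k <- 0:n

manual_pmf <- choose(n, k) * p^k * (1 - p)^(n - k)

# The hand-written formula agrees with dbinom() up to floating-point error.
all.equal(manual_pmf, dbinom(k, size = n, prob = p))

# The probabilities over all possible outcomes add up to 1.
total <- sum(dbinom(k, size = n, prob = p))
total

# The success probability need not be 0.5: with prob = 0.2 the most
# likely number of successes moves from 15 to 6.
peak <- k[which.max(dbinom(k, size = n, prob = 0.2))]
peak
```

The last lines show that dbinom() works for any success probability between 0 and 1, not only for the fair-coin case plotted above.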

We can also draw the cumulative distribution function by using the function pbinom(q, size, prob):

x <- 0:30

plot(x, pbinom(x, 30, 0.5), type = "h")


Cumulative Distribution Function for Binomial Distribution in R
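The relationship between the two functions can be checked directly: the cumulative distribution is the running sum of the point probabilities. The sketch below is only an illustration. The variable names and the cut-off value 10 are assumptions chosen for the example.

```r
x <- 0:30
pmf <- dbinom(x, size = 30, prob = 0.5)
cdf <- pbinom(x, size = 30, prob = 0.5)

# Accumulating the point probabilities reproduces the CDF.
all.equal(cumsum(pmf), cdf)

# P(X <= 10) from the lower tail and P(X > 10) from the upper tail.
p_le_10 <- pbinom(10, size = 30, prob = 0.5)
p_gt_10 <- pbinom(10, size = 30, prob = 0.5, lower.tail = FALSE)

# The two tails cover all outcomes, so they sum to 1.
p_le_10 + p_gt_10
```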

The function qbinom() is the corresponding quantile function, and rbinom() generates random deviates.
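A short, hedged illustration of these two functions follows. The seed and the sample size of 1000 are arbitrary assumptions made only so that the example is reproducible.

```r
# Quantile function: the smallest x with P(X <= x) >= 0.5, i.e. the median.
med <- qbinom(0.5, size = 30, prob = 0.5)
med

# qbinom() inverts pbinom(): the CDF at the returned value reaches the
# requested probability, while one step below it does not.
pbinom(med, 30, 0.5) >= 0.5
pbinom(med - 1, 30, 0.5) < 0.5

# Random deviates: 1000 simulated experiments of 30 trials each.
# set.seed() only makes the draw reproducible; the seed value is arbitrary.
set.seed(1)
draws <- rbinom(1000, size = 30, prob = 0.5)

# The sample mean should land close to the theoretical mean 30 * 0.5 = 15.
mean(draws)
```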

Normal Distribution in R

For a large number of trials, the binomial distribution is well approximated by the normal distribution, which is used for continuous variables. Many real-world variables are approximately normally distributed. A common explanation is the central limit theorem: a quantity that is the sum of many small, independent effects tends towards a normal distribution, much as a binomial count is the sum of many independent two-outcome trials.
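The approximation can be checked numerically. The sketch below assumes the usual matching of moments, that is, a normal distribution with mean n * p and standard deviation sqrt(n * p * (1 - p)). The variable names are illustrative.

```r
n <- 30
p <- 0.5
x <- 0:n

mu    <- n * p                   # mean of the binomial distribution
sigma <- sqrt(n * p * (1 - p))   # its standard deviation

exact         <- dbinom(x, size = n, prob = p)
normal_approx <- dnorm(x, mean = mu, sd = sigma)

# Largest absolute gap between the exact probabilities and the normal curve.
max_gap <- max(abs(exact - normal_approx))
max_gap
```

With 30 trials and a success probability of 0.5 the gap is already small. For probabilities close to 0 or 1 the binomial distribution is skewed and more trials are needed before the approximation becomes comparably good.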

In R, the normal distribution is plotted in much the same way as the binomial distribution above, by means of the density function dnorm(). As with the binomial distribution, we first create the sequence of x values (from -4 to 4 in steps of 0.01). Next, we can plot the normal distribution in two ways, using the function plot() or curve():

x <- seq(-4, 4, 0.01)

#1)

plot(x, dnorm(x))

#2)

curve(dnorm(x), from= -4, to=4)

Normal Distribution in R

The cumulative distribution function is produced in a similar way to that of the binomial distribution:

plot(x, pnorm(x), type="l")


Cumulative Distribution Function for Normal Distribution in R

The function qnorm() is the quantile function. It can be used, for example, to ask which value of z lies at the 25th percentile:
qnorm(0.25)
It can also be used to find the interval of z values that contains the central 95% of the distribution, which is the basis for calculating confidence intervals:
qnorm(c(0.025, 0.975))

The function rnorm() creates random deviates.
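The following sketch ties the quantile function, the cumulative distribution function and the random generator for the normal distribution, rnorm(), together. The simulated sample, its mean of 90, its standard deviation of 10 and the seed are assumptions made purely for illustration, and the confidence interval assumes that the standard deviation is known.

```r
# qnorm() and pnorm() are inverses of each other.
z25 <- qnorm(0.25)
pnorm(z25)                       # recovers 0.25

# Bounds of the central 95% interval and the probability mass between them.
bounds <- qnorm(c(0.025, 0.975))
mass   <- diff(pnorm(bounds))
mass

# A 95% confidence interval for a mean, assuming a known standard deviation.
# The sample is simulated; mean = 90 and sd = 10 are illustrative values.
set.seed(42)
sample_values <- rnorm(100, mean = 90, sd = 10)
se <- 10 / sqrt(length(sample_values))
ci <- mean(sample_values) + bounds * se
ci
```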

Example of Normal Distribution in Statistics:

Blood Pressure Levels

1. We create the sequence of blood pressure values for the x-axis, from 60 to 120 mmHg in steps of 1, and store it in bt:

bt <- seq(60, 120, 1)

2. Next, we plot the normal distribution with the mean of 90 mmHg and the standard deviation of 10 mmHg:

plot(bt, dnorm(bt, 90, 10), type="l", xlim=c(60, 120), main="Blood Pressure")

3. The function pnorm() can be used, for example, to calculate the proportion of people with a blood pressure of 80 mmHg or BELOW, given a population mean of 90 and a standard deviation of 10. There are two ways of writing that in R:

1) pnorm(80, mean=90, sd=10)

2) pnorm(80,90,10)

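4. We can round the answer 1) to the nearest integer or 2) to two decimal places. Since the answer is a proportion between 0 and 1, rounding to the nearest integer returns 0 here, so the second form is the more useful one: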

1) round(pnorm(80,90,10))

2) round(pnorm(80,90,10),2)
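The same calculation can be expressed through a standardized z value, and pnorm() can also return the proportion above a threshold or between two thresholds. The sketch below reuses the mean of 90 and the standard deviation of 10 from the example. The variable names and the upper threshold of 100 mmHg are illustrative assumptions.

```r
mu    <- 90
sigma <- 10

# Standardizing first gives the same answer as passing mean and sd directly.
z <- (80 - mu) / sigma            # z = -1
all.equal(pnorm(z), pnorm(80, mean = mu, sd = sigma))

# Proportion ABOVE 80 mmHg, taken from the upper tail.
above_80 <- pnorm(80, mean = mu, sd = sigma, lower.tail = FALSE)
above_80

# Proportion between 80 and 100 mmHg, i.e. within one standard deviation.
between <- pnorm(100, mu, sigma) - pnorm(80, mu, sigma)
round(between, 3)
```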

Statistical Tests in Normal Distribution

Let's now use the normal distribution of blood pressure for one- and two-tailed z-tests. We start with the one-tailed test.

1. The first step is to compute the probability of randomly selecting a subject with a blood pressure (bt) of 72 mmHg or lower. This is the so-called p-value, and it equals the area of the polygon (for more information, enter ?polygon in R) that we are going to draw in this tutorial:

pnorm(72, 90, 10)

2. After that we draw a vertical line for 72 (v is for the x-value):

abline(v=72)

3. Next, we create the coordinates for the polygon whose area reflects the probability of observing a subject with a bt of 72 mmHg or below:

cord.x <- c(60,seq(60,72,1),72)

cord.y <- c(0,dnorm(seq(60, 72, 1), 90, 10),0)

4. Finally, we draw the polygon in blue and add some text to it:

polygon(cord.x,cord.y,col='skyblue')

text(70, 0.005, "blue area = p = 0.0359")

One-tailed Test for Normal Distribution in R
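As a rough check that the shaded polygon really corresponds to the p-value, its area can be estimated with the trapezoid rule over the same coordinates. This is only a sketch, and the variable names are assumptions. Note that the polygon starts at 60 mmHg, so it leaves out the small part of the tail below 60.

```r
# Same x grid and density values as the polygon's upper edge.
xs <- seq(60, 72, 1)
ys <- dnorm(xs, mean = 90, sd = 10)

# Trapezoid rule: average height of neighbouring points times the step width.
poly_area <- sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)

# Exact probability between 60 and 72, and the full p-value P(bt <= 72).
exact_in_range <- pnorm(72, 90, 10) - pnorm(60, 90, 10)
p_value        <- pnorm(72, 90, 10)

c(polygon = poly_area, exact_60_to_72 = exact_in_range, p_value = p_value)
```

The polygon area should come out slightly below the p-value, because the tail below 60 mmHg is not drawn.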

5. We have now conducted the one-tailed test. Let’s complete the graph by adding the two-tailed test. To do so, we add a mirror image of the first polygon on the other side of the bell curve, covering values that lie at least as far above the mean (108 mmHg or higher) as 72 mmHg lies below it:

cord.x1 <- c(108,seq(108,120,1),120)

cord.y1 <- c(0,dnorm(seq(108, 120, 1), 90, 10),0)

polygon(cord.x1,cord.y1,col='skyblue')

6. Finally, we add some text to the graph:

text(65, 0.005, round(pnorm(72, 90, 10), 3))

text(115, 0.005, round(pnorm(72, 90, 10), 3))

text(75, 0.02, " p = 0.072 " )

Two-tailed Test for Normal Distribution in R
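The numbers shown on the graph can be reproduced without any plotting. The sketch below computes the one-tailed and two-tailed p-values from the z value of the observation. The variable names are illustrative assumptions, and the values for the mean, the standard deviation and the observation are the ones used above.

```r
mu       <- 90
sigma    <- 10
observed <- 72

# Standardized distance of the observation from the mean.
z <- (observed - mu) / sigma          # -1.8

# One-tailed p-value: probability of a value this low or lower.
p_one <- pnorm(z)

# Two-tailed p-value: probability of a value at least this far from the
# mean in either direction.
p_two <- 2 * pnorm(-abs(z))

# By symmetry, the upper tail beyond 108 equals the lower tail below 72.
upper_tail <- pnorm(mu + (mu - observed), mu, sigma, lower.tail = FALSE)

round(c(one_tailed = p_one, two_tailed = p_two), 3)
```

Doubling the one-tailed value is valid here only because the normal distribution is symmetric around its mean.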

Other Distributions in R

The normal distribution is one of the most frequently used probability distributions in R. Besides the normal distribution, there are other distributions that are commonly applied in statistics:

Poisson - dpois()
Student - dt()
Chi-square - dchisq() etc.
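These distributions follow the same naming pattern as the binomial and normal ones: the prefix d gives the density, p the cumulative distribution function, q the quantile function and r random deviates. A brief sketch follows. The parameter values (a Poisson rate of 3, 10 and 4 degrees of freedom, the cut-off 9.49) are arbitrary assumptions chosen for illustration.

```r
# Poisson distribution with rate lambda = 3.
pois_point <- dpois(2, lambda = 3)    # P(X = 2)
pois_cum   <- ppois(2, lambda = 3)    # P(X <= 2)

# Student's t with 10 degrees of freedom: two-sided 95% critical value.
t_crit <- qt(0.975, df = 10)

# Chi-square with 4 degrees of freedom: upper-tail probability beyond 9.49.
chisq_tail <- pchisq(9.49, df = 4, lower.tail = FALSE)

# Random deviates follow the same pattern; the seed is arbitrary.
set.seed(7)
pois_draws <- rpois(5, lambda = 3)

c(pois_point = pois_point, pois_cum = pois_cum,
  t_crit = t_crit, chisq_tail = chisq_tail)
```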

The complete script for this series of R tutorials can be found on GitHub.

Global Fintech | Finance, Business Intelligence and Technologies
