
R Tutorials | Descriptive Statistics with R | #rstats


What is Descriptive Statistics?

Descriptive statistics are numbers that summarise certain characteristics of a given dataset, which may cover a whole population or only a sample (a subset of the population). They help us extract information from the data in order to answer questions about the phenomena we study. In descriptive statistics we work with the following measures of a dataset:
  • measures of location (alternatively, measures of center, for they show us where the center is): the mean, the mode, the quartiles and the median
  • measures of variability (dispersion), which reflect how spread out the values are: the variance and the standard deviation as well as the range and the interquartile range
  • measures of shape which reflect the shape of the data: the skewness and the kurtosis.
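As a small illustration of the location and variability measures listed above, here is a minimal R sketch on a short vector of hypothetical values. Note that base R has no function for the statistical mode, so the sketch takes the most frequent value from a frequency table; the name `stat_mode` is only illustrative.

```r
# Hypothetical sample values, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10)

# Measures of location
mean(x)        # arithmetic mean
median(x)      # middle value of the sorted data
quantile(x)    # minimum, quartiles (25%, 50%, 75%) and maximum

# Base R has no built-in function for the statistical mode:
# mode(x) returns the storage mode ("numeric"), not the most frequent value.
freq <- table(x)
stat_mode <- as.numeric(names(freq)[which.max(freq)])
stat_mode

# Measures of variability
diff(range(x)) # range: maximum minus minimum
IQR(x)         # interquartile range: 3rd quartile minus 1st quartile
```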

R Tutorials | Why Use R for Descriptive Statistics?

R is a powerful tool for all sorts of statistical operations. It is free, quick and efficient, and it has many libraries (packages) for data analysis and statistical computing, including ones for descriptive statistics. You can work either from the R command line or in RStudio, an IDE for R. For our R tutorials, we shall use RStudio.


Scales of Measurement: Types of Data in Statistics

There are two main types of data: categorical (or qualitative) data and numerical (or quantitative) data. Categorical data (categorical variables) record qualities and characteristics of the individual observations. They have no numerical meaning, although categorical variables are sometimes coded as numbers for convenience; they can be summarised by the number or percentage (frequency) of observations in each category of a sample. In R, we declare a variable to be qualitative (categorical, nominal) by means of the function factor().
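A minimal sketch of declaring a nominal variable with factor(), assuming hypothetical survey answers that were coded as 1, 2 and 3 for convenience; the variable names and colour labels are illustrative only.

```r
# Hypothetical answers coded as numbers: 1 = "red", 2 = "green", 3 = "blue"
colour_code <- c(1, 3, 2, 1, 1, 3)

# factor() turns the codes into a nominal (unordered) categorical variable
colour <- factor(colour_code,
                 levels = c(1, 2, 3),
                 labels = c("red", "green", "blue"))

colour              # printed with its labels, not as numbers
levels(colour)      # the categories
is.ordered(colour)  # FALSE: nominal categories have no order
table(colour)       # number of observations per category
```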




In contrast to categorical data, numerical data (quantitative data) represent actual numbers (e.g. time and age measurements or counts). They can be measured on an ordinal scale (an ordering or quasi-ranking, e.g. the placings in a sports competition), an interval scale (an equal-interval ordering with no true zero point, e.g. temperature in degrees Celsius) or a ratio scale (an equal-interval ordering with a true zero point, e.g. price). Numerical variables (quantitative variables) support arithmetic operations, so we can compute the location, variability and shape of the data. To declare a variable as ordinal, we again use the function factor(), this time with the argument ordered = TRUE.
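A minimal sketch of an ordinal variable, assuming hypothetical medal results from a sports competition. The order of the `levels` argument defines the ranking, and ordered factors can then be compared with `<`, `>` and `max()`.

```r
# Hypothetical medals won by five athletes
medal <- c("silver", "gold", "bronze", "gold", "silver")

# ordered = TRUE makes an ordinal factor; the order of 'levels'
# defines the ranking from lowest to highest
medal_ord <- factor(medal,
                    levels = c("bronze", "silver", "gold"),
                    ordered = TRUE)

medal_ord              # levels are printed as bronze < silver < gold
medal_ord > "bronze"   # comparisons are allowed for ordered factors
max(medal_ord)         # the highest category present
```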



Contingency Tables and Frequency Tables in R

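We use contingency tables to display the distribution of categorical variables and, when a table cross-classifies two or more variables, to give a researcher a clear picture of how those variables are interrelated. Contingency tables can be one-dimensional (created with the function table()) or two-dimensional (created, for example, with the function xtabs(); table() can also cross-tabulate two variables).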
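The sketch below uses R's built-in `mtcars` dataset as example data (an assumption; any data frame with categorical columns would do), with `cyl` (number of cylinders) and `gear` (number of forward gears) treated as categories.

```r
# One-dimensional table: number of cars per cylinder count
table(mtcars$cyl)

# Two-dimensional contingency table with xtabs() and a formula
cyl_gear <- xtabs(~ cyl + gear, data = mtcars)
cyl_gear

# table() can cross-tabulate two variables as well
table(mtcars$cyl, mtcars$gear)

# Add row and column totals to the contingency table
addmargins(cyl_gear)
```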







Frequency (distribution) tables show how often each value occurs among the observations. In R, table() gives the absolute frequencies, and table()/sum(table()) gives the relative frequencies (proportions).
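A minimal sketch of absolute and relative frequencies, again assuming the built-in `mtcars` data (column `gear`). The base R function prop.table() is shown as an equivalent shortcut for the division.

```r
# Absolute frequencies of the number of gears
counts <- table(mtcars$gear)
counts

# Relative frequencies: each count divided by the total
rel <- counts / sum(counts)
rel
round(100 * rel, 1)   # the same as percentages

# prop.table() computes the same proportions
all.equal(rel, prop.table(counts))
```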



Operations of Basic Descriptive Statistics in R

In R we can carry out basic descriptive statistics simply by entering commands. For example, for a numeric variable the command summary() returns the so-called five-number summary (the minimum, the 1st quartile, the median or 2nd quartile, the 3rd quartile and the maximum), which divides the data into four parts containing roughly equal numbers of observations, together with the mean. The command mean() calculates the mean on its own. For the measures of variability we have var(), which returns the variance of a single variable (and, applied to a data frame such as mydata, the covariance matrix of all its variables), and sd(), which returns the standard deviation.
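A minimal sketch of these commands, assuming the built-in `mtcars` data in place of `mydata` (`mpg` is fuel consumption in miles per US gallon, `hp` horsepower, `wt` weight).

```r
# Five-number summary plus the mean
summary(mtcars$mpg)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

mean(mtcars$mpg)      # the mean on its own
var(mtcars$mpg)       # variance of a single variable
sd(mtcars$mpg)        # standard deviation (square root of the variance)

# For a data frame of numeric columns, var() returns the covariance matrix
var(mtcars[, c("mpg", "hp", "wt")])
```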

Sometimes in our R tutorials we need additional packages (libraries) in order to use certain commands. For example, for the skewness and the kurtosis we need the package agricolae. Once it is installed with install.packages("agricolae") and loaded with library(agricolae), we can use its functions skewness() and kurtosis().
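If installing a package is not an option, the shape measures can be sketched in base R. The helpers below, `skewness_g1()` and `kurtosis_b2()`, are hypothetical names that implement the simple moment-based formulas g1 = m3 / m2^(3/2) and b2 = m4 / m2^2. Packages such as agricolae may use bias-adjusted formulas, so their results can differ slightly, especially for small samples.

```r
# Moment-based sample skewness g1 = m3 / m2^(3/2) (hypothetical helper)
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}

# Moment-based kurtosis b2 = m4 / m2^2 (hypothetical helper);
# b2 is 3 for a normal distribution, so b2 - 3 is the excess kurtosis
kurtosis_b2 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

skewness_g1(mtcars$mpg)
kurtosis_b2(mtcars$mpg) - 3   # excess kurtosis

# With agricolae installed, the package functions would be:
# library(agricolae); skewness(mtcars$mpg); kurtosis(mtcars$mpg)
```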

Graphs and Visualisation of Descriptive Statistics in R

In statistics we use different kinds of visualisation, usually referred to as graphs. Boxplots, histograms, pie charts, scatterplots etc. make our statistics clearer and easier to follow. In R we can produce many types of graphs and save them as .png or .pdf files. Let's go through the basic kinds of visualisation in R to see how they can be useful for us.
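A minimal sketch of saving a graph to a file: open a file device, draw the plot, then close the device with dev.off() so the file is written. The file names and the temporary directory are assumptions for illustration, and the PNG part is guarded because PNG output depends on how R was built.

```r
# PDF output: width and height are in inches
pdf_file <- file.path(tempdir(), "mpg_vs_wt.pdf")   # hypothetical file name
pdf(pdf_file, width = 7, height = 5)
plot(mtcars$wt, mtcars$mpg)
dev.off()

# PNG output: width and height are in pixels
if (capabilities("png")) {
  png_file <- file.path(tempdir(), "mpg_vs_wt.png") # hypothetical file name
  png(png_file, width = 800, height = 600)
  plot(mtcars$wt, mtcars$mpg)
  dev.off()
}
```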

Pie Charts - pie() - in R


A pie chart takes categorical data and shows the percentage of individuals that fall into each category. The slices of the pie add up to 100%. In R, we create a pie chart with the command pie().
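A minimal sketch, assuming the built-in `mtcars` data: pie() is given the counts per category, and the slice labels are built by hand to show the percentages.

```r
# Number of cars per cylinder count
cyl_counts <- table(mtcars$cyl)
pct <- round(100 * cyl_counts / sum(cyl_counts), 1)

# Slice labels such as "4 cyl: 34.4%"
pie(cyl_counts,
    labels = paste0(names(cyl_counts), " cyl: ", pct, "%"),
    main = "Cars by number of cylinders")
```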





Bar Charts - barplot() - in R


Bar charts are another means of summarising categorical (qualitative) data. Like a pie chart, a bar chart breaks data down by groups, but it uses bars of different heights to show how many observations, or what percentage of them, fall into each group. We use the function barplot() to create a bar chart.
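A minimal sketch with the built-in `mtcars` data, drawing first the counts and then the proportions of cars per number of gears. barplot() also returns the bar midpoints invisibly, which is handy for placing labels.

```r
gear_counts <- table(mtcars$gear)

# Bar heights = counts; the midpoints of the bars are returned invisibly
mids <- barplot(gear_counts,
                main = "Cars by number of forward gears",
                xlab = "Gears", ylab = "Number of cars",
                col = "steelblue")

# Relative frequencies instead of counts
barplot(prop.table(gear_counts), ylab = "Proportion of cars")
```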






Histograms - hist() - in R

Histograms look similar to bar charts. The difference is that they are used for quantitative variables. A histogram gives an overview of all the data, broken down into consecutive numeric intervals (bins). The height of each bar represents either the number of individuals in each bin (the frequency) or the percentage of individuals in each bin (the relative frequency). In R we use the function hist() to create a histogram, which we can customise in different ways.
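A minimal sketch with the built-in `mtcars` data. The bin boundaries are an arbitrary choice for illustration. Note that freq = FALSE plots densities (relative frequency divided by bin width) rather than percentages.

```r
# Histogram of fuel consumption with bins of width 5
h <- hist(mtcars$mpg,
          breaks = seq(10, 35, by = 5),
          main = "Distribution of mpg",
          xlab = "Miles per gallon",
          col = "lightgray")

h$counts   # frequency in each bin

# Density scale instead of counts
hist(mtcars$mpg, breaks = seq(10, 35, by = 5), freq = FALSE)
```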




Boxplots - boxplot() - or box-and-whisker plots in R

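Boxplots are used for quantitative variables. The box spans the first to the third quartile, and the median is drawn as a bold line inside the box. The whiskers extend from the box to the most extreme values that are not outliers (by default in R, values within 1.5 times the interquartile range of the box), and outliers beyond the whiskers are drawn as small circles. To create a boxplot in R we use the function boxplot(). We can also change the appearance of boxplots in different ways.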
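A minimal sketch comparing groups with a formula, assuming the built-in `mtcars` data. boxplot() invisibly returns the statistics it drew, which lets us check the whisker ends, hinges and medians numerically.

```r
# One box of mpg for each number of cylinders
b <- boxplot(mpg ~ cyl, data = mtcars,
             main = "mpg by number of cylinders",
             xlab = "Cylinders", ylab = "Miles per gallon",
             col = "lightblue")

b$stats   # per group: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # values drawn individually as outliers
```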


Scatterplots - plot() - in R

Scatterplots display the values of usually two (in 3D plots, three) variables from a dataset as a collection of points. Each point takes its position on the horizontal axis from the value of one variable and its position on the vertical axis from the value of the other. The function plot() makes a scatterplot in R, and the command text() adds text labels to it.
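A minimal sketch, assuming the built-in `mtcars` data: weight against fuel consumption, with the three heaviest cars labelled by name using text(). The choice of which points to label is only illustrative.

```r
# Weight versus fuel consumption
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. weight", pch = 19)

# Label the three heaviest cars, placing each label just above its point
heavy <- order(mtcars$wt, decreasing = TRUE)[1:3]
text(mtcars$wt[heavy], mtcars$mpg[heavy],
     labels = rownames(mtcars)[heavy], pos = 3, cex = 0.7)
```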



On the scatterplot, we can also draw lines with lines() and add isolated points with points().
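A minimal sketch on the `mtcars` scatterplot. As an illustrative choice, the line is a lowess smoother (its output is already sorted by x, which matters because lines() joins points in the order given), and points() highlights the cars that do more than 30 miles per gallon.

```r
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Smoothed trend line through the cloud of points
trend <- lowess(mtcars$wt, mtcars$mpg)
lines(trend, col = "red", lwd = 2)

# Highlight selected observations as extra points
eff <- mtcars$mpg > 30
points(mtcars$wt[eff], mtcars$mpg[eff], pch = 19, col = "blue")
```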



Matrix of all pairs of variables - pairs()

Finally, we can draw a matrix of scatterplots for all possible pairs of variables in a dataset. Such a graph shows at a glance how strongly the variables in each pair are related. We can do that using the function pairs().
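A minimal sketch, assuming four numeric columns of the built-in `mtcars` data. The correlation matrix printed afterwards is the numeric counterpart of the same pairwise view.

```r
# Scatterplot matrix of four numeric variables
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
pairs(vars, main = "Pairwise scatterplots", pch = 20)

# Correlation coefficients for the same pairs
round(cor(vars), 2)
```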







Precision of the Estimation - the Confidence Intervals for the Mean of a Variable 

We can plot the means of a variable in different groups together with their confidence intervals using the function plotmeans(), which comes from the package gplots (install it with install.packages("gplots") and load it with library(gplots)).
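For readers without gplots, here is a base R sketch of the same idea. It computes a 95% t-based confidence interval for the mean `mpg` in each cylinder group of the built-in `mtcars` data and draws the intervals as error bars. `ci_mean()` is a hypothetical helper, and the plot only approximates what plotmeans() draws.

```r
# 95% t-based confidence interval for a mean (hypothetical helper)
ci_mean <- function(x, level = 0.95) {
  n <- length(x)
  m <- mean(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = m, lower = m - half, upper = m + half)
}

# One row per cylinder group: mean, lower and upper bound
ci <- t(sapply(split(mtcars$mpg, mtcars$cyl), ci_mean))
ci

# Group means as points, intervals as error bars
mids <- seq_len(nrow(ci))
plot(mids, ci[, "mean"], xlim = c(0.5, nrow(ci) + 0.5),
     ylim = range(ci), xaxt = "n", pch = 19,
     xlab = "Cylinders", ylab = "Mean mpg (95% CI)")
axis(1, at = mids, labels = rownames(ci))
arrows(mids, ci[, "lower"], mids, ci[, "upper"],
       angle = 90, code = 3, length = 0.05)

# With gplots installed, a similar chart comes from:
# library(gplots); plotmeans(mpg ~ cyl, data = mtcars)
```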
