We look at some more options for plotting, and we assume that you are familiar with the basic plotting commands (Basic Plots). A variety of subjects, ranging from plotting options to the formatting of plots, is covered.
In many of the examples below we use some of R’s commands to generate random numbers according to various distributions. The material is divided into three sections. The first section focuses on graphing continuous data. The second section focuses on graphing discrete data. The third section offers some miscellaneous options that are useful in a variety of contexts.
In the examples below a data set is defined using R’s normally distributed random number generator.
One common task is to plot multiple data sets on the same plot. In many situations the way to do this is to create the initial plot and then add additional information to the plot. For example, to plot bivariate data the plot command is used to initialize and create the plot. The points command can then be used to add additional data sets to the plot.
First define a set of normally distributed random numbers and then plot them. (This same data set is used throughout the examples below.)
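A minimal sketch of these two steps follows. The seed, sample sizes, distribution parameters, and labels are assumptions chosen only for illustration.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible

# Ten normally distributed values and a noisy linear response.
x <- rnorm(10, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)

# plot creates the initial scatter plot.
plot(x, y, xlab = "Independent", ylab = "Dependent", main = "Random Stuff")

# A second, uniformly distributed data set is added with points.
x1 <- runif(8, min = 15, max = 25)
y1 <- 2.5 * x1 - 1.0 + runif(8, min = -6, max = 6)
points(x1, y1, col = 2)  # col = 2 selects the second colour of the palette
```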
Note that in the previous example, the colour for the second set of data points is set using the col option. You can try different numbers to see what colours are available. For most installations there are at least eight options from 1 to 8. Also note that in the example above the points are plotted as circles. The symbol that is used can be changed using the pch option.
Again, try different numbers to see the various options. Another helpful option is to add a legend. This can be done with the legend command. The options for the command, in order, are the x and y coordinates on the plot to place the legend followed by a list of labels to use. There are a large number of other options, so use help(legend) to see more options. For example, a list of colors can be given with the col option, and a list of symbols can be given with the pch option.
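As a rough illustration, the sketch below plots a second data set with a different symbol and adds a matching legend. The data, colours, symbols, and label text are assumptions.

```r
set.seed(1)  # assumed seed
x <- rnorm(10, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)
x2 <- runif(8, min = 15, max = 25)
y2 <- 2.5 * x2 - 1.0 + runif(8, min = -6, max = 6)

plot(x, y, xlab = "Independent", ylab = "Dependent", main = "Random Stuff")
points(x2, y2, col = 3, pch = 2)  # pch = 2 draws triangles instead of circles

# The legend is anchored at the top left of the data; col and pch repeat
# the styles used for the two data sets.
labels <- c("Original", "Second set")
legend(min(x), max(y), labels, col = c(1, 3), pch = c(1, 2))
```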
Figure 1.
Another common task is to change the limits of the axes to change the size of the plotting area. This is achieved using the xlim and ylim options in the plot command. Both options take a vector of length two that holds the minimum and maximum values.
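A short sketch, assuming the same kind of random data as above and arbitrarily chosen limits:

```r
set.seed(1)  # assumed seed
x <- rnorm(10, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)

# Each limit is a vector of length two: c(minimum, maximum).
x_limits <- c(0, 40)
y_limits <- c(0, 100)
plot(x, y, xlim = x_limits, ylim = y_limits,
     xlab = "Independent", ylab = "Dependent", main = "Random Stuff")
```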
Another common task is to add error bars to a set of data points. This can be accomplished using the arrows command. The arrows command takes two pairs of coordinates, that is two pairs of x and y values. The command then draws a line between each pair and adds an “arrow head” with a given length and angle.
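One way to sketch this is shown below. The error sizes are randomly generated stand-ins, not real uncertainties, and the bar length and colour are assumptions.

```r
set.seed(1)  # assumed seed
x <- rnorm(10, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)

# Hypothetical error size for each point.
err <- abs(rnorm(10, sd = 3))

plot(x, y, ylim = range(c(y - err, y + err)),
     xlab = "Independent", ylab = "Dependent", main = "Random Stuff")

# A flat "arrow head" (angle = 90) at both ends (code = 3) gives an error bar.
arrows(x, y - err, x, y + err, angle = 90, length = 0.1, code = 3, col = 2)
```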
Figure 2.
Note that the option code is used to specify where the bars are drawn. Its value can be 1, 2, or 3. If code is 1 the bars are drawn at the pairs given in the first argument. If code is 2 the bars are drawn at the pairs given in the second argument. If code is 3 the bars are drawn at both.
In the previous example a little bit of “noise” was added to the pairs to produce an artificial offset. This is a common thing to do for making plots. A simpler way to accomplish this is to use the jitter command.
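The sketch below compares raw and jittered versions of the same data side by side. The binomial data are an assumption, chosen because many of the points coincide exactly, and par with mfrow is used to place the two plots in one row.

```r
set.seed(1)  # assumed seed

# Discrete-valued data: many points fall on top of each other.
numbers <- rbinom(200, size = 4, prob = 0.5)
more_numbers <- rbinom(200, size = 4, prob = 0.5)

old_par <- par(mfrow = c(1, 2))  # one row, two columns of plots
plot(numbers, more_numbers, main = "Raw")
plot(jitter(numbers), jitter(more_numbers), main = "Jittered")
par(old_par)                     # restore the previous layout
```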
Figure 3.
Note that a new command was used in the previous example. The par command can be used to set different parameters. In the example above the mfrow parameter was set. The plots are arranged in an array where the default number of rows and columns is one. The mfrow parameter is a vector with two entries. The first entry is the number of rows of images. The second entry is the number of columns. In the example above the plots were arranged in one row with two plots across.
Figure 4.
There are times when you do not want to plot specific points but wish to plot a density. This can be done using the smoothScatter command.
Figure 5.
Note that the previous example may benefit by superimposing a grid to help delimit the points of interest. This can be done using the grid command.
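A small sketch of both commands together, assuming a larger random sample than before so that the density is visible. The number of grid cells is an arbitrary choice, and smoothScatter relies on the KernSmooth package that ships with standard R installations.

```r
set.seed(1)  # assumed seed
x <- rnorm(1000, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(1000, mean = 0, sd = 9)

# Plot a smoothed density of the points rather than the points themselves.
smoothScatter(x, y, xlab = "Independent", ylab = "Dependent")

# Superimpose a grid with 4 cells across and 3 cells down.
grid(4, 3)
```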
There are times that you want to explore a large number of relationships. A number of relationships can be plotted at one time using the pairs command. The idea is that you give it a matrix or a data frame, and the command will create a scatter plot of all combinations of the data.
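For instance, a sketch with four made-up variables (the names and the relationships between them are assumptions):

```r
set.seed(1)  # assumed seed
u_data <- rnorm(20)
v_data <- rnorm(20, mean = 5)
w_data <- u_data + 2 * v_data + rnorm(20, sd = 0.5)
x_data <- -2 * u_data + rnorm(20, sd = 0.1)

d <- data.frame(u = u_data, v = v_data, w = w_data, x = x_data)

# One scatter plot for every pair of columns.
pairs(d)
```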
Figure 5.
A shaded region can be plotted using the polygon command. The polygon command takes a pair of vectors, x and y, and shades the region enclosed by the coordinate pairs. In the example below a blue square is drawn. The vertices are defined starting from the lower left. Five pairs of points are given because the starting point and the ending point are the same.
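A sketch of the square, with the coordinates chosen arbitrarily:

```r
# Vertices starting at the lower left; the first point is repeated at the end.
x <- c(-1, 1, 1, -1, -1)
y <- c(-1, -1, 1, 1, -1)

# Set up an empty plot (type = "n") and then shade the square.
plot(x, y, type = "n", asp = 1)
polygon(x, y, col = "blue")
```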
A more complicated example is given below. In this example the rejection region for a right sided hypothesis test is plotted, and it is shaded in red. A set of custom axes is constructed, and symbols are plotted using the expression command.
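The following is one possible sketch of such a plot. The standard deviation, the 5% significance level, and the label text are assumptions made for illustration.

```r
std_dev <- 0.75                       # assumed standard deviation
x <- seq(-5, 5, by = 0.01)
y <- dnorm(x, sd = std_dev)
right <- qnorm(0.95, sd = std_dev)    # critical value for a 5% right sided test

# Draw the density without axes, then add the axes separately.
plot(x, y, type = "l", axes = FALSE, ylab = "p",
     xlab = expression(paste("Assumed distribution of ", bar(x))))
axis(1, pos = 0, at = c(-5, 0, right, 5),
     labels = expression("", mu[0], bar(x)[cr], ""))
axis(2)

# Shade the rejection region to the right of the critical value.
x_reject <- seq(right, 5, by = 0.01)
y_reject <- dnorm(x_reject, sd = std_dev)
polygon(c(x_reject, max(x_reject), right), c(y_reject, 0, 0), col = "red")
```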
Figure 6.
The axes are drawn separately. This is done by first suppressing the plotting of the axes in the plot command, and the horizontal axis is drawn separately. Also note that the expression command is used to plot a Greek character and also produce subscripts.
Finally, a brief example of how to plot a surface is given. The persp command will plot a surface with a specified perspective. In the example, a grid is defined by multiplying a row and column vector to give the x and then the y values for a grid. Once that is done a sine function is specified on the grid, and the persp command is used to plot it.
The %*% notation is used to perform matrix multiplication.
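A sketch of the surface plot, assuming a fairly coarse grid and arbitrary viewing angles:

```r
x <- seq(0, 2 * pi, by = pi / 20)
y <- x
ones <- rep(1, length(x))

# Column vector times row vector gives a full grid of values.
xg <- x %*% t(ones)   # entry [i, j] is x[i]
yg <- ones %*% t(y)   # entry [i, j] is y[j]

# A sine function defined on the grid.
f <- sin(xg * yg)

# theta and phi set the viewing direction.
persp(x, y, f, theta = -10, phi = 40)
```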
In the examples below a data set is defined using R’s hypergeometric random number generator.
The plot command will try to produce the appropriate plots based on the data type. The data that is defined above, though, is numeric data. You need to convert the data to factors to make sure that the plot command treats it in an appropriate way. The as.factor command is used to cast the data as factors and ensures that R treats it as discrete data.
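A minimal sketch, assuming 30 draws from a hypergeometric distribution with arbitrarily chosen parameters:

```r
set.seed(1)  # assumed seed

# 30 experiments: draw 3 balls from an urn with 4 white and 5 black balls
# and record the number of white balls drawn.
numbers <- rhyper(30, 4, 5, 3)

# Cast the numeric data as a factor so that plot treats it as discrete.
numbers_factor <- as.factor(numbers)
plot(numbers_factor)
```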
In this case R will produce a barplot. The barplot command can also be used to create a barplot. The barplot command requires a vector of heights, though, and you cannot simply give it the raw data. The frequencies for the barplot command can be easily calculated using the table command.
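For example, a sketch with assumed data and label text:

```r
set.seed(1)  # assumed seed
numbers <- rhyper(30, 4, 5, 3)

# table counts how often each value occurs; these counts are the bar heights.
counts <- table(numbers)
barplot(counts, main = "Number of Draws", xlab = "Draws", ylab = "Frequency")
```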
In the previous example the barplot command is used to set the title for the plot and the labels for the axes. The labels on the ticks for the horizontal axis are automatically generated using the labels on the table. You can change the labels by setting the row names of the table.
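A sketch of relabelling the bars. The factor levels are fixed to 0 through 3 here (an assumption) so that the table always has four entries to rename.

```r
set.seed(1)  # assumed seed
numbers <- factor(rhyper(30, 4, 5, 3), levels = 0:3)
counts <- table(numbers)

# Replace the automatically generated labels "0" to "3".
rownames(counts) <- c("none", "one", "two", "three")
barplot(counts, main = "Number of Draws", xlab = "Draws", ylab = "Frequency")
```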
The order of the frequencies is the same as the order in the table. If you change the order in the table it will change the way it appears in the barplot. For example, if you wish to arrange the frequencies in descending order you can use the sort command with the decreasing option set to TRUE.
The indexing features of R can be used to change the order of the frequencies manually.
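Both approaches are sketched below with assumed data. The manual order used in the second plot is arbitrary.

```r
set.seed(1)  # assumed seed
counts <- table(factor(rhyper(30, 4, 5, 3), levels = 0:3))

# Descending order of frequency.
sorted <- sort(counts, decreasing = TRUE)
barplot(sorted, main = "Sorted by Frequency")

# Manual order chosen through indexing: last entry first.
reordered <- counts[c(4, 3, 2, 1)]
barplot(reordered, main = "Reversed Order")
```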
The barplot command returns the horizontal locations of the bars. Using the locations and putting together the previous ideas a Pareto Chart can be constructed.
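One way to put these pieces together is sketched below. The data, colours, and labels are assumptions, and the cumulative counts are drawn on the same scale as the bars for simplicity.

```r
set.seed(1)  # assumed seed
counts <- table(factor(rhyper(30, 4, 5, 3), levels = 0:3))
sorted <- sort(counts, decreasing = TRUE)
total <- sum(sorted)

# barplot returns the horizontal midpoints of the bars.
loc <- barplot(sorted, ylim = c(0, total), main = "Pareto Chart",
               xlab = "Draws", ylab = "Frequency")

# Cumulative frequencies plotted at the bar locations.
cumulative <- cumsum(as.vector(sorted))
lines(loc, cumulative, type = "b", col = "blue")
```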
Mosaic plots are used to display proportions for tables that are divided into two or more conditional distributions. Here we focus on two way tables to keep things simpler. It is assumed that you are familiar with using tables in R (see the section on two way tables for more information: Two Way Tables).
Here we will use a made up data set primarily to make it easier to figure out what R is doing. The fictitious data set is defined below. The idea is that sixteen children of age eight are interviewed. They are asked two questions. The first question is, “Do you believe in Santa Claus?” If they say that they do then the term “belief” is recorded, otherwise the term “no belief” is recorded. The second question is whether or not they have an older brother, older sister, or no older sibling. (We are keeping it simple here!) The answers that are recorded are “older brother,” “older sister,” or “no older sibling.”
The data is given as strings. When the strings are stored as factors (the default for data.frame before R 4.0.0; in later versions pass stringsAsFactors=TRUE or convert them with as.factor), R treats them as categorical data. If you plot the individual data sets, the plot command will default to producing barplots.
If you provide both data sets it will automatically produce a mosaic plot which demonstrates the relative frequencies in terms of the resulting areas.
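A sketch with a made-up set of sixteen answers follows. The particular responses are invented for illustration, and stringsAsFactors=TRUE is passed so that the sketch behaves the same way in old and new versions of R.

```r
# Hypothetical answers from sixteen children.
santa <- data.frame(
  belief  = c(rep("belief", 8), rep("no belief", 8)),
  sibling = c(rep("no older sibling", 4), rep("older sister", 3), "older brother",
              rep("older brother", 5), rep("older sister", 2), "no older sibling"),
  stringsAsFactors = TRUE)

# A single factor gives a barplot.
plot(santa$belief)

# Two factors give a plot whose areas reflect the relative frequencies
# (R draws a spine plot here, which is a special case of a mosaic plot).
plot(santa$sibling, santa$belief)
```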
The mosaicplot command can be called directly.
The colours of the plot can be specified by setting the col argument. The argument is a vector of colours used for the rows. See Figure 7 for an example.
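A sketch of a direct call with colours, using invented survey answers. The full argument name in mosaicplot is color, and col is accepted as an abbreviation of it. The colour names and labels are assumptions.

```r
# Hypothetical answers from sixteen children.
belief <- c(rep("belief", 8), rep("no belief", 8))
sibling <- c(rep("no older sibling", 4), rep("older sister", 3), "older brother",
             rep("older brother", 5), rep("older sister", 2), "no older sibling")

# Two way table: siblings in the rows, belief in the columns.
counts <- table(sibling, belief)

# One colour for each level of belief.
mosaicplot(counts, main = "Older Siblings and Belief in Santa",
           xlab = "Sibling", ylab = "Belief",
           color = c("lightblue", "salmon"))
```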
Figure 7.
The labels and the order that they appear in the plot can be changed in exactly the same way as given in the examples for barplot above.
When changing the order keep in mind that the table is a two dimensional array. The indices must include both rows and columns, and the transpose command (t) can be used to switch how it is plotted with respect to the vertical and horizontal axes.
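The sketch below reorders both dimensions of a two way table and then transposes it. The survey answers are invented and the chosen order is arbitrary.

```r
# Hypothetical answers from sixteen children.
belief <- c(rep("belief", 8), rep("no belief", 8))
sibling <- c(rep("no older sibling", 4), rep("older sister", 3), "older brother",
             rep("older brother", 5), rep("older sister", 2), "no older sibling")
counts <- table(sibling, belief)

# Both the row order and the column order are given in the index.
reordered <- counts[c("older brother", "older sister", "no older sibling"),
                    c("no belief", "belief")]
mosaicplot(as.table(reordered), main = "Reordered")

# The transpose swaps the roles of the two variables in the plot.
flipped <- t(reordered)
mosaicplot(as.table(flipped), main = "Transposed")
```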
The previous examples only provide a slight hint at what is possible. Here we give some examples that provide a demonstration of the way the different commands can be combined and the options that allow them to be used together.
First, an example of a histogram with an approximation of the density function is given. In addition to the density function a horizontal boxplot is added to the plot with a rug representation of the data on the horizontal axis. The horizontal bounds on the histogram will be specified. The boxplot must be added to the histogram, and it will be raised above the histogram.
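A sketch of such a combined plot is given below. The data, the axis limits, and the vertical position and thickness of the boxplot are assumptions tuned to this particular sample.

```r
set.seed(1)  # assumed seed
x <- rnorm(100, mean = 20, sd = 5)

# Histogram on the density scale, with room left above it for the boxplot.
h <- hist(x, freq = FALSE, xlim = c(0, 40), ylim = c(0, 0.14),
          main = "Random Stuff", xlab = "X Values", ylab = "Probability")

# Kernel estimate of the density function.
lines(density(x))

# Horizontal boxplot added to the existing plot and raised to height 0.125.
boxplot(x, horizontal = TRUE, add = TRUE, at = 0.125, boxwex = 0.02,
        axes = FALSE)

# Rug representation of the data along the horizontal axis.
rug(x, side = 1)
```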
The dev commands allow you to create and manipulate multiple graphics windows. You can create new windows using the dev.new() command, and you can choose which one to make active using the dev.set() command. The dev.list() command lists the graphical devices that are open, and the dev.next() and dev.prev() commands return the devices that follow and precede the current one.
In the following example three devices are created. They are listed, and different plots are created on the different devices.
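A sketch of working with three devices. The plots drawn on each device are arbitrary, and when R is run without a display the devices are files rather than windows.

```r
set.seed(1)  # assumed seed
x <- rnorm(100, mean = 20, sd = 5)

# Create three devices and list what is open.
dev.new()
dev.new()
dev.new()
devices <- tail(dev.list(), 3)   # the three devices just created
print(devices)

# Make each device active in turn and draw a different plot on it.
dev.set(devices[1])
plot(x, main = "Scatter")
dev.set(devices[2])
hist(x, main = "Histogram")
dev.set(devices[3])
boxplot(x, main = "Boxplot")

active <- dev.cur()              # the device that is currently active
graphics.off()                   # close all devices when finished
```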
There are a couple of ways to print a plot to a file. It is important to be able to work with graphics devices as shown in the previous subsection (Multiple Windows). The first way explored is to use the dev.print command. This command will print a copy of the currently active device, and the format is defined by the device argument.
In the example below, the current window is printed to a png file called “hist.png” that is 200 pixels wide.
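A sketch of the call is given below. Because dev.print copies from a screen device, the sketch is wrapped in a check for an interactive session (an assumption about how it will be run), and the histogram is only a placeholder plot.

```r
target <- "hist.png"   # name of the output file

if (interactive()) {
  # Draw something in the current window.
  hist(rnorm(100), main = "Random Stuff")

  # Copy the current window to a png file that is 200 pixels wide.
  dev.print(device = png, filename = target, width = 200)
}
```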
To find out what devices are available on your system use the help command, for example help(Devices).
Another way to print to a file is to create a device in the same way as the graphical devices were created in the previous section. Once the device is created, the various plot commands are given, and then the device is turned off to write the results to a file.
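A sketch of this approach using the png device. The file location, the image size, and the plot itself are assumptions, and the png device has to be available in your installation.

```r
set.seed(1)  # assumed seed
out_file <- file.path(tempdir(), "hist.png")

# Open the file device, issue the plot commands, then close the device.
png(filename = out_file, width = 400, height = 350)
hist(rnorm(100), main = "Random Stuff")
dev.off()   # closing the device writes the file
```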
Basic annotation can be performed in the regular plotting commands. For example, there are options to specify labels on axes as well as titles. More options are available using the axis command.
Most of the primary plotting commands have an option to turn off the generation of the axes using the axes=FALSE option. The axes can then be added using the axis command, which allows for a greater number of options.
In the example below a bivariate set of random numbers is generated and plotted as a scatter plot. The axes are added, but the horizontal axis is located in the center of the data rather than at the bottom of the figure. Note that the horizontal and vertical axes are added separately, and are specified using the first argument to the command. (Use help(axis) for a full list of options.)
In the previous example the at option is used to specify the tick marks.
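A sketch of such a plot, with assumed data and tick positions:

```r
set.seed(1)  # assumed seed
x <- rnorm(10, mean = 0, sd = 4)
y <- 3 * x - 1 + rnorm(10, mean = 0, sd = 2)

# Suppress the default axes.
plot(x, y, axes = FALSE, col = 2)

# First argument 1 is the horizontal axis; pos = 0 draws it through y = 0.
x_ticks <- seq(-10, 10, by = 2)
axis(1, pos = 0, at = x_ticks)

# First argument 2 is the vertical axis, drawn at the left of the plot.
y_ticks <- seq(-30, 30, by = 10)
axis(2, at = y_ticks)
```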
When using the plot command the default behavior is to draw an axis as well as draw a box around the plotting area. The drawing of the box can be suppressed using the bty option. The value can be “o,” “l,” “7,” “c,” “u,” “],” or “n.” (The lines drawn roughly look like the letter given except for “n” which draws no lines.) The box can be drawn later using the box command as well.
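For example, a sketch with assumed data that suppresses the box and then draws one afterwards:

```r
set.seed(1)  # assumed seed
x <- rnorm(10, mean = 0, sd = 4)
y <- 3 * x - 1 + rnorm(10, mean = 0, sd = 2)

box_type <- "n"           # "n" draws no box at all
plot(x, y, bty = box_type)

# The box can still be added later; a dashed line makes the addition visible.
box(lty = "dashed")
```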
The par command can be used to set the default values for various parameters. A couple are given below. In the example below the default background is set to grey, no box will be drawn around the window, and the margins for the axes will be twice the normal size.
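A sketch of these settings with assumed data. The previous values returned by par are saved so that they can be restored afterwards.

```r
set.seed(1)  # assumed seed
x <- rnorm(10, mean = 0, sd = 4)
y <- 3 * x - 1 + rnorm(10, mean = 0, sd = 2)

# Grey background, no box, and margin coordinates scaled by a factor of two.
old_par <- par(bg = "grey", bty = "n", mex = 2)
plot(x, y)

# Restore the previous settings.
par(old_par)
```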
Another common task is to place a text string on the plot. The text command takes a coordinate and a label, and it places the label at the given coordinate. The text command has options for setting the offset, size, font, and other options. In the example below the label “numbers!” is placed on the plot. Use help(text) to see more options.
The default text command will cut off any characters outside of the plot area. This behavior can be overridden using the xpd option.
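A sketch of both cases with assumed data. The first label sits inside the plot area, and the second is centered on the right edge so that part of it falls outside.

```r
set.seed(1)  # assumed seed
x <- rnorm(10, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)
plot(x, y)

# Label placed at a coordinate inside the plot area.
text(mean(x), mean(y), "numbers!")

# Label centered on the right edge; xpd = TRUE keeps it from being cut off.
usr <- par("usr")   # c(x left, x right, y bottom, y top) of the plot area
text(usr[2], mean(y), "numbers!", xpd = TRUE)
```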