Key Points
- ggplot shapes represent the points in a scatter plot, and they can be used to distinguish different groups of data, highlight outliers, or add more information to the plot.
- We can use 26 different ggplot shapes, numbered 0 through 25, ranging from simple circles and squares to triangles, diamonds, crosses, and an asterisk. We can customize the shapes, colors, sizes, and fill to suit our needs.
- We can map a variable to the shape of the points using the aes() function, which will create a different point shape for each variable level and add a legend to the plot.
- We can use geom_jitter() and geom_count() to deal with overplotting, which occurs when there are too many points in the plot that overlap with each other. geom_jitter() adds a small amount of random noise to the x and y coordinates of the points, and geom_count() adds points with sizes proportional to the number of observations at each position.
- We can use geom_smooth() to add a regression line and a confidence interval to the plot, which can help to show the trend and the uncertainty of the relationship between the variables. The method argument selects the regression model, such as linear or loess, and the level argument controls the confidence level.
Package | Function | Description |
---|---|---|
ggplot2 | ggplot() | Creates a ggplot object |
ggplot2 | aes() | Defines the mapping of data to aesthetics |
ggplot2 | geom_point() | Adds points to the plot |
ggplot2 | geom_jitter() | Adds jittered points to the plot |
ggplot2 | geom_count() | Adds points with sizes proportional to the number of observations |
ggplot2 | geom_smooth() | Adds a smoothed conditional mean |
ggplot2 | geom_text() | Adds text labels to the plot |
ggplot2 | geom_label() | Adds text labels with a background color to the plot |
ggplot2 | scale_shape_manual() | Sets the values of point shapes manually |
ggplot2 | scale_shape_discrete() | Sets the values of point shapes automatically |
Hi, I’m Zubair Goraya, a Ph.D. scholar, a certified data analyst, and a freelancer with five years of experience. I have used R for data analysis and visualization for over five years. I have worked on various projects involving data from different domains and taught R courses and workshops to students and professionals.
Customizing Scatterplots with ggplot2 Shapes: A Visual Guide
When I first made a scatter plot, it looked messy and hard to interpret. I wanted to make it clearer and more attractive, so I researched solutions online, as several of my fellow researchers were facing the same problem. I found out that I can use ggplot shapes in R to change the symbols of the points in the plot. These shapes can help me show different data groups, highlight interesting cases, or add more information to the plot. There are 26 different shapes to choose from, numbered 0 through 25, such as circles, squares, triangles, diamonds, and an asterisk.
I decided to write an article because I want to share my knowledge and experience as a data analyst with you. I want to help you learn how to use ggplot shapes in R to create stunning scatter plots showing the relationship between two variables and how to enhance them with different features and techniques.
ggplot shapes in R are symbols that represent the points in a scatter plot. A scatter plot is a type of plot that shows the relationship between two continuous variables. It displays the values of the variables as points in a two-dimensional space, where the position of each point is determined by its x and y coordinates. A scatter plot can reveal the direction, strength, and shape of the relationship between the variables, as well as the presence of outliers or clusters.
Scatter Plot in R
To make a basic scatter plot with ggplot2, you need to specify two things:
- The data frame that contains the variables you want to plot. We will use the built-in mtcars data set.
- The mapping of the variables to the x and y axes
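The two ingredients above can be combined as follows. This is a minimal sketch that assumes the ggplot2 package is installed; the object name p_basic is only for illustration.

```r
# Load ggplot2 (assumed to be installed); mtcars ships with base R
library(ggplot2)

# Map weight (wt) to the x axis and fuel efficiency (mpg) to the y axis,
# then draw one point per car
p_basic <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Display the plot
p_basic
```

Storing the plot in an object is optional; calling ggplot() directly at the console draws the same plot.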
We can see a negative relationship between mpg and wt, meaning heavier cars have lower fuel efficiency. We can also see some variation in the relationship, as some cars have higher or lower mpg than expected for their weight.
Changing Point Shapes in geom_point() in R
By default, ggplot2 uses a solid circle (shape 19) to represent the points in a scatter plot. However, you can change the shape of the points using the shape argument in the geom_point() function. You can specify any integer from 0 to 25 to choose a different point shape, as listed in the table below:
A table of ggplot shapes (Geom Shapes) with their corresponding values
Value | Shape | Description |
---|---|---|
0 | □ | Hollow square |
1 | ○ | Hollow circle |
2 | △ | Hollow triangle (pointing up) |
3 | + | Plus sign |
4 | × | Cross |
5 | ◇ | Hollow diamond |
6 | ▽ | Hollow triangle (pointing down) |
7 | ⊠ | Square with a cross inside |
8 | ✳ | Asterisk |
9 | | Diamond with a plus sign inside |
10 | ⊕ | Circle with a plus sign inside |
11 | ✡ | Six-pointed star (up and down triangles overlaid) |
12 | ⊞ | Square with a plus sign inside |
13 | ⊗ | Circle with a cross inside |
14 | | Square with a triangle inside |
15 | ■ | Solid square |
16 | ● | Solid circle |
17 | ▲ | Solid triangle (pointing up) |
18 | ◆ | Solid diamond (smaller) |
19 | ● | Solid circle (the default) |
20 | • | Small solid circle (bullet) |
21 | ○ | Circle with a border color and a fill color |
22 | □ | Square with a border color and a fill color |
23 | ◇ | Diamond with a border color and a fill color |
24 | △ | Triangle (pointing up) with a border color and a fill color |
25 | ▽ | Triangle (pointing down) with a border color and a fill color |
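To check the shape codes on your own installation, you can draw all of them in one plot. The sketch below is one way to do that, assuming ggplot2 is installed. The data frame shapes and its grid columns col and row are made up purely for layout. scale_shape_identity() tells ggplot2 to use the numbers as shape codes as they are, and the red fill only shows up in shapes 21 to 25.

```r
library(ggplot2)

# One row per shape code, arranged on a 7-column grid (layout only)
shapes <- data.frame(
  code = 0:25,
  col  = 0:25 %% 7,
  row  = -(0:25 %/% 7)
)

p_shapes <- ggplot(shapes, aes(x = col, y = row)) +
  # Draw each shape; fill is ignored by every shape except 21 to 25
  geom_point(aes(shape = code), size = 5, fill = "red") +
  # Print the shape code just below each symbol
  geom_text(aes(label = code), nudge_y = -0.3, size = 3) +
  # Use the values in `code` directly as shape codes
  scale_shape_identity() +
  theme_void()

p_shapes
```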
# Change the point shape to a hollow circle
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 1)
Mapping a Variable to Point Shapes
# Map cyl to the shape of the points
ggplot(mtcars, aes(x = wt, y = mpg, shape = as.factor(cyl))) +
  geom_point()
We can see three levels of cyl: 4, 6, and 8. The plot shows that cars with more cylinders tend to have lower mpg and higher wt and that there is some variation within each group.
Using Different Point Shapes for Different Groups
Sometimes, you may want to use different point shapes for different groups of data without mapping a variable to the shape aesthetic, for example, to highlight outliers or interesting cases, or to compare two subsets of the data. In this case, you can use the ifelse() function to create a new variable that flags the rows of interest and then pick a shape for each row based on that flag.
For example, suppose we want to use a different point shape for cars with more than 200 horsepower. We can create a new variable called high_hp that assigns 1 to these cars and 0 to the rest. Then, we can pass a vector of shape values, one per row, to the shape argument of the geom_point() function. Because the shapes are set outside aes(), they do not produce a legend; show.legend = FALSE simply makes that explicit.
# Create a new variable that assigns 1 to cars with more than 200 hp and 0 to the rest
mtcars$high_hp <- ifelse(mtcars$hp > 200, 1, 0)
# Use different point shapes for different groups
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = ifelse(mtcars$high_hp == 1, 4, 19), # Use a cross for high hp cars, and a circle for the rest
             show.legend = FALSE) # Hide the legend
We can see that seven cars have more than 200 hp and are marked with a cross in the plot. These cars tend to have lower mpg and higher weight than the rest.
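If you do want a legend for the highlighted group, an alternative is to map the flag inside aes() and choose the shapes with scale_shape_manual(). The following is a sketch under a few assumptions: ggplot2 is installed, and the names cars and hp_group are hypothetical, introduced here only for illustration.

```r
library(ggplot2)

# Work on a copy so the original mtcars data set is left untouched
cars <- mtcars

# Flag each car with a readable group label (hypothetical column name)
cars$hp_group <- ifelse(cars$hp > 200, "Over 200 hp", "200 hp or less")

p_flag <- ggplot(cars, aes(x = wt, y = mpg, shape = hp_group)) +
  geom_point(size = 2.5) +
  # A named vector ties each group label to a specific shape code
  scale_shape_manual(
    values = c("Over 200 hp" = 4, "200 hp or less" = 19),
    name = "Horsepower"
  )

p_flag
```

Compared with passing a vector to the shape argument, this version lets ggplot2 build the legend, so readers can see what the cross means without extra annotation.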
Changing Point Shape, Color, Size, and Fill
# Change the point color, size, and fill
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red", # Use red color
             size = 3, # Use large size
             fill = NA) # Use no fill
You can also map a variable to the color, size, or fill of the points using the aes() function. A categorical variable gets a distinct color, size, or fill for each level, a continuous variable gets a gradient or a range of sizes, and a legend is added to the plot in both cases. It can be useful to show the distribution of another variable in the plot or add more information.
For example, if you want to map the horsepower (hp) to the color of the points, you can use the following code:
# Map hp to the color of the points
ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point()
We can see a gradient of colors from dark blue to light blue (the default continuous color scale), corresponding to the range of hp values from low to high. The plot shows that cars with higher hp tend to have lower mpg and higher wt, although the relationship is not perfectly uniform.
You can also map the horsepower (hp) to the size of the points using the size argument in the aes() function. Because hp is continuous, the point size scales with its value, and a legend is added to the plot. It can be useful to show the variation of another variable in the plot or to emphasize the outliers or extreme values.
# Map hp to the size of the points
ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point()
We can see a range of sizes from small to large, corresponding to the range of hp values. The plot shows that cars with higher hp tend to have lower mpg and higher wt and that there are some outliers or extreme values in the data.
You can also map the horsepower (hp) to the fill of the points using the fill argument in the aes() function. The fill aesthetic only affects shapes 21 to 25, which have both a border and a fill, so you need to pick one of those shapes.
For example, to map hp to the fill of the points, you can use the following code:
# Map hp to the fill of the points
ggplot(mtcars, aes(x = wt, y = mpg, fill = hp)) +
  geom_point(shape = 21) # Use a shape that has a border and a fill
We can see a fill gradient from dark blue to light blue, corresponding to the range of hp values from low to high.
Controlling Point Shape with scale_shape_manual() and scale_shape_discrete() in R
You can also control the point shape with the scale_shape_manual() and scale_shape_discrete() functions. These functions allow you to set the values of the point shapes manually or automatically respectively. You can use these functions to change the default point shapes or assign specific point shapes to specific variable levels.
For example, if you want to change the default point shapes to squares, diamonds, and triangles, you can use the scale_shape_manual() function and specify the values argument with a vector of shape values. You can also use the labels argument to change the labels of the legend and the name argument to change the title of the legend.
# Change the default point shapes to squares, diamonds, and triangles
# cyl is numeric, so convert it to a factor before mapping it to shape
ggplot(mtcars, aes(x = wt, y = mpg, shape = as.factor(cyl))) +
  geom_point() +
  scale_shape_manual(values = c(15, 18, 17), # Use squares, diamonds, and triangles
                     labels = c("Four", "Six", "Eight"), # Use custom labels
                     name = "Cylinders") # Use custom title
We can see that the point shapes have changed to squares, diamonds, and triangles, corresponding to the levels of cyl. The legend also reflects the new point shapes, labels, and titles.
Assigning Specific Point Shapes to Specific Variable Levels
Alternatively, if you want to assign specific point shapes to specific variable levels, use the scale_shape_manual() function and specify the values argument with a named vector of shape values. The vector's names should match the variable's levels, and the values should be the desired shape values.
For example, if you want to assign a plus sign to cars with four cylinders, a circle to cars with six cylinders, and a cross to cars with eight cylinders, you can use the following code:
# Assign specific point shapes to specific levels of cyl
ggplot(mtcars, aes(x = wt, y = mpg, shape = as.factor(cyl))) +
  geom_point() +
  scale_shape_manual(values = c("4" = 3, "6" = 19, "8" = 4)) # Use a named vector of shape values
We can see that the point shapes have changed to plus signs, circles, and crosses, corresponding to the levels of cyl. The legend has also changed to reflect the new point shapes.
If you don’t want to specify the values of the point shapes manually, you can use the scale_shape_discrete() function instead. This function automatically assigns point shapes to the levels of a variable in the order of the levels. You can use the breaks argument to control which levels appear in the legend and in what order, and the labels and name arguments to change the labels and title of the legend. Note that breaks does not change which shape each level receives.
For example, if you want to reverse the order of the cyl levels in the legend, you can use the following code:
# Reverse the order of the cyl levels in the legend
ggplot(mtcars, aes(x = wt, y = mpg, shape = as.factor(cyl))) +
  geom_point() +
  scale_shape_discrete(breaks = c("8", "6", "4")) # Breaks are matched against the factor levels as strings
We can see that the legend now lists the levels of cyl in reversed order (8, 6, 4). Each level keeps the same point shape as before; only the order of the legend entries has changed.
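If the goal is to change which shape each level receives, and not only how the legend is ordered, one option is the limits argument, which sets the order in which the levels take shapes from the default palette. The sketch below assumes ggplot2 is installed; the object name p_limits is only for illustration.

```r
library(ggplot2)

p_limits <- ggplot(mtcars, aes(x = wt, y = mpg, shape = as.factor(cyl))) +
  geom_point() +
  # Levels receive shapes from the default palette in the order given here,
  # so "8" now gets the first default shape instead of "4"
  scale_shape_discrete(limits = c("8", "6", "4"), name = "Cylinders")

p_limits
```

Reordering the factor levels before plotting is another way to get a similar result.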
Dealing with Overplotting with geom_jitter() and geom_count() in R
Sometimes, you may encounter a problem called overplotting, which occurs when there are too many points in the plot that overlap with each other. It can make it hard to see the data's distribution or identify the individual points. To deal with overplotting, you can use two alternative functions: geom_jitter() and geom_count().
geom_jitter() adds a small amount of random noise to the x and y coordinates of the points so that they are slightly displaced from their original positions. It can help to spread out the points and reduce the overlap. You can control the amount of jitter using the width and height arguments, which specify the range of the noise in the x and y directions, respectively.
# Add some jitter to the points
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_jitter(width = 0.1, height = 0.5) # Use small values for width and height
We can see that the points are slightly moved from their original positions, creating a more dispersed pattern. It can help to see the data's variation and identify the outliers or clusters.
geom_count() counts the observations at each position and draws one point per position, with the point area proportional to that count, so larger points indicate more observations. It can help show the data's density and highlight the common or rare values. Because the size is mapped to the count, you control the range of point sizes through a size scale, for example scale_size_area() with its max_size argument, rather than through a fixed size argument.
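As a rough illustration of geom_count(), the sketch below plots the number of cylinders against the number of gears. These two variables are chosen here, as an assumption for the example, because many cars share the same combination, whereas almost every wt and mpg pair in mtcars is unique. The object name p_count and the axis labels are illustrative.

```r
library(ggplot2)

# Treat cyl and gear as categories so that cars fall on a small grid of positions
p_count <- ggplot(mtcars, aes(x = factor(cyl), y = factor(gear))) +
  # One point per (cyl, gear) combination, sized by the number of cars
  geom_count() +
  # Make point area proportional to the count, with a larger maximum size
  scale_size_area(max_size = 10) +
  labs(x = "Cylinders", y = "Gears", size = "Cars")

p_count
```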
Adding Regression Lines and Confidence Intervals with geom_smooth() in R
Another way to enhance the scatter plot is to add a regression line and a confidence interval using the geom_smooth() function in R. It adds a smoothed conditional mean and a confidence band to the plot, which can help to show the trend and the uncertainty of the relationship between the variables. You can control the type of regression model using the method argument and the level of confidence using the level argument.
For example, if you want to add a linear regression line and a 95% confidence interval to the scatter plot of mpg vs wt, you can use the following code:
# Add a linear regression line and a 95% confidence interval
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", # Use linear regression
              level = 0.95) # Use a 95% confidence level
We can see that the plot has a blue line and a gray band, corresponding to the linear regression line and the 95% confidence interval, respectively. The plot shows a strong negative relationship between mpg and wt, and the confidence band is narrow, indicating a high degree of certainty.
You can also use other regression models, such as loess, a non-parametric method that fits a smooth curve to the data. You can use the method argument to specify the model type and the span argument to control the degree of smoothing.
For example, if you want to use a loess model with a span of 0.5 to the scatter plot of mpg vs wt, you can use the following code:
# Use a loess model with a span of 0.5
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", # Use loess
              span = 0.5) # Use a moderate degree of smoothing
Labeling Points with geom_text() and geom_label() in R
Another way to enhance the scatter plot is to label the points with text using the geom_text() or geom_label() functions. These functions add text labels to the plot, which can help to identify the individual points or to add more information to the plot. You can control the text of the labels using the label argument and the position of the labels using the hjust and vjust arguments, which specify the horizontal and vertical justification of the labels, respectively.
For example, if you want to label the points with the names of the cars, you can use the following code:
# Label the points with the names of the cars
ggplot(mtcars, aes(x = wt, y = mpg, label = rownames(mtcars))) +
  geom_point() +
  geom_text(hjust = 1.2, vjust = -0.2) # Adjust the position of the labels
We can see that the plot has a text label next to each point with the name of the car, such as Mazda RX4, Datsun 710, and Ferrari Dino. The labels are shifted slightly away from the points, although with 32 cars some labels may still overlap each other.
You can also use the geom_label() function instead of the geom_text() function, which adds text labels with a background color to the plot. It can help to make the labels more visible and readable. You can control the color and fill of the labels using the color and fill arguments, respectively.
For example, if you want to use geom_label() to label the points with the names of the cars, you can use the following code:
# Use geom_label() to label the points with the names of the cars
ggplot(mtcars, aes(x = wt, y = mpg, label = rownames(mtcars))) +
  geom_point() +
  geom_label(color = "black", # Use black for the text and border
             fill = "white", # Use white for the background
             hjust = 1.2, vjust = -0.2) # Adjust the position of the labels
We can see that the plot has text labels with a background color for the names of the cars, which are slightly moved from the points to avoid overlap. The plot shows that the labels are more visible and readable than the previous plot.
# Use geom_text() to label the points with the names of the cars and rotate the text
ggplot(mtcars, aes(x = wt, y = mpg, label = rownames(mtcars))) +
  geom_point() +
  geom_text(color = "black", # Use black for the text
            hjust = 1.2, vjust = -0.2, # Adjust the position of the labels
            angle = 90) # Rotate the text by 90 degrees
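With 32 car names on one plot, labels tend to collide. One simple option built into geom_text() is the check_overlap argument, which skips any label that would overlap a label drawn before it. The sketch below assumes ggplot2 is installed; the object name p_labels and the chosen vjust and size values are illustrative.

```r
library(ggplot2)

p_labels <- ggplot(mtcars, aes(x = wt, y = mpg, label = rownames(mtcars))) +
  geom_point() +
  # Draw labels above the points and drop any label that would overlap an earlier one
  geom_text(check_overlap = TRUE, vjust = -0.6, size = 3)

p_labels
```

The trade-off is that some cars end up without a label, so this works best when you only need a readable sample of names rather than every one.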
Conclusion
In this article, we covered how to:
- Make a basic scatter plot with ggplot2
- Change the shape of the points using the shape argument
- Map a variable to the shape of the points using aes()
- Use different point shapes for different groups of data
- Change the shape, color, size, and fill of the points manually or automatically
- Use scale_shape_manual() and scale_shape_discrete() to control the shape of the points
- Deal with overplotting with geom_jitter() and geom_count()
- Add regression lines and confidence intervals with geom_smooth()
- Label the points with geom_text() and geom_label()
Frequently Asked Questions (FAQs)
What are the geometric shapes in ggplot2?
Geometric shapes in ggplot2 are symbols that represent the points, lines, bars, and other elements of a plot. They are specified by the geom_ functions, such as:
- geom_point()
- geom_line()
- geom_bar()
What is the default shape size in ggplot?
The default shape size in ggplot is 1.5 mm. You can change the size of the shape using the size argument in the geom_ function. For example, to make the points larger on a scatter plot, you can use geom_point(size = value), where value is a positive numeric value. You can also map a variable to the shape's size using the aes() function. For example, you can use geom_point(aes(size = variable)) to make the points proportional to a variable.
What are the geometries in ggplot2?
How to add shapes to ggplot?
To add shapes to a ggplot, you can use the geom_ functions, such as geom_point(), geom_line(), geom_bar(), etc. Each geom_ function adds a layer of shapes to the plot and can be combined to create complex plots. For example, to add points and a regression line to a scatter plot, you can use geom_point() and geom_smooth(). You can also use the aes() function to map variables to the aesthetic attributes of the shapes, such as colour, size, fill, etc. For example, to map a variable to the colour of the points, you can use geom_point(aes(color = variable)).
What are geometric objects in R?
Geometric objects in R are the basic shapes that can be drawn on a graphical device, such as points, lines, polygons, circles, etc. They are specified by the low-level graphics functions, such as points(), lines(), polygon(), symbols(), etc. Each function has a set of parameters that can be used to customize the appearance and position of the object. For example, points() can use different point characters, such as pch = 1 for a hollow circle, pch = 2 for a hollow triangle, pch = 3 for a plus sign, etc.
How do I manually specify shapes in ggplot?
To manually specify shapes in ggplot, you can set the shape argument in the geom_ function to a constant value. For example, geom_point(shape = value), where value is an integer from 0 to 25, draws every point with that shape. To give each level of a variable its own shape, map the variable with aes(shape = variable) and supply the shapes through scale_shape_manual().
You can also use the scale_shape_manual() function to manually set the values of the point shapes and use the labels and name arguments to change the labels and title of the legend. For example, to manually specify the point shapes and labels for a variable, you can use scale_shape_manual(values = c("a" = 1, "b" = 2, "c" = 3), labels = c("Alpha", "Beta", "Gamma"), name = "Group").
What is the geom_point() function in R?
R's geom_point() function is a ggplot2 function that adds points to a plot. It is used to create scatter plots showing the relationship between two continuous variables. It has a set of arguments that can be used to customize the appearance and behaviour of the points, such as shape, colour, size, fill, etc. It also has a mapping argument that takes an aes() call, which maps variables to the aesthetic attributes of the points, such as colour, size, fill, etc. For example, to create a scatter plot of mpg vs wt and map the number of cylinders to the colour of the points, you can add geom_point(aes(x = wt, y = mpg, color = cyl)) to ggplot(mtcars).
How to plot curves in ggplot?
What is the difference between plot and ggplot?
- Plot is a base R function that creates simple plots, such as scatter plots, bar plots, box plots, etc. It has a set of parameters that can be used to customize the appearance and position of the plot, such as main, xlab, ylab, xlim, ylim, etc.
- ggplot is a ggplot2 function that creates complex plots based on the grammar of graphics. It has a set of arguments that can specify the data and the mapping of variables to aesthetics, while the geometric objects, statistical transformations, scales, coordinates, facets, and themes are added as layers. For example, to create a scatter plot of mpg vs wt with a regression line, you can use plot(mpg ~ wt, data = mtcars) followed by abline(lm(mpg ~ wt, data = mtcars)) in base R, which draws the line without a confidence band, or ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + geom_smooth(method = "lm") in ggplot2, which draws both the line and a confidence interval.
What geometric shapes start with R?
- Rectangle: A four-sided polygon with four right angles and opposite sides equal and parallel.
- Rhombus: A four-sided polygon with four equal sides and opposite angles equal.
- Right triangle: A triangle with one right angle and two acute angles.
- Regular polygon: A polygon with all sides and angles equal.
- Radius: A line segment that joins the center of a circle to any point on the circle.
- Ray: A part of a line that has one endpoint and extends infinitely in one direction.
- Revolution: A three-dimensional shape that is formed by rotating a two-dimensional shape around an axis.
- Reuleaux triangle: A curve of constant width formed by the intersection of three circles with equal radius and centers on the vertices of an equilateral triangle.
- Rhomboid: A parallelogram that is not a rectangle or a rhombus.
- Right prism: A prism whose bases are polygons and lateral faces are rectangles.
What are five geometric shapes examples?
- Circle: A two-dimensional shape formed by the set of all points equidistant from a fixed point called the center.
- Square: A four-sided polygon with four right angles and four equal sides.
- Cube: A three-dimensional shape with six square faces, eight vertices, and twelve edges.
- Sphere: A three-dimensional shape formed by the set of all points equidistant from a fixed point called the center.
- Pyramid: A three-dimensional shape with a polygonal base and triangular faces that meet at a common vertex called the apex.
What is a 3D shape called?
A 3D shape is called a solid. A solid is a three-dimensional shape with length, width, and height. A polyhedron is a solid with flat faces, straight edges, and sharp corners; some examples of polyhedra are cubes, prisms, and pyramids. Cones, cylinders, and spheres are also solids, but they are not polyhedra because they have curved surfaces.
What is the difference between shape and geometry?
- Shape is the form or outline of an object, such as a circle, square, triangle, etc.
- Geometry is the branch of mathematics that studies the properties, measurements, and relationships of shapes, such as angles, area, volume, symmetry, etc. Geometry can be used to create, analyze, and classify shapes and to understand the patterns and structures of the natural and human-made world. For example, geometry can help us to design buildings, bridges, maps, art, and puzzles.
How to change shape on ggplot?
To change the shape of an element on a ggplot, you can use the shape argument in the corresponding geom_ function. For example, to change the shape of the points on a scatter plot, you can use geom_point(shape = value), where value is an integer from 0 to 25. To let a variable determine the shape, map it inside aes(), as in geom_point(aes(shape = variable)). You can also use scale_shape_manual() or scale_shape_discrete() to control the shape of the points manually or automatically, respectively.