We don't fix data errors, but we transform your data into actionable insights.
I use my experience to turn your data into clear, useful insights that help you make better decisions.
Hire me today for results that matter!
Our Services
Explore our Data Analysis Service.
Data Preprocessing
Optimize your datasets by leveraging RStudio, SPSS, Minitab, and Excel for seamless data cleaning and transformation. Ensure data integrity with automated outlier detection and missing value imputation techniques.
Dynamic Data Visualization
Transform raw data into insightful visuals using advanced graphing libraries and interactive dashboards. Gain immediate clarity with intuitive graphs, heat maps, and trend lines tailored for both technical and non-technical audiences.
Descriptive Statistics & Correlation Analysis
Uncover key trends through detailed statistical summaries and correlation matrices. Use robust analytical tools to summarize central tendency and variability, paving the way for informed decision-making.
Hypothesis Testing
Validate your assumptions with precise hypothesis tests. Apply t-tests, chi-square tests, ANOVA, and more to confirm statistical significance, ensuring your conclusions are both reliable and actionable.
Regression & Machine Learning Techniques
Elevate predictive analytics with cutting-edge regression models and machine learning algorithms. From linear regression to classification and clustering, harness data-driven insights to forecast trends and optimize strategies.
Insightful Report Writing
Convey your findings effectively with professionally written reports. Combine narrative storytelling with visual analytics to deliver comprehensive insights and strategic recommendations tailored to your audience.
3 comments
-
Hello everyone, I need your help fixing some code that I have not been able to get working after many attempts.
I want to write a custom function, numeric_stats, that takes a data frame and returns the minimum, maximum, mean, and median for all numeric columns. I then need to use this function with sapply to create a data frame, numeric_diamonds_stats, for the groups defined by 'cut' and 'color' in the diamonds dataset.
The result has to follow a specific structure, as I added.
I also have to use the functions group_by and summarise each.
Thank you! 🙂
-
# Load necessary libraries
library(dplyr)

# Define the function to be applied to each column
your_function <- function(x) {
  # Return the average if the column contains numeric data, otherwise return NA
  if (is.numeric(x)) {
    mean(x, na.rm = TRUE)
  } else {
    NA_real_  # typed NA keeps the summary columns numeric
  }
}

# Define the dummy dataset
employee_data <- data.frame(
  employee_id = 1:100,
  department = sample(c("HR", "Finance", "IT", "Marketing"), 100, replace = TRUE),
  age = sample(22:60, 100, replace = TRUE),
  salary = runif(100, min = 30000, max = 100000)
)

# Grouping variables
grouping_vars <- c("department")

# Columns to exclude
columns_to_exclude <- c("employee_id")

# Use the dummy dataset: group, then summarise every remaining column
employee_data %>%
  group_by(across(all_of(grouping_vars))) %>%
  summarize(across(
    .cols = -all_of(columns_to_exclude), # Exclude specified columns
    .fns = your_function,
    .names = "avg_{.col}"
  ), .groups = "drop")
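The reply above works on a dummy employee dataset and only returns means. For the original question, a minimal sketch along the same lines might look like the following. It assumes the ggplot2 package is installed (it provides the diamonds dataset) and that a long layout with one row per statistic is acceptable. The required output structure was not included in the question, so the `statistic` column and the use of `group_modify` in place of summarise each are illustrative assumptions, not the required solution.

```r
library(dplyr)
library(ggplot2)  # assumption: installed; used only for the diamonds dataset

# Return min, max, mean, and median for every numeric column of a data frame.
# The result is a matrix: one row per statistic, one column per numeric column.
numeric_stats <- function(df) {
  numeric_cols <- df[sapply(df, is.numeric)]
  sapply(numeric_cols, function(x) {
    c(min    = min(x, na.rm = TRUE),
      max    = max(x, na.rm = TRUE),
      mean   = mean(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE))
  })
}

# Apply numeric_stats to each cut/color group.
# group_modify passes each group's rows (without the grouping columns) as .x.
numeric_diamonds_stats <- diamonds %>%
  group_by(cut, color) %>%
  group_modify(~ {
    stats <- numeric_stats(.x)
    # "statistic" is a hypothetical column name labelling each row
    data.frame(statistic = rownames(stats), stats, row.names = NULL)
  }) %>%
  ungroup()

head(numeric_diamonds_stats)
```

Each cut/color combination contributes four rows (min, max, mean, median), with one column per numeric variable. If the expected structure is wide instead, with one row per group, the same statistics can be produced with `summarise(across(where(is.numeric), ...))` and a named list of functions.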