data:image/s3,"s3://crabby-images/f5068/f50688f853c6ec4ab58d7ac309d2bb40bcf8b6ce" alt="Dplyr summarize ignore na"
data:image/s3,"s3://crabby-images/4a034/4a034decdc2bcb5d9703ebd2f2cd622e7ce683a4" alt="dplyr summarize ignore na dplyr summarize ignore na"
Unfortunately, there doesn't appear to be a great vectorized alternative for removing NAs before joining the strings.

Thanks all, I've put together a summary of the solutions and benchmarked them on my data with `library(microbenchmark)` (plus `library(biometrics)`, which has my helper function for column selection). The same regex is used in each case. `Regex_row_iteration` is a `grepl()` search after iterating over rows (using syntax I'm not familiar with and need to learn!). `Regex_stringi` is a `grepl()` search after joining each row with `pmap_chr()` and `stringi::stri_flatten()`. `Regex_toString` is a `grepl()` search with the NAs removed by `apply()` and `toString()`, as in `apply(df, 1, function(x) toString(na.omit(x)))`. `Regex_str_replace_all` joins the columns with `tidyr::unite(df, new, cols, sep = ",")` and strips the NAs with `stringr::str_replace_all()` before the `grepl()` search. A final variant searches with the base R `grep()` function.

It looks like `%in%` is the winner, after replacing empty values (`" "`) with NAs.
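A minimal base R sketch of the `apply()` + `toString()` approach, assuming a small hypothetical data frame `df` of character columns (the benchmark data isn't shown). Empty strings are turned into NAs first, each row is collapsed with its NAs dropped, and the joined strings are searched with `grepl()`. The last step is only one guess at how an exact-match `%in%` comparison could replace the regex search; the benchmarked `%in%` code isn't in the post.

```r
# Hypothetical example data: some cells are NA, one is an empty string
df <- data.frame(
  a = c("x", NA, "p"),
  b = c(NA, "y", ""),
  c = c("z", NA, "q"),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing (the !is.na() guard keeps NAs out of the index)
df[!is.na(df) & df == ""] <- NA

# Row-wise: drop the NAs, then join what is left; toString() separates with ", "
df$new <- apply(df, 1, function(x) toString(na.omit(x)))

# Regex search on the joined strings
hits <- grepl("y", df$new)

# Assumed %in% variant: exact match against each original column, no joining needed
exact_hits <- rowSums(sapply(df[c("a", "b", "c")], function(col) col %in% "y")) > 0

print(df)
print(hits)
print(exact_hits)
```

`toString()` always uses `", "` as the separator, so a regex that expects `","` (as produced by `unite(..., sep = ",")`) would need adjusting.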
# Dplyr summarize ignore NA code #
You can avoid inserting NAs by iterating over the rows: with `library(tidyverse)` loaded, `df %>% mutate(x = pmap_chr(., ~paste(na.omit(c(...)), collapse = ',')))` drops the NAs in each row before pasting. Alternatively, tidyr's underlying stringi package can do the joining, with `stringi::stri_flatten()` called inside `pmap_chr()` in the same way. The problem is that iterating over rows usually entails making a lot of function calls, and can therefore be quite slow at scale.

The n/a values can also be converted to values that work with `na.omit()` when the data is read into R, by using the `na.strings` argument. For example, if we take the data from the original post and convert it to a pipe-separated values file, we can set `na.strings` in `read.csv()` so that n/a is read as a missing value, and then use `na.omit()`.

Example 1: Summarize data.table without removing NA. This example demonstrates what happens when we do not actively avoid NA values when summarizing a data.table in R. Consider the following R code: `datagroupNA <- data[ , lapply(.SD, mean), by = group]` summarizes the data.table by group, and `datagroupNA` prints the summarized data.table. Because `mean()` returns NA whenever its input contains an NA, every group with a missing value gets NA in that column.

In the `summarise_each()` case, `t %>% group_by(grp) %>% summarise_each(funs(propmiss = sum(is.na) / length))` (in my function, `group_by()` is replaced with `regroup()`) fails with `Error in sum(.Primitive("is.na")) : invalid 'type' (builtin) of argument`, so `sum(is.na)` doesn't work as I expect it to (likely because my expectation is wrong!). I tried calling `is.na()` explicitly with the brackets, `summarise_each(funs(nmiss = sum(is.na())))`, but that too returns an error: `Error in is.na() : 0 arguments passed to 'is.na' which requires 1`. Any advice or pointers to documentation would be very gratefully received.
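Both errors come from what reaches `is.na()`: `sum(is.na)` hands `sum()` the function `is.na` itself (hence the `builtin` type error), and `is.na()` with empty brackets has no column to test. The column has to be passed explicitly; in the older API that was the `.` placeholder, e.g. `funs(sum(is.na(.)))`. Since `summarise_each()` and `funs()` have been superseded, the sketch below uses `across()` from current dplyr instead. It also shows `na.rm = TRUE`, which makes `mean()` skip NAs rather than return NA for the whole group. The data frame `t` and its columns are hypothetical stand-ins for the question's data.

```r
library(dplyr)

# Hypothetical stand-in for the question's data frame `t`
t <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, NA, 3, 4),
  y   = c(NA, NA, 5, 6)
)

# Count NAs per column and group: is.na() receives the column via .x,
# so sum() gets a logical vector instead of the function is.na
nmiss <- t %>%
  group_by(grp) %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Ignore NAs when summarising: na.rm = TRUE drops them before averaging
means <- t %>%
  group_by(grp) %>%
  summarise(mean_x = mean(x, na.rm = TRUE))

print(nmiss)
print(means)
```

Inside a grouped `summarise()`, `across(everything(), ...)` skips the grouping column, so `grp` is not counted.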
data:image/s3,"s3://crabby-images/742d1/742d1908fbf413f09b5a99446907d2c0c505d350" alt="dplyr summarize ignore na dplyr summarize ignore na"
I'm trying to wrap some dplyr magic inside a function to produce a data.frame that I then print with xtable. The ultimate aim is to have a dplyr version of this working, and reading around I came across the very useful `summarise_each()` function which, after subsetting with `regroup()` (since this is within a function), I can then use to get all columns parsed.

The problem I've encountered (so far) is with calling `is.na()` from within `summarise_each(funs(is.na))`, as I'm told `Error: expecting a single value`. I'm purposefully not posting my function just yet, but a minimal example is `t %>% group_by(grp) %>% summarise_each(funs(is.na))` (NB: this uses `group_by()`, whilst in my function I replace this with `regroup()`). Running this fails, and it's the call to `is.na()` that is the problem, since if I instead work out the number of observations in each group (required to derive the proportion missing) it works. The real problem, though, is that I don't need just `is.na()` within each column but `sum(is.na())`, as per the linked example, so what I really would like is the proportion of missing values in each column for each group.
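One way to get what the question asks for (proportion missing per column and group, returned as a plain data.frame ready for xtable) is to wrap the pipeline in a function. This is a sketch, not the poster's function: `prop_missing()` is a hypothetical name, it uses `mean(is.na(.x))` (equivalent to `sum(is.na(.x)) / length(.x)`), and the grouping column is passed unquoted and forwarded with `{{ }}` in place of `regroup()`.

```r
library(dplyr)

# Hypothetical helper: proportion of missing values in every column, per group.
# `group` is passed unquoted and forwarded to group_by() with {{ }}.
prop_missing <- function(data, group) {
  data %>%
    group_by({{ group }}) %>%
    summarise(across(everything(), ~ mean(is.na(.x)))) %>%
    as.data.frame()  # plain data.frame rather than a tibble
}

# Hypothetical example data
t <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, NA, 3, 4),
  y   = c(NA, NA, 5, 6)
)

res <- prop_missing(t, grp)
print(res)
```

If the xtable package is installed, `xtable::xtable(res)` could then render the result.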
data:image/s3,"s3://crabby-images/f5068/f50688f853c6ec4ab58d7ac309d2bb40bcf8b6ce" alt="Dplyr summarize ignore na"