Simplify a binary matrix to one row per patient and one column per gene, counting any alteration type across all samples as 1
Source: R/summarize-by-patient.R
This function reduces the number of columns in a binary matrix: the resulting data frame has only one column per gene, rather than separate columns for mutations, CNAs, and fusions.
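The column collapse can be sketched in base R as follows. This is a minimal illustration, not gnomeR's code. It assumes that CNA and fusion columns carry the hypothetical suffixes `.Amp`, `.Del` and `.fus` while mutation columns use the bare gene name, and both `combine_calls()` and `collapse_alteration_types()` are made-up helpers for this example.

```r
# Hypothetical sketch, not gnomeR's implementation.
# ASSUMPTION: CNA/fusion columns are named "<gene>.Amp", "<gene>.Del",
# "<gene>.fus"; mutation columns carry the bare gene name.

# Combine several 0/1/NA calls: 1 if any call is 1, 0 only if every call
# is observed and 0, otherwise NA (unknown).
combine_calls <- function(x) {
  if (any(x == 1, na.rm = TRUE)) 1
  else if (anyNA(x)) NA_real_
  else 0
}

collapse_alteration_types <- function(df) {
  id_cols <- intersect(names(df), c("sample_id", "patient_id"))
  alt_cols <- setdiff(names(df), id_cols)
  # Strip the alteration-type suffix to recover the gene name
  genes <- sub("\\.(Amp|Del|fus)$", "", alt_cols)
  # For each gene, combine its alteration-type columns row by row
  collapsed <- lapply(split(alt_cols, genes), function(cols) {
    apply(as.matrix(df[cols]), 1, combine_calls)
  })
  cbind(df[id_cols], as.data.frame(collapsed, check.names = FALSE))
}

df <- data.frame(
  sample_id = c("s1", "s2"),
  TP53      = c(1, 0),
  TP53.Del  = c(0, 0),
  MYC.Amp   = c(0, 1),
  ALK       = c(0, NA),
  ALK.fus   = c(1, 0)
)
out <- collapse_alteration_types(df)
```

The suffix-based grouping is only an illustration; `split()` also returns genes in sorted order, which may differ from the real function's column order.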
Details
Note that if samples from the same patient were sequenced on different panels, any indication of an alteration is counted as an alteration, but the absence of an alteration is defined only when every sequencing panel included the gene and reported it as not altered.
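A minimal sketch of this cross-sample rule, assuming that genes missing from a sample's panel are coded `NA` in the binary matrix. `combine_calls()` and `collapse_samples_by_patient()` are hypothetical helpers written for illustration, not part of gnomeR.

```r
# Hypothetical sketch, not gnomeR's implementation.
# ASSUMPTION: genes not covered by a sample's panel are coded NA.

# 1 if any sample shows an alteration, 0 only if every sample was tested
# and unaltered, NA otherwise (gene missing from at least one panel).
combine_calls <- function(x) {
  if (any(x == 1, na.rm = TRUE)) 1
  else if (anyNA(x)) NA_real_
  else 0
}

collapse_samples_by_patient <- function(df) {
  gene_cols <- setdiff(names(df), c("sample_id", "patient_id"))
  # One output row per patient; each gene column is combined across samples
  aggregate(df[gene_cols], by = list(patient_id = df$patient_id),
            FUN = combine_calls)
}

df <- data.frame(
  sample_id  = c("s1", "s2", "s3"),
  patient_id = c("P1", "P1", "P2"),
  TP53 = c(0, 1, 0),   # P1 is altered in s2, so P1 gets 1
  SPOP = c(0, NA, 0)   # SPOP not on s2's panel, so P1 stays NA
)
out <- collapse_samples_by_patient(df)
```

`aggregate()` returns patients in sorted order, so row order here need not match the example output below.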
Examples
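library(gnomeR)
# Use the first 10 samples in the example mutation data
samples <- unique(gnomeR::mutations$sampleId)[1:10]
# Build a sample-level binary matrix (separate columns per alteration type)
gene_binary <- create_gene_binary(
  samples = samples, mutation = mutations, cna = cna,
  mut_type = "somatic_only",
  include_silent = FALSE,
  specify_panel = "IMPACT341"
)
# Derive patient IDs from sample IDs, then collapse to one row per patient
gene_binary$patient_id <- extract_patient_id(gene_binary$sample_id)
summarize_by_patient(gene_binary)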
#> # A tibble: 9 × 36
#> patient_id PARP1 AKT1 ALK APC BRCA2 CTNNB1 EPHB1 FAT1 JAK1 SMAD2 NF1
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 P-0001128 1 1 0 0 1 1 0 0 0 0 0
#> 2 P-0001859 1 0 0 1 0 0 1 0 0 0 0
#> 3 P-0001895 1 0 1 0 0 0 0 0 0 0 0
#> 4 P-0001845 0 1 0 0 0 0 0 0 0 0 1
#> 5 P-0005570 0 1 0 0 0 0 0 0 0 0 0
#> 6 P-0001768 0 0 1 0 0 0 0 0 0 0 0
#> 7 P-0004508 0 0 1 0 0 0 0 0 0 0 0
#> 8 P-0002984 0 0 1 0 0 0 0 0 1 0 0
#> 9 P-0000964 0 0 0 1 1 0 0 1 0 1 0
#> # ℹ 24 more variables: PDGFRA <dbl>, PIK3R2 <dbl>, PPP2R1A <dbl>, ROS1 <dbl>,
#> # TP53 <dbl>, KMT2D <dbl>, SPOP <dbl>, PIK3R3 <dbl>, IRS2 <dbl>, SPEN <dbl>,
#> # ASXL2 <dbl>, KMT2C <dbl>, CARD11 <dbl>, PTPRS <dbl>, KDM6A <dbl>,
#> # `NKX3-1` <dbl>, AR <dbl>, TRAF7 <dbl>, TSC2 <dbl>, FGFR1 <dbl>,
#> # SOX17 <dbl>, RECQL4 <dbl>, NBN <dbl>, MYC <dbl>