SparseOptimization pattern discrepancy  #77

@rpalaganas

Good afternoon! I recently ran into an issue where the learned patterns differ between runs with sparseOptimization set to TRUE versus FALSE. The code I ran and its output are below. With sparseOptimization set to TRUE, the ChiSq value was reported as -nan, and the P matrix had 0 atoms during the equilibration phase of the final stage. With sparseOptimization set to FALSE there seemed to be no problems; however, the number of patterns learned differed between the two cases: sparseOptimization = TRUE gave 5 patterns, while sparseOptimization = FALSE gave 6 patterns. This held across the range of pattern counts I ran (5-50).
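
Since the report centers on a -nan ChiSq, one quick thing to rule out is non-finite or negative entries in the matrices involved. The following is a minimal base-R sketch, not part of CoGAPS: `summarizeNonFinite` is a hypothetical helper name, and it assumes the object being checked is a plain numeric matrix (the toy matrix below stands in for real data such as Hoxd10_mat).

```r
# Hypothetical helper (not a CoGAPS function): count problematic entries
# in a plain numeric matrix.
summarizeNonFinite <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m))
  c(
    nNaN      = sum(is.nan(m)),                        # NaN entries
    nNA       = sum(is.na(m) & !is.nan(m)),            # NA entries that are not NaN
    nInf      = sum(is.infinite(m)),                   # +Inf or -Inf entries
    nNegative = sum(m < 0, na.rm = TRUE),              # negative entries
    nZeroRows = sum(rowSums(m != 0, na.rm = TRUE) == 0) # rows with no nonzero entry
  )
}

# Toy data for illustration only; substitute the real matrix here.
toy <- matrix(c(1,  0, NaN,
                0,  0, 0,
                2, -1, Inf), nrow = 3, byrow = TRUE)
print(summarizeNonFinite(toy))
```

If every count is zero for the input, the -nan is more likely to originate inside the sampler than in the data, though this check alone cannot confirm that.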

SPARSE OPTIMIZATION ENABLED

params <- CogapsParams(nPatterns=5, nIterations=30000, seed=42,
                       sparseOptimization=TRUE,
                       distributed="genome-wide")

params <- setDistributedParams(params, nSets=6)

Hoxd10_matnp5 <- CoGAPS(Hoxd10_mat, params)

This is CoGAPS version 3.19.1 
Running genome-wide CoGAPS on Hoxd10_mat (30407 genes and 380 samples) with parameters:

-- Standard Parameters --
nPatterns            5 
nIterations          30000 
seed                 42 
sparseOptimization   TRUE 
distributed          genome-wide 

-- Sparsity Parameters --
alpha          0.01 
maxGibbsMass   100 

-- Distributed CoGAPS Parameters -- 
nSets          6 
cut            5 
minNS          3 
maxNS          9 

Creating subsets...
set sizes (min, mean, max): (5067, 5067.833, 5072)
Running Across Subsets...

Data Model: Sparse, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
    worker 2 is starting!
    worker 4 is starting!
    worker 6 is starting!
    worker 3 is starting!
    worker 5 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 13376(A), 1242(P), ChiSq: -nan, Time: 00:00:45 / 01:16:13
...
30000 of 30000, Atoms: 20636(A), 1461(P), ChiSq: -nan, Time: 00:35:40 / 01:16:38
-- Sampling Phase --
1000 of 30000, Atoms: 20671(A), 1460(P), ChiSq: -nan, Time: 00:36:54 / 01:16:28
...
29000 of 30000, Atoms: 20645(A), 1469(P), ChiSq: -nan, Time: 01:12:07 / 01:13:27
    worker 2 is finished! Time: 01:12:22
30000 of 30000, Atoms: 20670(A), 1484(P), ChiSq: -nan, Time: 01:13:21 / 01:13:21
    worker 1 is finished! Time: 01:13:21
    worker 3 is finished! Time: 01:13:24
    worker 5 is finished! Time: 01:15:26
    worker 4 is finished! Time: 01:15:26
    worker 6 is finished! Time: 01:19:08

Matching Patterns Across Subsets...
Running Final Stage...

Data Model: Sparse, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
    worker 2 is starting!
    worker 6 is starting!
    worker 4 is starting!
    worker 3 is starting!
    worker 5 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 10022(A), 0(P), ChiSq: -nan, Time: 00:00:27 / 00:45:43
...
30000 of 30000, Atoms: 15174(A), 0(P), ChiSq: -nan, Time: 00:47:13 / 00:47:13
    worker 1 is finished! Time: 00:47:13
    worker 2 is finished! Time: 00:47:28
    worker 5 is finished! Time: 00:47:34
Warning message:
In checkInputs(data, uncertainty, allParams) :
  running distributed cogaps without mtx/tsv/csv/gct data

SPARSE OPTIMIZATION DISABLED

params <- CogapsParams(nPatterns=5, nIterations=30000, seed=42,
                       distributed="genome-wide")

params <- setDistributedParams(params, nSets=6)

Hoxd10_matnp5 <- CoGAPS(Hoxd10_mat, params)

This is CoGAPS version 3.19.1 
Running genome-wide CoGAPS on Hoxd10_mat (30407 genes and 380 samples) with parameters:

-- Standard Parameters --
nPatterns            5 
nIterations          30000 
seed                 42 
sparseOptimization   FALSE 
distributed          genome-wide 

-- Sparsity Parameters --
alpha          0.01 
maxGibbsMass   100 

-- Distributed CoGAPS Parameters -- 
nSets          6 
cut            5 
minNS          3 
maxNS          9 

Creating subsets...
set sizes (min, mean, max): (5067, 5067.833, 5072)
Running Across Subsets...

    worker 2 is starting!
    worker 3 is starting!
Data Model: Dense, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
    worker 4 is starting!
    worker 5 is starting!
    worker 6 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 4665(A), 966(P), ChiSq: 5137063, Time: 00:01:16 / 02:08:43
...
30000 of 30000, Atoms: 9933(A), 2460(P), ChiSq: 4886798, Time: 00:49:52 / 01:47:09
-- Sampling Phase --
1000 of 30000, Atoms: 10033(A), 2514(P), ChiSq: 4886740, Time: 00:51:31 / 01:46:45
...
30000 of 30000, Atoms: 9953(A), 2489(P), ChiSq: 4886819, Time: 01:34:05 / 01:34:05
    worker 1 is finished! Time: 01:34:05
    worker 5 is finished! Time: 01:44:52
    worker 4 is finished! Time: 01:54:06
    worker 2 is finished! Time: 01:54:29
    worker 6 is finished! Time: 01:54:31
    worker 3 is finished! Time: 01:54:38

Matching Patterns Across Subsets...
Running Final Stage...

    worker 5 is starting!
    worker 4 is starting!
    worker 3 is starting!
    worker 2 is starting!
    worker 6 is starting!
Data Model: Dense, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 5928(A), 0(P), ChiSq: 14908930, Time: 00:00:10 / 00:16:56
...
30000 of 30000, Atoms: 10469(A), 0(P), ChiSq: 14908930, Time: 00:08:47 / 00:18:52
-- Sampling Phase --
1000 of 30000, Atoms: 10403(A), 0(P), ChiSq: 14908930, Time: 00:09:00 / 00:18:39
...
30000 of 30000, Atoms: 10379(A), 0(P), ChiSq: 14908930, Time: 00:15:17 / 00:15:17
    worker 1 is finished! Time: 00:15:17
    worker 5 is finished! Time: 00:16:29
    worker 3 is finished! Time: 00:19:47
    worker 2 is finished! Time: 00:20:37
    worker 4 is finished! Time: 00:20:38
    worker 6 is finished! Time: 00:20:45
Warning message:
In checkInputs(data, uncertainty, allParams) :
  running distributed cogaps without mtx/tsv/csv/gct data

After obtaining the patterns, I ran patternMarkers on the patterns learned with sparseOptimization = TRUE. With threshold = "all", I got this error:

test <- patternMarkers_all(Hoxd10_matnp5, threshold = "all")

Error in colnames(markerScores)[apply(markerScores, 1, which.min)] : 
  invalid subscript type 'list'

This error was not triggered when threshold was set to "cut".
patternMarkers worked normally when run on patterns learned without sparseOptimization.
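
One possible reading of the error message, offered as a hypothesis rather than a confirmed diagnosis: in base R, `apply(x, 1, which.min)` returns a list instead of a vector when some rows yield no minimum, which happens for rows that are entirely NaN or NA. Indexing column names with that list then fails with the same message. The sketch below reproduces this with a toy matrix; the data and the assumption that the scores from the sparse run contain all-NaN rows are illustrative only.

```r
# Toy score matrix (illustrative, not from the issue): row g2 is entirely NaN.
markerScores <- matrix(
  c(0.2, 0.9,
    NaN, NaN,
    0.7, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("Pattern_1", "Pattern_2"))
)

# which.min() ignores NaN/NA, so an all-NaN row returns integer(0).
# Because the per-row results then differ in length, apply() returns a list.
idx <- apply(markerScores, 1, which.min)
print(is.list(idx))

# Indexing with a list raises an error; capture its message instead of stopping.
msg <- tryCatch(
  colnames(markerScores)[idx],
  error = function(e) conditionMessage(e)
)
print(msg)

# Rows that produced no minimum are the ones to inspect.
emptyRows <- which(lengths(idx) == 0)
print(emptyRows)
```

If this is what is happening, checking the sparse-run result for rows with no finite scores would be a reasonable next step.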

UPDATE @dimalvovs  - delete rows for readability
