10  Complexity and Entropy Measures

11 Overview

This project seeks to characterize malignant transformation not only through differential expression, but also through changes in the structure of the transcriptomic system.

Two complementary classes of measures are used:

  • Complexity measures, which quantify structural properties of the expression matrix
  • Entropy measures, which quantify disorder and distributional uncertainty

Together, these provide a multi-perspective view of how biological organization changes as tissue transitions from normal to tumor.

Importantly, no single scalar definition of “complexity” is assumed. Instead, complexity is treated as a family of structural observables, each capturing a different aspect of transcriptomic organization.


12 Complexity Measures

12.1 Conceptual Definition

In this pipeline, complexity is defined in terms of the geometry and conditioning of the expression matrix.

Rather than focusing on mean expression changes, complexity is interpreted as:

  • how uneven or anisotropic the system is
  • how many effective dimensions are active
  • how stable or ill-conditioned the system is

12.2 Implemented Metrics

The following metrics quantify different structural aspects of the expression matrix. No single metric is definitive; together they provide a composite view of transcriptomic organization.

12.3 Complexity Metrics

| Metric | What It Measures | Interpretation in This Framework |
|---|---|---|
| Covariance condition number (κ₁) | Conditioning of the covariance matrix | Reflects sensitivity and anisotropy of variance structure; high values indicate uneven variance distribution across directions. |
| 2-norm condition number (κ₂) | Matrix conditioning under the 2-norm | Provides an alternative measure of numerical conditioning; interpreted as a supporting diagnostic rather than a primary metric. |
| SVD condition number (κ₃) | Ratio of largest to smallest singular values | Primary complexity metric; captures the degree to which expression space is dominated by a small number of directions (anisotropy). |
| Effective rank | Entropy-based dimensionality of eigenvalue spectrum | Estimates how many dimensions effectively contribute to variance; higher values indicate more distributed structure. |
| Matrix sparsity | Proportion of near-zero entries | Describes distribution of expression magnitudes; included as a structural descriptor with limited direct biological interpretation. |
| Composite κ | Average of multiple condition metrics | Provides a robustness check across related formulations; used as a secondary summary rather than a primary measure. |
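As a rough illustration of how several of the metrics above can be computed, the following is a minimal sketch and not the pipeline's own code. It assumes a samples-by-genes matrix, centers each gene, drops numerically zero singular values before forming ratios, and uses the exponential of the eigenvalue-spectrum entropy as the effective rank. The function name `complexity_metrics_sketch`, the tolerance `zero_tol`, and these specific choices are assumptions. The composite κ is omitted because the source does not specify how the average is formed.

```python
import numpy as np

def complexity_metrics_sketch(X, zero_tol=1e-8):
    """Hypothetical helper: X is a samples x genes matrix with at least 2 samples."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # center each gene (column)
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    s = s[s > s[0] * 1e-12]                     # assumption: drop numerically zero values
    kappa_svd = s[0] / s[-1]                    # largest / smallest singular value
    eig = s**2 / (X.shape[0] - 1)               # nonzero covariance eigenvalues
    kappa_cov = eig[0] / eig[-1]                # covariance condition number
    p = eig / eig.sum()                         # normalized eigenvalue spectrum
    effective_rank = np.exp(-(p * np.log(p)).sum())
    sparsity = np.mean(np.abs(X) < zero_tol)    # share of near-zero entries
    return {
        "kappa_cov": float(kappa_cov),
        "kappa_svd": float(kappa_svd),
        "effective_rank": float(effective_rank),
        "sparsity": float(sparsity),
    }

# Toy matrix with two orthogonal directions of unequal spread.
X_toy = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
metrics = complexity_metrics_sketch(X_toy)
```

For any single matrix, the 2-norm condition number equals the ratio of its largest to smallest singular value, so the sketch reports one value for both; κ₂ and κ₃ can differ only if they are applied to different matrices or use different handling of near-zero singular values.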

12.4 Inference for Complexity

Complexity differences between normal and tumor are evaluated using both resampling-based inference and distributional comparisons.

12.5 Complexity Analysis Functions

| Function | Purpose | Interpretation Notes |
|---|---|---|
| bootstrap_kappa_ci | Computes bootstrap confidence intervals for complexity metrics | Assesses stability of complexity estimates; wide intervals may reflect heterogeneity or limited sample size. |
| kappa_per_sample | Computes complexity metric per sample | Included as an exploratory diagnostic; interpretation is limited due to dimensional constraints of single-sample vectors. |
| compare_kappa_distributions | Compares distributions between groups | Evaluates differences in central tendency or distribution; used as a descriptive complement to permutation testing. |
| permutation_test_complexity | Permutation-based test of group differences | Primary inferential method; assesses whether observed complexity differences exceed those expected under random label assignment. |
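To make the bootstrap idea concrete, here is a minimal percentile-bootstrap sketch. It is not the implementation of `bootstrap_kappa_ci`, whose signature and resampling scheme are not given in this document. The names `svd_kappa` and `bootstrap_ci_sketch`, the choice to resample whole samples (rows) with replacement, and the percentile interval are all assumptions.

```python
import numpy as np

def svd_kappa(X):
    """Ratio of largest to smallest non-negligible singular value of centered X."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    s = s[s > s[0] * 1e-12]          # assumption: ignore numerically zero directions
    return float(s[0] / s[-1])

def bootstrap_ci_sketch(X, stat, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(X), resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample indices drawn with replacement
        reps[b] = stat(X[idx])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic demo data: 30 samples, 3 "genes" with unequal spread.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 3)) * np.array([1.0, 2.0, 5.0])
ci_low, ci_high = bootstrap_ci_sketch(X_demo, svd_kappa)
```

A wide interval from a sketch like this would be read the same way as in the table above: as a sign of heterogeneity or limited sample size rather than as evidence about the direction of change.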

The statistical procedures below are used for both complexity and entropy analyses unless otherwise noted.

12.6 Statistical Tests

| Statistic | Method | What It Evaluates | Role in This Pipeline |
|---|---|---|---|
| p_perm | Permutation test | Whether observed differences arise under random label assignment | Primary inferential statistic; does not rely on distributional assumptions. |
| p_wilcox | Wilcoxon rank-sum test | Differences in rank distributions between groups | Secondary, descriptive comparison; useful for non-normal data but not primary inference. |
| p_ks | Kolmogorov–Smirnov test | Differences in full distributions (shape, location, spread) | Exploratory; sensitive to multiple aspects of distributional change. |
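The label-permutation logic behind p_perm can be sketched as follows. This is an illustrative two-sided test under assumed conventions, not the code of `permutation_test_complexity`: the function names, the use of the SVD condition number as the statistic, the two-sided comparison, and the add-one correction to the p-value are all assumptions.

```python
import numpy as np

def svd_kappa(X):
    """Ratio of largest to smallest non-negligible singular value of centered X."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    s = s[s > s[0] * 1e-12]
    return float(s[0] / s[-1])

def permutation_pvalue_sketch(X_a, X_b, stat, n_perm=500, seed=0):
    """Two-sided permutation p-value for stat(X_b) - stat(X_a)."""
    rng = np.random.default_rng(seed)
    observed = stat(X_b) - stat(X_a)
    pooled = np.vstack([X_a, X_b])
    n_a = X_a.shape[0]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[0])      # shuffle group labels
        diff = stat(pooled[perm[n_a:]]) - stat(pooled[perm[:n_a]])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic groups: "normal" is isotropic, "tumor" has one dominant direction.
rng = np.random.default_rng(2)
X_normal = rng.normal(size=(40, 3))
X_tumor = rng.normal(size=(40, 3)) * np.array([1.0, 1.0, 10.0])
observed_diff, p_perm = permutation_pvalue_sketch(X_normal, X_tumor, svd_kappa)
```

Because the statistic is recomputed on each relabeled split, the test makes no distributional assumption about the metric itself, which is the property the table attributes to p_perm.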

12.7 Important Cautions

  • There is no unique definition of complexity; different metrics capture different structural aspects.
  • Per-sample complexity estimates are included for exploratory purposes and should be interpreted cautiously.
  • High condition numbers reflect anisotropy or dominance in expression space and should not be interpreted as biological “instability” without context.
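The caution about per-sample estimates can be seen directly from the linear algebra. A single sample is a 1 x p matrix with exactly one singular value, so its SVD condition number is trivially 1 regardless of the expression values. The short sketch below shows this on synthetic data; how `kappa_per_sample` works around this constraint is not described here, so nothing about its internals is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
single_sample = rng.lognormal(size=(1, 500))   # one synthetic sample, 500 "genes"

# A 1 x p matrix has a single singular value (the vector's Euclidean norm).
s = np.linalg.svd(single_sample, compute_uv=False)
kappa_single = s[0] / s[-1]                    # largest / smallest are the same value
```

Any informative per-sample metric therefore has to impose additional structure on the vector, which is one reason such estimates are treated as exploratory.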

13 Entropy Measures

13.1 Conceptual Definition

Entropy measures quantify uncertainty and disorder in the transcriptomic system.

They provide a complementary perspective to complexity by focusing on dispersion rather than structure.


13.2 Implemented Metrics

The following entropy measures capture different aspects of transcriptomic variability and organization.

13.3 Entropy Metrics

| Metric | What It Measures | Interpretation in This Framework |
|---|---|---|
| Shannon entropy | Distributional uncertainty in expression values | Reflects overall dispersion of expression magnitudes; higher values indicate greater uncertainty or variability. |
| Spectral entropy | Entropy of covariance eigenvalue distribution | Captures how variance is distributed across dimensions; higher values indicate more evenly distributed variance. |
| Entropy direction | Categorization of entropy change | Provides qualitative labels (e.g., chaotic vs. anti-chaotic) based on magnitude and direction of entropy differences; interpretive rather than strictly statistical. |
| Permutation test | Significance of entropy differences | Evaluates whether entropy changes exceed random expectation; typically applied using Shannon entropy. |
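The two entropy metrics can be sketched as below. This is one plausible formulation and not necessarily the one used in the pipeline: the histogram-based estimator, the bin count, the log base, and the normalization of spectral entropy to the range [0, 1] are assumptions, as are the function names.

```python
import numpy as np

def shannon_entropy_sketch(values, bins=32):
    """Shannon entropy (bits) of a histogram of the expression values."""
    counts, _ = np.histogram(np.ravel(values), bins=bins)
    p = counts[counts > 0] / counts.sum()       # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

def spectral_entropy_sketch(X, normalize=True):
    """Entropy of the normalized covariance eigenvalue spectrum of X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    eig = np.linalg.svd(Xc, compute_uv=False) ** 2
    eig = eig[eig > eig.max() * 1e-12]           # assumption: drop negligible modes
    p = eig / eig.sum()
    h = -(p * np.log(p)).sum()
    if normalize and len(p) > 1:
        h /= np.log(len(p))                      # 1.0 means perfectly even variance
    return float(h)

# Equal variance in two directions versus one dominant direction.
X_even = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
X_skewed = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 3.0], [0.0, -3.0]])
h_even = spectral_entropy_sketch(X_even)
h_skewed = spectral_entropy_sketch(X_skewed)
h_values = shannon_entropy_sketch([0, 1, 2, 3], bins=4)
```

The toy matrices illustrate the table's reading of spectral entropy: evenly spread variance gives the maximum value, while a dominant direction lowers it.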

13.4 Inference for Entropy

Entropy results are summarized using both quantitative differences and derived interpretive labels.

13.5 Entropy Summary Measures (Derived)

| Measure | Description | Interpretation Notes |
|---|---|---|
| Directional entropy label | Categorical classification (e.g., strongly chaotic, neutral) | Based on thresholded differences; heuristic and intended for interpretive support. |
| Permutation-weighted entropy score | Combines magnitude and statistical support | Provides a confidence-adjusted summary; derived quantity rather than a direct metric. |
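The derived measures can be illustrated with a small sketch. The thresholds, the full set of label names, the convention that an entropy increase maps to "chaotic", and the weighting formula are all hypothetical placeholders chosen for illustration; the document does not state the values or the formula actually used.

```python
def entropy_direction_sketch(delta, weak=0.05, strong=0.20):
    """Map an entropy difference (tumor minus normal) to a heuristic label.

    The cutoffs `weak` and `strong` are hypothetical, not the pipeline's values.
    """
    if delta >= strong:
        return "strongly chaotic"
    if delta >= weak:
        return "chaotic"
    if delta <= -strong:
        return "strongly anti-chaotic"
    if delta <= -weak:
        return "anti-chaotic"
    return "neutral"

def permutation_weighted_score_sketch(delta, p_perm):
    """Hypothetical weighting: shrink the difference as the p-value grows."""
    return delta * (1.0 - p_perm)

label = entropy_direction_sketch(0.30)
score = permutation_weighted_score_sketch(0.30, 0.02)
```

Whatever the actual cutoffs are, labels of this kind depend entirely on them, which is why they are described as heuristic and interpretive.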

13.6 Important Cautions

  • Shannon and spectral entropy measure different properties and may not agree.
  • Permutation testing is primarily tied to Shannon entropy in the current implementation.
  • Directional labels such as “chaotic” or “anti-chaotic” are interpretive constructs and should not be treated as strict biological classifications.

14 Relationship Between Complexity and Entropy

Complexity and entropy are complementary but distinct.

Possible patterns include:

  • increased complexity with decreased entropy
  • decreased complexity with increased entropy
  • concurrent increases or decreases

These combinations reflect different modes of transcriptomic organization and should be interpreted jointly rather than independently.


15 Summary

This framework treats cancer as a transformation of transcriptomic structure rather than solely as a change in expression levels.

By combining:

  • structural complexity measures
  • entropy-based disorder measures
  • and statistical inference

it provides a multi-dimensional view of malignant transformation across tissues.