API¶
Import epiScanpy’s high-level API as:
import episcanpy.api as epi
Count Matrices: CT¶
Loading data, loading annotations, building count matrices, and filtering lowly covered methylation variables and cells.
Building count matrices¶
Quickly build a count matrix from a tsv/tbi file.
|
Building count matrix on the fly. |
Load features¶
To build a count matrix for either methylation or open chromatin data, you must first load the segmentation of the genome of interest or the set of features of interest.
|
Transform a BED file into a usable set of units for measuring methylation levels. |
|
Generate windows/bins of the given size for the appropriate genome (default choice is human). |
|
If the loaded features are too small or of different sizes, they can be normalised to a single given size by extending the feature coordinates in both directions. |
|
Plot the different feature sizes in a histogram. |
|
Extract the names of the loaded features, specifying the chromosome they originated from. |
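To illustrate two of the ideas listed above (tiling a genome into fixed-size windows, and bringing features to one common size), here is a minimal pure-Python sketch. It does not call epiScanpy: the names `make_bins` and `resize_feature` and the toy chromosome length are hypothetical, chosen only for this example, and the library's own functions may behave differently.

```python
def make_bins(chrom_sizes, window_size):
    """Tile each chromosome with consecutive half-open windows.

    `chrom_sizes` maps chromosome name -> length in bp (toy values here).
    The last window of each chromosome is truncated at the chromosome end.
    """
    bins = {}
    for chrom, length in chrom_sizes.items():
        bins[chrom] = [
            (start, min(start + window_size, length))
            for start in range(0, length, window_size)
        ]
    return bins


def resize_feature(start, end, target_size):
    """Return a feature of `target_size` bp centred on the original midpoint.

    The start is clipped at 0, so features near the chromosome start are
    shifted rather than shortened (an assumption made for this sketch).
    """
    mid = (start + end) // 2
    new_start = max(0, mid - target_size // 2)
    return new_start, new_start + target_size


# Toy genome with a single 2,500 bp chromosome, cut into 1,000 bp windows.
toy_bins = make_bins({"chrToy": 2500}, window_size=1000)

# A 200 bp feature extended in both directions to 1,000 bp.
resized = resize_feature(5000, 5200, target_size=1000)
```

Centring on the midpoint keeps the extension symmetric, which matches the "extending in both directions" description; how boundaries are handled at chromosome ends is a design choice left open here.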
Reading methylation file¶
Functions to read methylation files, extract methylation and build the count matrices:
|
Build methylation count matrix for a given annotation. |
|
Read the file from which to extract the methylation level and, assuming it follows the Ecker/Methylpy format, extract the number of methylated reads and the total number of reads for the covered cytosines in the right genomic context (CG or CH). The sample_name parameter is the name of the file to read to extract key information. |
|
Read the raw count matrix and convert it into an AnnData object. |
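The extraction step described above (methylated reads and total reads per cytosine, restricted to one context) can be sketched in plain Python. This is not epiScanpy code: the function names, the toy records, and the assumed column order (chromosome, position, strand, context, methylated reads, total reads) are illustrative assumptions about a Methylpy-like file.

```python
from collections import defaultdict

# Toy records; the column order is an assumption for this sketch:
# chromosome, position, strand, sequence context, methylated reads, total reads
TOY_LINES = [
    "chr1\t105\t+\tCGA\t3\t4",
    "chr1\t180\t-\tCGT\t1\t4",
    "chr1\t250\t+\tCAT\t0\t5",
    "chr1\t1200\t+\tCGG\t2\t2",
]


def in_context(sequence, context):
    """True if a cytosine's sequence context is CG, or CH (C followed by A, C or T)."""
    is_cg = sequence[:2] == "CG"
    return is_cg if context == "CG" else not is_cg


def methylation_per_feature(lines, features, context="CG"):
    """Sum methylated and total reads per feature for one cytosine context.

    `features` is a list of (chrom, start, end) half-open intervals.
    Returns {feature: (methylated_reads, total_reads)} for covered features.
    """
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        chrom, pos, _strand, seq, mc, cov = line.split("\t")[:6]
        if not in_context(seq, context):
            continue
        for feature in features:
            f_chrom, start, end = feature
            if chrom == f_chrom and start <= int(pos) < end:
                counts[feature][0] += int(mc)
                counts[feature][1] += int(cov)
    return {feature: tuple(vals) for feature, vals in counts.items()}


features = [("chr1", 0, 1000), ("chr1", 1000, 2000)]
cg_counts = methylation_per_feature(TOY_LINES, features, context="CG")

# Methylation level = methylated reads / total reads, for covered features only.
cg_levels = {f: mc / cov for f, (mc, cov) in cg_counts.items() if cov > 0}
```

The nested loop over features is fine for a toy example; for genome-scale data an interval index would be the more realistic choice.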
Reading open chromatin (ATAC) file¶
ATAC-seq specific functions to build count matrices and load data:
|
Building count matrix on the fly. |
|
Convert a regular ATAC matrix into a sparse AnnData object. |
General functions¶
Functions that are not specific to one omic type:
|
Convert a regular ATAC matrix into a sparse AnnData object. |
Preprocessing: PP¶
Imputing missing data (methylation), filtering lowly covered cells or variables, correction for batch effect.
|
Histogram of the number of open features (in the case of ATAC-seq data) per cell. |
|
Display how often a feature is measured as open (for ATAC-seq). |
|
Correlation between a given PC and a covariate. |
|
Display how often a feature is measured as open (for ATAC-seq). |
|
Display how often a feature is measured as open (for ATAC-seq). |
|
This function computes a variability score to rank the most variable features across all cells. |
|
Show distribution plots of cells sharing features and variability score. |
|
This function computes a variability score to rank the most variable features across all cells. |
|
Convert the count matrix into a binary matrix. |
|
Automatically computes PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, t-distributed stochastic neighborhood embedding (tSNE), and Uniform Manifold Approximation and Projection (UMAP). |
|
Load observational metadata in adata.obs. |
|
Load sparse matrix (including matrices corresponding to 10x data) as AnnData objects. |
|
Filter cell outliers based on counts and numbers of genes expressed. |
|
Filter features based on number of cells or counts. |
|
Normalize counts per cell. |
|
Principal component analysis [Pedregosa11]. |
|
Normalize total counts per cell. |
|
Regress out unwanted sources of variation. |
|
Subsample to a fraction of the number of observations. |
|
Downsample counts from count matrix. |
|
Compute a neighborhood graph of observations [McInnes18]. |
|
Transform adata.X from a matrix or array to a sparse matrix. |
|
Logarithmize the data matrix. |
|
Transform adata.X from a matrix or array to a sparse matrix. |
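Several entries above revolve around a binary "open / not open" view of the count matrix: binarizing counts, counting open features per cell, and counting how often a feature is open. The sketch below shows those three quantities on a toy matrix in plain Python. It does not use epiScanpy or AnnData; the function name `binarize`, the "any count above zero is open" rule, and the filtering threshold are assumptions made for illustration.

```python
def binarize(matrix):
    """Turn a cells x features count matrix into 0/1 values.

    Assumption for this sketch: any count above zero is treated as open.
    """
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


# Toy count matrix: 3 cells (rows) x 3 features (columns).
toy_counts = [
    [0, 2, 5],
    [1, 0, 0],
    [0, 0, 0],
]

binary = binarize(toy_counts)

# Number of open features per cell (what a per-cell coverage histogram would show).
open_per_cell = [sum(row) for row in binary]

# Number of cells in which each feature is open.
cells_per_feature = [sum(column) for column in zip(*binary)]

# Keep only cells with at least one open feature (hypothetical threshold).
min_features = 1
kept_cells = [i for i, n_open in enumerate(open_per_cell) if n_open >= min_features]
```

On real data these sums are usually taken on a sparse matrix, which is why the listing also includes a conversion of adata.X to a sparse representation.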
Methylation matrices¶
Methylation specific count matrices.
|
Impute missing values in methylation level matrices. |
|
Read the raw count matrix and convert it into an AnnData object. |
|
Temporary function to load and impute a methylation count matrix into an AnnData object. |
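Methylation level matrices typically contain missing entries, because a feature with no covered cytosine in a given cell has no defined level. As a rough illustration of what imputation means here, the sketch below replaces each missing value with the mean of that feature across the cells where it was observed. This strategy, the function name, and the fallback value are assumptions for the example; the source does not state which imputation rule epiScanpy applies.

```python
def impute_with_feature_mean(matrix, fallback=0.0):
    """Replace None entries with the per-feature (column) mean of observed values.

    `matrix` is a cells x features list of lists; None marks a missing value.
    Features with no observation at all receive `fallback` (arbitrary choice).
    """
    n_features = len(matrix[0])
    means = []
    for j in range(n_features):
        observed = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else fallback)
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in matrix
    ]


# Toy methylation levels: 3 cells x 3 features, None = feature not covered.
toy_levels = [
    [0.5, None, 1.0],
    [1.0, 0.0, None],
    [None, 1.0, 1.0],
]

imputed = impute_with_feature_mean(toy_levels)
```

Mean imputation keeps each feature's average unchanged but shrinks its variance, which is worth keeping in mind before ranking features by variability.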
Tools: TL¶
|
Wrapper around scanpy's sc.tl.rank_genes_groups function. |
|
Automatically computes PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, t-distributed stochastic neighborhood embedding (tSNE), and Uniform Manifold Approximation and Projection (UMAP). |
|
Convert a list of known cell type markers from the literature to a dictionary. Input: list of known marker genes. The first row is considered the header. |
|
Use the markers of a given cell type to plot peak openness for peaks in the promoters of those markers. Input: cell type, cell type markers, peak promoter intersections. |
|
Deprecated - Please use epi.tl.var_features_to_genes instead. |
|
Once you called the most variable features. |
|
Build an AnnData object containing the number of open features (windows, peaks, etc) overlapping genes (gene bodies + 5kb upstream of the TSS). |
|
Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18]. |
|
Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18]. |
|
t-SNE [Maaten08] [Amir13] [Pedregosa11]. |
|
Embed the neighborhood graph using UMAP [McInnes18]. |
|
Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19]. |
|
Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. |
|
Cluster cells into subgroups [Traag18]. |
|
Compute kmeans clustering using X_pca fits. |
|
Compute hierarchical clustering using X_pca fits. |
|
Function will test different settings of louvain to obtain the target number of clusters. |
|
Computes a hierarchical clustering for the given groupby categories. |
|
Compute Adjusted Rand Index. |
|
Compute Adjusted Mutual Information. |
|
Compute homogeneity score. |
|
Compute silhouette scores. |
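The clustering metrics at the end of this list compare a clustering with a reference annotation. To make the first one concrete, here is a small self-contained sketch of the Adjusted Rand Index computed from pair counts. It is written from the standard definition and is not epiScanpy's implementation; the function name is chosen for this example (scikit-learn offers an equivalent as `sklearn.metrics.adjusted_rand_score`).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same cells.

    1.0 means identical partitions (up to label names); values near 0 are
    what random labelings give; negative values mean less agreement than chance.
    """
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement

    # Pairs of cells placed together in both labelings, in A only, in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Same partition with swapped label names: perfect agreement.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])

# Every pair grouped by one labeling is split by the other.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on which cells are grouped together, it is unaffected by how clusters are numbered, which makes it convenient for comparing cluster assignments against cell type annotations.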
Plotting: PL¶
The plotting module episcanpy.plotting largely parallels the tl.* and a few of the pp.* functions.
For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.
|
Scatter plot in PCA coordinates. |
|
Rank features according to contributions to PCs. |
|
Plot PCA results. |
|
Plot the variance ratio. |
|
Scatter plot in tSNE basis. |
|
Scatter plot in UMAP basis. |
|
Scatter plot in Diffusion Map basis. |
|
Scatter plot in graph-drawing basis. |
|
Plot ranking of features. |
|
Plot ranking of features for all tested comparisons. |
|
Plot ranking of features using dotplot plot (see |
|
Plot ranking of features using stacked_violin plot (see |
|
Plot ranking of features using matrixplot plot (see |
|
Plot ranking of features using heatmap plot (see |
|
Plot ranking of features using heatmap plot (see |
|
Show distribution plots of cells sharing features and variability score. |
|
Violin plot. |
|
Scatter plot along observations or variables axes. |
|
Plot rankings. |
|
Hierarchically-clustered heatmap. |
|
Stacked violin plots. |
|
Heatmap of the expression values of genes. |
|
Makes a dot plot of the expression values of var_names. |
|
Creates a heatmap of the mean expression values per group of each var_names. |
|
In this type of plot each var_name is plotted as a filled line plot where the y values correspond to the var_name values and x is each of the cells. |
|
Plots a dendrogram of the categories defined in groupby. |
|
Plots the correlation matrix computed as part of sc.tl.dendrogram. |
|
Percentage or cell count corresponding to the overlap of different cell types between two sets of annotations/clusters. |
|
Heatmap of the cluster correspondence between two sets of annotations. |
|
|
|
Plot the output of tl.silhouette as a silhouette plot. |
|
Compute silhouette scores and plot them. |
|
Show distribution plots of cells sharing features and variability score. |
|
This function computes a variability score to rank the most variable features across all cells. |
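The overlap between two sets of annotations mentioned in the list above (as a cell count or as a percentage) boils down to a contingency table. The following sketch builds one in plain Python; it is not the library's plotting code, and the function name, argument names, and toy labels are hypothetical.

```python
from collections import Counter


def overlap_table(annot_a, annot_b, as_percent=False):
    """Cross-tabulate two annotations of the same cells.

    Returns {label_a: {label_b: value}} where value is a cell count, or,
    with as_percent=True, the percentage of label_a cells carrying label_b.
    """
    pair_counts = Counter(zip(annot_a, annot_b))
    totals_a = Counter(annot_a)
    table = {}
    for (a, b), n in pair_counts.items():
        value = 100.0 * n / totals_a[a] if as_percent else n
        table.setdefault(a, {})[b] = value
    return table


# Toy data: cluster assignment and cell type label for nine cells.
clusters = ["c0", "c0", "c0", "c0", "c1", "c1", "c1", "c1", "c1"]
cell_types = ["T", "T", "T", "B", "B", "B", "B", "B", "T"]

count_table = overlap_table(clusters, cell_types)
percent_table = overlap_table(clusters, cell_types, as_percent=True)
```

Normalising by row (per cluster) answers "what is this cluster made of"; normalising by column instead would answer "where did this cell type end up", so the direction should be stated next to any heatmap drawn from such a table.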