API¶

Import epiScanpy’s high-level API as:

import episcanpy.api as epi

Count Matrices: CT¶

Loading data and annotations, building count matrices, and filtering lowly covered methylation variables and lowly covered cells.

Building count matrices¶

Quickly build a count matrix from a tsv/tbi file.

`ct.bld_mtx_fly`(tsv_file, annotation[, …])	Building count matrix on the fly.

Load features¶

To build a count matrix for either methylation or open chromatin data, you must first load the segmentation of the genome of interest or the set of features of interest.

`ct.load_features`(file_features[, …])	Transform a BED file into a usable set of units for measuring methylation levels.
`ct.make_windows`(size[, chromosomes, …])	Generate windows/bins of the given size for the appropriate genome (default choice is human).
`ct.size_feature_norm`(loaded_feature, size)	If the loaded features are too small or of different sizes, they can be normalised to a single given size by extending the feature coordinates in both directions.
`ct.plot_size_features`(loaded_feature[, …])	Plot the different feature sizes in a histogram.
`ct.name_features`(loaded_features)	Extract the names of the loaded features, specifying the chromosome they originated from.
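
The two ideas behind `ct.make_windows` and `ct.size_feature_norm` (tiling a genome into fixed-size bins, and resizing features to a common width) can be illustrated with a small plain-Python sketch. This is not epiScanpy's implementation: the helper names `tile_windows` and `resize_features`, the half-open coordinate convention and the toy chromosome lengths are all assumptions made for the example.

```python
# Illustrative stand-ins only; these helpers are hypothetical, not part of epiScanpy.

def tile_windows(chrom_sizes, size):
    """Tile each chromosome into consecutive half-open [start, end) bins of `size` bp."""
    windows = {}
    for chrom, length in chrom_sizes.items():
        windows[chrom] = [
            (start, min(start + size, length))  # last bin is clipped to the chromosome end
            for start in range(0, length, size)
        ]
    return windows


def resize_features(features, size):
    """Resize every (start, end) feature to `size` bp centred on its midpoint."""
    half = size // 2
    resized = []
    for start, end in features:
        mid = (start + end) // 2
        new_start = max(0, mid - half)  # never extend past the chromosome start
        resized.append((new_start, new_start + size))
    return resized


# Toy chromosome lengths (assumed, far smaller than a real genome).
toy_sizes = {"chr1": 25, "chr2": 10}
windows = tile_windows(toy_sizes, size=10)
print(windows["chr1"])  # [(0, 10), (10, 20), (20, 25)]

peaks = [(100, 120), (300, 450)]
resized = resize_features(peaks, size=200)
print(resized)  # [(10, 210), (275, 475)]
```

Note that the second peak shrinks from 150 bp to 200 bp around the same midpoint, so after resizing all features contribute comparable genomic spans to the count matrix.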

Reading methylation file¶

Functions to read methylation files, extract methylation levels and build the count matrices:

`ct.build_count_mtx`(cells, annotation[, …])	Build methylation count matrix for a given annotation.
`ct.read_cyt_summary`(sample_name, meth_type, …)	Read a methylation file and, assuming it follows the Ecker/Methylpy format, extract the number of methylated reads and the total number of reads for the covered cytosines in the right genomic context (CG or CH). `sample_name` is the name of the file to read.
`ct.load_met_noimput`(matrix_file[, path, save])	Read the raw count matrix and convert it into an AnnData object.
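
As a rough illustration of what a methylation count matrix row contains, the sketch below sums methylated and total read counts over the cytosines that fall inside each feature and derives a methylation level from them. It is a simplified stand-in, not the logic of `ct.build_count_mtx`: the record layout `(chrom, pos, methylated_reads, total_reads)`, the helper names and the use of `None` for uncovered features are assumptions.

```python
# Hypothetical helpers for illustration; not epiScanpy code.

def methylation_per_feature(cytosines, features):
    """Sum methylated and total reads of the cytosines falling in each feature.

    cytosines: iterable of (chrom, pos, methylated_reads, total_reads)
    features:  list of (chrom, start, end), half-open [start, end)
    Returns one (methylated, total) pair per feature.
    """
    counts = [[0, 0] for _ in features]
    for chrom, pos, mc, cov in cytosines:
        for i, (f_chrom, start, end) in enumerate(features):
            if chrom == f_chrom and start <= pos < end:
                counts[i][0] += mc
                counts[i][1] += cov
    return [tuple(c) for c in counts]


def methylation_level(mc, cov):
    """Fraction of methylated reads, or None when the feature has no coverage."""
    return mc / cov if cov > 0 else None


toy_cytosines = [
    ("chr1", 5, 1, 2),
    ("chr1", 8, 3, 4),
    ("chr1", 15, 0, 5),
]
toy_features = [("chr1", 0, 10), ("chr1", 10, 20), ("chr2", 0, 10)]

per_feature = methylation_per_feature(toy_cytosines, toy_features)
levels = [methylation_level(mc, cov) for mc, cov in per_feature]
print(per_feature)  # [(4, 6), (0, 5), (0, 0)]
print(levels)
```

The third feature has no covered cytosine, so its level is left undefined rather than set to zero; an unmethylated feature and an unobserved one are different things, which is why missing values are typically handled separately.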

Reading open chromatin (ATAC) file¶

ATAC-seq specific functions to build count matrices and load data:

`ct.bld_mtx_fly`(tsv_file, annotation[, …])	Building count matrix on the fly.
`ct.save_sparse_mtx`(initial_matrix[, …])	Convert a regular ATAC matrix into a sparse AnnData object.

General functions¶

Functions that are not omic-specific:

`ct.save_sparse_mtx`(initial_matrix[, …])	Convert a regular ATAC matrix into a sparse AnnData object.

Preprocessing: PP¶

Imputing missing data (methylation), filtering lowly covered cells or variables, and correcting for batch effects.

`pp.coverage_cells`(adata[, key_added, log, …])	Histogram of the number of open features (in the case of ATAC-seq data) per cell.
`pp.commonness_features`(adata[, binary, log, …])	Display how often a feature is measured as open (for ATAC-seq).
`pp.correlation_pc`(adata, variable[, pc, …])	Correlation between a given PC and a covariate.
`pp.coverage_features`(adata[, binary, log, …])	Display how often a feature is measured as open (for ATAC-seq).
`pp.density_features`(adata[, threshold, …])	Display how often a feature is measured as open (for ATAC-seq).
`pp.select_var_feature`(adata[, min_score, …])	This function computes a variability score to rank the most variable features across all cells.
`pp.cal_var`(adata[, show, color, save])	Show distribution plots of cells sharing features and variability score.
`pp.variability_features`(adata[, min_score, …])	This function computes a variability score to rank the most variable features across all cells.
`pp.binarize`(adata[, copy])	Convert the count matrix into a binary matrix.
`pp.lazy`(adata[, pp_pca, svd_solver, nb_pcs, …])	Automatically compute PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, t-distributed stochastic neighbor embedding (tSNE), and Uniform Manifold Approximation and Projection (UMAP).
`pp.load_metadata`(adata, metadata_file[, …])	Load observational metadata in adata.obs.
`pp.read_ATAC_10x`(matrix[, cell_names, …])	Load sparse matrix (including matrices corresponding to 10x data) as AnnData objects.
`pp.filter_cells`(adata[, min_counts, …])	Filter cell outliers based on counts and numbers of genes expressed.
`pp.filter_features`(data[, min_counts, …])	Filter features based on number of cells or counts.
`pp.normalize_total`(adata[, target_sum, …])	Normalize counts per cell.
`pp.pca`(adata[, n_comps, zero_center, …])	Principal component analysis [Pedregosa11].
`pp.normalize_per_cell`(adata[, …])	Normalize total counts per cell.
`pp.regress_out`(adata, keys[, n_jobs, copy])	Regress out unwanted sources of variation.
`pp.subsample`(data[, fraction, n_obs, …])	Subsample to a fraction of the number of observations.
`pp.downsample_counts`(adata[, …])	Downsample counts from count matrix.
`pp.neighbors`(adata[, n_neighbors, n_pcs, …])	Compute a neighborhood graph of observations [McInnes18].
`pp.sparse`(adata[, sparse_format, copy])	Transform adata.X from a matrix or array to a sparse matrix.
`pp.log1p`()	Logarithmize the data matrix.
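
Several of the functions above (`pp.binarize`, `pp.coverage_cells`, `pp.commonness_features`, `pp.filter_features`) revolve around the same quantities: a binary open/closed matrix, the number of open features per cell, and the number of cells in which each feature is open. The plain-Python sketch below shows those quantities on a toy matrix. It does not call epiScanpy and does not reproduce its implementation; the helper names and the `min_cells` threshold are assumptions.

```python
# Hypothetical helpers on a list-of-lists matrix (rows = cells, columns = features).

def binarize_counts(matrix):
    """Return a copy in which every non-zero count becomes 1."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


def open_features_per_cell(binary):
    """Number of open features in each cell (row sums)."""
    return [sum(row) for row in binary]


def cells_per_feature(binary):
    """Number of cells in which each feature is open (column sums)."""
    return [sum(column) for column in zip(*binary)]


toy_counts = [
    [0, 3, 1, 0],
    [2, 0, 0, 0],
    [5, 1, 4, 0],
]

binary = binarize_counts(toy_counts)
cell_coverage = open_features_per_cell(binary)
feature_commonness = cells_per_feature(binary)

# Keep only features that are open in at least `min_cells` cells (assumed threshold).
min_cells = 1
kept_features = [i for i, n in enumerate(feature_commonness) if n >= min_cells]

print(binary)              # [[0, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
print(cell_coverage)       # [2, 1, 3]
print(feature_commonness)  # [2, 2, 2, 0]
print(kept_features)       # [0, 1, 2]
```

Histograms of `cell_coverage` and `feature_commonness` are the kind of summary one would inspect before choosing filtering thresholds.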

Methylation matrices¶

Methylation-specific count matrices.

`pp.imputation_met`(adata[, …])	Impute missing values in methylation level matrices.
`pp.load_met_noimput`(matrix_file[, path, save])	Read the raw count matrix and convert it into an AnnData object.
`pp.readandimputematrix`(file_name[, min_coverage])	Temporary function to load a methylation count matrix and impute it into an AnnData object.

Tools: TL¶

`tl.rank_features`(adata, groupby[, omic, …])	Wrapper around scanpy's sc.tl.rank_genes_groups function.
`tl.lazy`(adata[, pp_pca, copy])	Automatically compute PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, t-distributed stochastic neighbor embedding (tSNE), and Uniform Manifold Approximation and Projection (UMAP).
`tl.load_markers`(path, marker_list_file)	Convert a list of known cell type markers from the literature into a dictionary. Input: a list of known marker genes; the first row is considered the header.
`tl.identify_cluster`(adata, cell_type, …[, …])	Use markers of a given cell type to plot peak openness for peaks in the promoters of those markers. Input: cell type, cell type markers, peak promoter intersections.
`tl.top_feature_genes`(adata, gtf_file[, …])	Deprecated - Please use epi.tl.var_features_to_genes instead.
`tl.var_features_to_genes`(adata, gtf_file[, …])	Once you called the most variable features.
`tl.geneactivity`(adata, gtf_file[, …])	Build an AnnData object containing the number of open features (windows, peaks, etc) overlapping genes (gene bodies + 5kb upstream of the TSS).
`tl.diffmap`(adata[, n_comps, copy])	Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].
`tl.draw_graph`(adata[, layout, init_pos, …])	Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].
`tl.tsne`(adata[, n_pcs, use_rep, perplexity, …])	t-SNE [Maaten08] [Amir13] [Pedregosa11].
`tl.umap`(adata[, min_dist, spread, …])	Embed the neighborhood graph using UMAP [McInnes18].
`tl.dpt`(adata[, n_dcs, n_branchings, …])	Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].
`tl.louvain`(adata[, resolution, …])	Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].
`tl.leiden`(adata[, resolution, restrict_to, …])	Cluster cells into subgroups [Traag18].
`tl.kmeans`(adata, num_clusters)	Compute kmeans clustering using X_pca fits.
`tl.hc`(adata, num_clusters)	Compute hierarchical clustering using X_pca fits.
`tl.getNClusters`(adata, n_cluster[, …])	Function will test different settings of louvain to obtain the target number of clusters.
`tl.dendogram`(adata, groupby[, n_pcs, …])	Computes a hierarchical clustering for the given groupby categories.
`tl.ARI`(adata, label_1, label_2)	Compute Adjusted Rand Index.
`tl.AMI`(adata, label_1, label_2)	Compute Adjusted Mutual Information.
`tl.homogeneity`(adata, label_1, label_2)	Compute homogeneity score.
`tl.silhouette`(adata_name, cluster_annot[, …])	Compute silhouette scores.
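
To make the clustering-comparison metrics more concrete, here is a minimal pure-Python sketch of the Adjusted Rand Index computed from the pair-counting formula. It is meant only to show what the score measures; it is not how `tl.ARI` is implemented, and the function name `adjusted_rand_index` and the toy labels are assumptions. The sketch expects at least two observations.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same observations."""
    n = len(labels_a)
    # Pairs of observations placed together in both labelings.
    sum_pairs = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Same partition with renamed clusters: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], ["a", "a", "b", "b"])
# Partitions that never group the same pair together: below chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

The score depends only on which observations are grouped together, not on the cluster names, which is why it is suitable for comparing a clustering against an independent annotation.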

Plotting: PL¶

The plotting module episcanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.

`pl.pca`(adata, *[, color, gene_symbols, …])	Scatter plot in PCA coordinates.
`pl.pca_loadings`(adata[, components, …])	Rank features according to contributions to PCs.
`pl.pca_overview`(adata[, color, use_raw, …])	Plot PCA results.
`pl.pca_variance_ratio`(adata[, n_pcs, log, …])	Plot the variance ratio.
`pl.tsne`(adata, *[, color, gene_symbols, …])	Scatter plot in tSNE basis.
`pl.umap`(adata, *[, color, gene_symbols, …])	Scatter plot in UMAP basis.
`pl.diffmap`(adata, *[, color, gene_symbols, …])	Scatter plot in Diffusion Map basis.
`pl.draw_graph`(adata, *[, color, …])	Scatter plot in graph-drawing basis.
`pl.rank_feat_groups`(adata[, groups, …])	Plot ranking of features.
`pl.rank_feat_groups_violin`(adata[, groups, …])	Plot ranking of features for all tested comparisons.
`pl.rank_feat_groups_dotplot`(adata[, groups, …])	Plot ranking of features using dotplot plot (see `dotplot()`)
`pl.rank_feat_groups_stacked_violin`(adata[, …])	Plot ranking of features using stacked_violin plot (see `stacked_violin()`)
`pl.rank_feat_groups_matrixplot`(adata[, …])	Plot ranking of features using matrixplot plot (see `matrixplot()`)
`pl.rank_feat_groups_heatmap`(adata[, groups, …])	Plot ranking of features using heatmap plot (see `heatmap()`)
`pl.rank_feat_groups_tracksplot`(adata[, …])	Plot ranking of features using tracksplot plot (see `tracksplot()`)
`pl.cal_var`(adata[, show, color, save])	Show distribution plots of cells sharing features and variability score.
`pl.violin`(adata, keys[, groupby, log, …])	Violin plot.
`pl.scatter`(adata[, x, y, color, use_raw, …])	Scatter plot along observations or variables axes.
`pl.ranking`(adata, attr, keys[, dictionary, …])	Plot rankings.
`pl.clustermap`(adata[, obs_keys, use_raw, …])	Hierarchically-clustered heatmap.
`pl.stacked_violin`(adata, var_names, groupby)	Stacked violin plots.
`pl.heatmap`(adata, var_names, groupby[, …])	Heatmap of the expression values of genes.
`pl.dotplot`(adata, var_names, groupby[, …])	Makes a dot plot of the expression values of var_names.
`pl.matrixplot`(adata, var_names, groupby[, …])	Creates a heatmap of the mean expression values per group of each var_names.
`pl.tracksplot`(adata, var_names, groupby[, …])	In this type of plot each var_name is plotted as a filled line plot where the y values correspond to the var_name values and x is each of the cells.
`pl.dendrogram`(adata, groupby[, …])	Plots a dendrogram of the categories defined in groupby.
`pl.correlation_matrix`(adata, groupby[, …])	Plots the correlation matrix computed as part of sc.tl.dendrogram.
`pl.prct_overlap`(adata, key_1, key_2[, norm, …])	Percentage or cell count corresponding to the overlap of different cell types between 2 sets of annotations/clusters.
`pl.overlap_heatmap`(adata, key_1, key_2[, …])	Heatmap of the cluster correspondence between 2 sets of annotations.
`pl.cluster_composition`(adata, cluster, condition)
`pl.silhouette`(adata_name, cluster_annot[, …])	Plot the output of tl.silhouette as a silhouette plot.
`pl.silhouette_tot`(adata_name, cluster_annot)	Compute silhouette scores and plot them.
`pl.variability_features`(adata[, min_score, …])	This function computes a variability score to rank the most variable features across all cells.