mgkit.counts.glm module

New in version 0.3.3.

GLMs (generalised linear models) for metagenomes and metatranscriptomes. This module is experimental.

mgkit.counts.glm.fit_lowess_interpolate(endog, exog, frac=0.2, it=3, kind='slinear')

Fits a lowess to the passed endog (Y) and exog (X) and returns an interpolated function that describes it. The first 4 arguments are passed to statsmodels.api.nonparametric.lowess(), while the last one is passed to scipy.interpolate.interp1d().

Parameters:
- endog (array) – array of the dependent variable (Y)
- exog (array) – array of the independent variable (X)
- frac (float) – fraction of the number of elements to use when fitting (0.0-1.0)
- it (int) – number of iterations used to fit the lowess
- kind (str) – type of interpolation to use
Returns:	interpolated function representing the lowess fitted to the passed data
Return type:	func

mgkit.counts.glm.lowess_ci_bootstrap(endog, exog, num=100, frac=0.2, it=3, alpha=0.05, delta=0.0, min_value=0.001, kind='slinear')

Bootstraps a lowess for the dependent (endog) and independent (exog) variables.

Parameters:
- endog (array) – dependent variable (Y)
- exog (array) – independent variable (X)
- num (int) – number of bootstrap iterations
- frac (float) – fraction of the array to use when fitting
- it (int) – number of iterations used to fit the lowess
- alpha (float) – significance level of the bootstrap confidence intervals (e.g. 0.05 for 95% intervals)
- delta (float) – passed to `statsmodels.api.nonparametric.lowess()`
- min_value (float) – minimum value for the functions, to avoid out-of-bounds values
- kind (str) – type of interpolation passed to `scipy.interpolate.interp1d()`
Returns:	the first element is the function describing the lower bound of the confidence interval, the second the upper bound, and the last one the mean
Return type:	tuple

Note

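Performance increases with the value of delta: statsmodels uses linear interpolation instead of weighted regression for points within delta of each other, so larger values speed up the fit at the cost of precision.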

mgkit.counts.glm.optimise_alpha_scipy(formula, data, mean_func, q1_func, q2_func)

New in version 0.4.0.

Used to find an optimal alpha parameter for the Negative Binomial distribution used in statsmodels, using the lowess functions from lowess_ci_bootstrap().

Parameters:
- formula (str) – the formula used for the regression
- data (DataFrame) – DataFrame for the regression
- mean_func (func) – mean function returned by `lowess_ci_bootstrap()`
- q1_func (func) – q1 function returned by `lowess_ci_bootstrap()`
- q2_func (func) – q2 function returned by `lowess_ci_bootstrap()`
Returns:	alpha value for the Negative Binomial
Return type:	float

mgkit.counts.glm.optimise_alpha_scipy_function(args, formula, data, criterion='aic')

New in version 0.4.0.

mgkit.counts.glm.variance_to_alpha(mu, func, min_alpha=0.001)

Calculates the alpha parameter of the Negative Binomial for a given mean, based on the variance of the Negative Binomial as defined in statsmodels:

var = mu + alpha * (mu ** 2)
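Solving this variance for alpha gives the conversion the function's name points to. The numbers below are only a worked illustration:

$$\mathrm{var} = \mu + \alpha\mu^2 \;\Longrightarrow\; \alpha = \frac{\mathrm{var} - \mu}{\mu^2}, \qquad \mu = 10,\ \mathrm{var} = 60 \;\Rightarrow\; \alpha = \frac{60 - 10}{10^2} = 0.5$$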

Parameters:
- mu (float) – mean for which to calculate alpha
- func (func) – function that returns the variance for the mean
- min_alpha (float) – value of alpha returned if func goes out of bounds
Returns:	value of alpha for the passed mean
Return type:	float
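A minimal sketch of the conversion, assuming that "out of bounds" means func raising ValueError, which is what scipy.interpolate.interp1d does by default for inputs outside its range. Flooring alpha values below min_alpha (for example when var < mu) is a further assumption. The name variance_to_alpha_sketch is hypothetical.

```python
import numpy as np


def variance_to_alpha_sketch(mu, func, min_alpha=0.001):
    # assumes mu > 0, since the formula divides by mu**2
    try:
        var = float(func(mu))
    except ValueError:
        # assumption: an out-of-range func (e.g. interp1d) raises ValueError
        return min_alpha
    # statsmodels' Negative Binomial: var = mu + alpha * mu**2
    alpha = (var - mu) / mu ** 2
    # assumption: alphas the Negative Binomial cannot use are floored
    if not np.isfinite(alpha) or alpha < min_alpha:
        return min_alpha
    return alpha


print(variance_to_alpha_sketch(10.0, lambda m: m + 0.5 * m ** 2))  # 0.5
print(variance_to_alpha_sketch(10.0, lambda m: m / 2))             # 0.001
```

In practice func could be one of the interpolated lowess functions above, which is why handling the ValueError matters.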