API Reference

Estimators

Models following scikit-learn’s estimator API.

class dask_glm.estimators.LinearRegression(fit_intercept=True, solver='admm', regularizer='l2', max_iter=100, tol=0.0001, lamduh=1.0, rho=1, over_relax=1, abstol=0.0001, reltol=0.01)[source]

Estimator for a linear model using Ordinary Least Squares.

Parameters:

fit_intercept : bool, default True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

solver : {‘admm’, ‘gradient_descent’, ‘newton’, ‘lbfgs’, ‘proximal_grad’}

Solver to use. See Algorithms for details.

regularizer : {‘l1’, ‘l2’}

Regularizer to use. See Regularizers for details. Only used with the admm and proximal_grad solvers.

max_iter : int, default 100

Maximum number of iterations taken for the solvers to converge.

tol : float, default 1e-4

Tolerance for stopping criteria. Ignored for the admm solver.

lamduh : float, default 1.0

Regularization strength. Only used with the admm and proximal_grad solvers.

rho, over_relax, abstol, reltol : float

Only used with the admm solver.

Examples

>>> from dask_glm.datasets import make_regression
>>> X, y = make_regression()
>>> est = LinearRegression()
>>> est.fit(X, y)
>>> est.predict(X)
>>> est.score(X, y)

Attributes

coef_ : array, shape (n_classes, n_features)

The learned value for the model’s coefficients.

intercept_ : float or None

The learned value for the intercept, if one was added to the model.
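
A slightly fuller sketch of a fit with a non-default solver; the solver choice and max_iter value below are illustrative, not defaults:

>>> from dask_glm.datasets import make_regression
>>> from dask_glm.estimators import LinearRegression
>>> X, y = make_regression()
>>> est = LinearRegression(solver='lbfgs', max_iter=50)
>>> est.fit(X, y)
>>> est.coef_        # learned coefficients
>>> est.intercept_   # learned intercept (None if fit_intercept=False)
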
class dask_glm.estimators.LogisticRegression(fit_intercept=True, solver='admm', regularizer='l2', max_iter=100, tol=0.0001, lamduh=1.0, rho=1, over_relax=1, abstol=0.0001, reltol=0.01)[source]

Estimator for logistic regression.

Parameters:

fit_intercept : bool, default True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

solver : {‘admm’, ‘gradient_descent’, ‘newton’, ‘lbfgs’, ‘proximal_grad’}

Solver to use. See Algorithms for details.

regularizer : {‘l1’, ‘l2’}

Regularizer to use. See Regularizers for details. Only used with the admm, lbfgs, and proximal_grad solvers.

max_iter : int, default 100

Maximum number of iterations taken for the solvers to converge.

tol : float, default 1e-4

Tolerance for stopping criteria. Ignored for the admm solver.

lamduh : float, default 1.0

Regularization strength. Only used with the admm, lbfgs, and proximal_grad solvers.

rho, over_relax, abstol, reltol : float

Only used with the admm solver.

Examples

>>> from dask_glm.datasets import make_classification
>>> X, y = make_classification()
>>> lr = LogisticRegression()
>>> lr.fit(X, y)
>>> lr.predict(X)
>>> lr.predict_proba(X)
>>> lr.score(X, y)

Attributes

coef_ : array, shape (n_classes, n_features)

The learned value for the model’s coefficients.

intercept_ : float or None

The learned value for the intercept, if one was added to the model.
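
A minimal sketch of a regularized fit, combining regularizer and lamduh with a compatible solver; the particular values below are illustrative:

>>> from dask_glm.datasets import make_classification
>>> from dask_glm.estimators import LogisticRegression
>>> X, y = make_classification()
>>> lr = LogisticRegression(solver='proximal_grad', regularizer='l1', lamduh=0.1)
>>> lr.fit(X, y)
>>> lr.predict_proba(X)   # predicted probabilities
>>> lr.score(X, y)        # score on the given data
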
class dask_glm.estimators.PoissonRegression(fit_intercept=True, solver='admm', regularizer='l2', max_iter=100, tol=0.0001, lamduh=1.0, rho=1, over_relax=1, abstol=0.0001, reltol=0.01)[source]

Estimator for Poisson regression.

Parameters:

fit_intercept : bool, default True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

solver : {‘admm’, ‘gradient_descent’, ‘newton’, ‘lbfgs’, ‘proximal_grad’}

Solver to use. See Algorithms for details.

regularizer : {‘l1’, ‘l2’}

Regularizer to use. See Regularizers for details. Only used with the admm, lbfgs, and proximal_grad solvers.

max_iter : int, default 100

Maximum number of iterations taken for the solvers to converge.

tol : float, default 1e-4

Tolerance for stopping criteria. Ignored for the admm solver.

lamduh : float, default 1.0

Regularization strength. Only used with the admm, lbfgs, and proximal_grad solvers.

rho, over_relax, abstol, reltol : float

Only used with the admm solver.

Examples

>>> from dask_glm.datasets import make_poisson
>>> X, y = make_poisson()
>>> pr = PoissonRegression()
>>> pr.fit(X, y)
>>> pr.predict(X)
>>> pr.get_deviance(X, y)

Attributes

coef_ : array, shape (n_classes, n_features)

The learned value for the model’s coefficients.

intercept_ : float or None

The learned value for the intercept, if one was added to the model.
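
A short sketch of a fit with a non-default solver; the solver and max_iter choices below are illustrative:

>>> from dask_glm.datasets import make_poisson
>>> from dask_glm.estimators import PoissonRegression
>>> X, y = make_poisson()
>>> pr = PoissonRegression(solver='gradient_descent', max_iter=200)
>>> pr.fit(X, y)
>>> pr.predict(X)           # predicted means
>>> pr.get_deviance(X, y)   # model deviance on the given data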

Families

class dask_glm.families.Logistic[source]

Implements methods for Logistic regression, useful for classifying binary outcomes.

static gradient(Xbeta, X, y)[source]

Logistic gradient

static hessian(Xbeta, X)[source]

Logistic hessian

static loglike(Xbeta, y)[source]

Evaluate the logistic log-likelihood.

Parameters:

Xbeta : array, shape (n_samples,)

y : array, shape (n_samples)

static pointwise_gradient(beta, X, y)[source]

Logistic gradient, evaluated point-wise.

static pointwise_loss(beta, X, y)[source]

Logistic Loss, evaluated point-wise.
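
The static methods above operate on the linear predictor Xbeta = X.dot(beta). A minimal sketch using the documented signatures; the NumPy data below is illustrative, and it is assumed that NumPy inputs are accepted alongside dask arrays:

>>> import numpy as np
>>> from dask_glm.families import Logistic
>>> X = np.random.random((100, 3))
>>> y = (np.random.random(100) > 0.5).astype(float)
>>> beta = np.zeros(3)
>>> Xbeta = X.dot(beta)
>>> Logistic.loglike(Xbeta, y)            # log-likelihood value (see loglike above)
>>> Logistic.gradient(Xbeta, X, y)        # gradient with respect to beta
>>> Logistic.hessian(Xbeta, X)            # Hessian with respect to beta
>>> Logistic.pointwise_loss(beta, X, y)   # loss parameterized directly by beta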

class dask_glm.families.Normal[source]

Implements methods for Linear regression, useful for modeling continuous outcomes.

class dask_glm.families.Poisson[source]

This implements Poisson regression, useful for modelling count data.

Algorithms

Optimization algorithms for solving minimization problems.

dask_glm.algorithms.admm(X, y, regularizer='l1', lamduh=0.1, rho=1, over_relax=1, max_iter=250, abstol=0.0001, reltol=0.01, family=<class 'dask_glm.families.Logistic'>, **kwargs)[source]

Alternating Direction Method of Multipliers

Parameters:

X : array-like, shape (n_samples, n_features)

y : array-like, shape (n_samples,)

regularizer : str or Regularizer

lamduh : float

rho : float

over_relax : float

max_iter : int

maximum number of iterations to attempt before declaring failure to converge

abstol, reltol : float

family : Family

Returns:

beta : array-like, shape (n_features,)
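
A minimal sketch of calling admm directly; the dataset helper and parameter values below are illustrative:

>>> from dask_glm.algorithms import admm
>>> from dask_glm.datasets import make_classification
>>> from dask_glm.families import Logistic
>>> X, y = make_classification()
>>> beta = admm(X, y, regularizer='l1', lamduh=0.1, max_iter=250, family=Logistic)
>>> beta.shape   # (n_features,)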

dask_glm.algorithms.compute_stepsize_dask(beta, step, Xbeta, Xstep, y, curr_val, family=<class 'dask_glm.families.Logistic'>, stepSize=1.0, armijoMult=0.1, backtrackMult=0.1)[source]

Compute the optimal stepsize.

Parameters:

beta : array-like

step : float

Xbeta : array-like

Xstep : array-like

y : array-like

curr_val : float

family : Family, optional

stepSize : float, optional

armijoMult : float, optional

backtrackMult : float, optional

Returns:

stepSize : float

beta : array-like

Xbeta : array-like

func : callable

dask_glm.algorithms.gradient_descent(X, y, max_iter=100, tol=1e-14, family=<class 'dask_glm.families.Logistic'>, **kwargs)[source]

Michael Grant’s implementation of Gradient Descent.

Parameters:

X : array-like, shape (n_samples, n_features)

y : array-like, shape (n_samples,)

max_iter : int

maximum number of iterations to attempt before declaring failure to converge

tol : float

Maximum allowed change from prior iteration required to declare convergence

family : Family

Returns:

beta : array-like, shape (n_features,)
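
A minimal sketch, here with the Normal family for a least-squares fit; the dataset helper is illustrative:

>>> from dask_glm.algorithms import gradient_descent
>>> from dask_glm.datasets import make_regression
>>> from dask_glm.families import Normal
>>> X, y = make_regression()
>>> beta = gradient_descent(X, y, max_iter=100, family=Normal)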

dask_glm.algorithms.lbfgs(X, y, regularizer=None, lamduh=1.0, max_iter=100, tol=0.0001, family=<class 'dask_glm.families.Logistic'>, verbose=False, **kwargs)[source]

L-BFGS solver using the scipy.optimize implementation.

Parameters:

X : array-like, shape (n_samples, n_features)

y : array-like, shape (n_samples,)

max_iter : int

maximum number of iterations to attempt before declaring failure to converge

tol : float

Maximum allowed change from prior iteration required to declare convergence

family : Family

Returns:

beta : array-like, shape (n_features,)
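
A minimal sketch with L2 regularization; parameter values below are illustrative:

>>> from dask_glm.algorithms import lbfgs
>>> from dask_glm.datasets import make_classification
>>> X, y = make_classification()
>>> beta = lbfgs(X, y, regularizer='l2', lamduh=1.0, max_iter=100, tol=1e-4)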

dask_glm.algorithms.newton(X, y, max_iter=50, tol=1e-08, family=<class 'dask_glm.families.Logistic'>, **kwargs)[source]

Newton’s Method for Logistic Regression.

Parameters:

X : array-like, shape (n_samples, n_features)

y : array-like, shape (n_samples,)

max_iter : int

maximum number of iterations to attempt before declaring failure to converge

tol : float

Maximum allowed change from prior iteration required to declare convergence

family : Family

Returns:

beta : array-like, shape (n_features,)
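
A minimal sketch using the default Logistic family; the dataset helper is illustrative:

>>> from dask_glm.algorithms import newton
>>> from dask_glm.datasets import make_classification
>>> X, y = make_classification()
>>> beta = newton(X, y, max_iter=50, tol=1e-8)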

dask_glm.algorithms.proximal_grad(X, y, regularizer='l1', lamduh=0.1, family=<class 'dask_glm.families.Logistic'>, max_iter=100, tol=1e-08, **kwargs)[source]

Proximal gradient solver.

Parameters:

X : array-like, shape (n_samples, n_features)

y : array-like, shape (n_samples,)

max_iter : int

maximum number of iterations to attempt before declaring failure to converge

tol : float

Maximum allowed change from prior iteration required to declare convergence

family : Family

verbose : bool, default False

whether to print diagnostic information during convergence

Returns:

beta : array-like, shape (n_features,)
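
A minimal sketch with an L1 penalty, which typically yields a sparse coefficient vector; the values below are illustrative:

>>> from dask_glm.algorithms import proximal_grad
>>> from dask_glm.datasets import make_classification
>>> X, y = make_classification()
>>> beta = proximal_grad(X, y, regularizer='l1', lamduh=0.1, max_iter=100)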

Regularizers

Available Regularizers

These regularizers are included with dask-glm.

class dask_glm.regularizers.ElasticNet(weight=0.5)[source]

Elastic net regularization.

proximal_operator(beta, t)[source]

See notebooks/ElasticNetProximalOperatorDerivation.ipynb for derivation.

class dask_glm.regularizers.L1[source]

L1 regularization.

class dask_glm.regularizers.L2[source]

L2 regularization.
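
Regularizer instances can be passed wherever a solver accepts a regularizer name string (see Regularizer.get below). A minimal sketch with ElasticNet; the weight and lamduh values are illustrative:

>>> from dask_glm.algorithms import admm
>>> from dask_glm.datasets import make_classification
>>> from dask_glm.regularizers import ElasticNet
>>> X, y = make_classification()
>>> beta = admm(X, y, regularizer=ElasticNet(weight=0.3), lamduh=0.1)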

Regularizer Interface

Users wishing to implement their own regularizer should satisfy this interface.

class dask_glm.regularizers.Regularizer[source]

Abstract base class for regularization object.

Defines the set of methods required to create a new regularization object. This includes the regularization function itself and its gradient, Hessian, and proximal operator.

add_reg_f(f, lam)[source]

Add regularization function to other function.

Parameters:

f : callable

Function taking beta and *args

lam : float

regularization constant

Returns:

wrapped : callable

function taking beta and *args

add_reg_grad(grad, lam)[source]

Add regularization gradient to other gradient function.

Parameters:

grad : callable

Function taking beta and *args

lam : float

regularization constant

Returns:

wrapped : callable

function taking beta and *args

add_reg_hessian(hess, lam)[source]

Add regularization hessian to other hessian function.

Parameters:

hess : callable

Function taking beta and *args

lam : float

regularization constant

Returns:

wrapped : callable

function taking beta and *args

f(beta)[source]

Regularization function.

Parameters:

beta : array, shape (n_features,)

Returns:

result : float

classmethod get(obj)[source]

Get the concrete instance for the name obj.

Parameters:

obj : Regularizer or str

Valid instances of Regularizer are passed through. Strings are looked up according to obj.name and a new instance is created.

Returns:

obj : Regularizer

gradient(beta)[source]

Gradient of regularization function.

Parameters:

beta : array, shape (n_features,)

Returns:

gradient : array, shape (n_features,)

hessian(beta)[source]

Hessian of regularization function.

Parameters:

beta : array, shape (n_features,)

Returns:

hessian : array, shape (n_features, n_features)

proximal_operator(beta, t)[source]

Proximal operator for regularization function.

Parameters:

beta : array, shape (n_features,)

t : float

Returns:

proximal_operator : array, shape (n_features,)
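
A sketch of a custom regularizer satisfying this interface. The name attribute is assumed to be the lookup key used by Regularizer.get (see above), and the penalty 0.5 * ||beta||^2 is chosen so the gradient, Hessian, and proximal operator have simple closed forms:

>>> import numpy as np
>>> from dask_glm.regularizers import Regularizer
>>> class HalfSquaredL2(Regularizer):
...     """Hypothetical regularizer: f(beta) = 0.5 * sum(beta ** 2)."""
...     name = 'half_squared_l2'   # assumed lookup key for Regularizer.get
...     def f(self, beta):
...         return 0.5 * (beta ** 2).sum()
...     def gradient(self, beta):
...         return beta
...     def hessian(self, beta):
...         return np.eye(len(beta))
...     def proximal_operator(self, beta, t):
...         # argmin over x of 0.5 * ||x - beta||**2 + t * 0.5 * ||x||**2
...         return beta / (1 + t)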