numpy - Matrix completion in Python


Say I have a matrix:

>>> import numpy as np
>>> a = np.random.random((5,5))
>>> a
array([[ 0.28164485,  0.76200749,  0.59324211,  0.15201506,  0.74084168],
       [ 0.83572213,  0.63735993,  0.28039542,  0.19191284,  0.48419414],
       [ 0.99967476,  0.8029097 ,  0.53140614,  0.24026153,  0.94805153],
       [ 0.92478   ,  0.43488547,  0.76320656,  0.39969956,  0.46490674],
       [ 0.83315135,  0.94781119,  0.80455425,  0.46291229,  0.70498372]])

and I punch holes in it with np.nan, e.g.:

>>> a[(1,4,0,3),(2,4,2,0)] = np.nan
>>> a
array([[ 0.80327707,  0.87722234,         nan,  0.94463778,  0.78089194],
       [ 0.90584284,  0.18348667,         nan,  0.82401826,  0.42947815],
       [ 0.05913957,  0.15512961,  0.08328608,  0.97636309,  0.84573433],
       [        nan,  0.30120861,  0.46829231,  0.52358888,  0.89510461],
       [ 0.19877877,  0.99423591,  0.17236892,  0.88059185,         nan]])

I would like to fill in the nan entries using information from the rest of the entries of the matrix. An example would be using the average value of the column in which the nan entries occur.
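For reference, the column-average idea can be sketched in plain NumPy. This is only a minimal illustration on a small hand-written matrix (the values and the names `col_means` and `filled` are made up for the example), using `np.nanmean` to ignore the holes when averaging:

```python
import numpy as np

# Small example matrix with holes (values chosen for illustration only).
a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])

# Mean of each column, computed over the non-nan entries only.
col_means = np.nanmean(a, axis=0)

# Where an entry is nan, take the mean of its column; otherwise keep it.
# col_means has shape (3,), so it broadcasts across the rows of a.
filled = np.where(np.isnan(a), col_means, a)
```

Note that `np.nanmean` returns nan (with a warning) for a column that has no observed entries at all, so such columns would need separate handling.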

More generally, are there any libraries in Python for matrix completion? (e.g. something along the lines of Candès & Recht's convex optimization method).
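To make "matrix completion" concrete, here is a rough sketch of a low-rank fill-in. It is not a library, and it is not the Candès & Recht nuclear-norm program. It is a much simpler heuristic that repeatedly replaces the missing entries with their values from a truncated SVD. The function name `complete_low_rank` and its parameters `rank` and `n_iter` are hypothetical, and the approach assumes the underlying matrix really is close to low rank:

```python
import numpy as np

def complete_low_rank(a, rank=1, n_iter=500):
    """Fill nan entries of `a` by iterating a rank-`rank` SVD approximation.

    Hypothetical helper for illustration; observed entries are never changed.
    """
    missing = np.isnan(a)
    # Start from the column means of the observed entries.
    x = np.where(missing, np.nanmean(a, axis=0), a)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        # Best rank-`rank` approximation of the current guess.
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed entries, refresh only the missing ones.
        x = np.where(missing, low_rank, a)
    return x

# Demo on an exactly rank-1 matrix with the same hole pattern as above.
truth = np.outer([1.0, 1.5, 2.0, 2.5, 3.0], [2.0, 1.0, 1.5, 0.5, 1.0])
holes = truth.copy()
holes[(1, 4, 0, 3), (2, 4, 2, 0)] = np.nan
completed = complete_low_rank(holes, rank=1)
```

On a random matrix like the one in the question there is no low-rank structure to exploit, so a scheme like this has no particular advantage over column means there.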

Background:

This problem appears often in machine learning, for example when working with missing features in classification/regression or in collaborative filtering (e.g. see the Netflix problem on Wikipedia and here).

If you install the latest scikit-learn, version 0.14a1, you can use its shiny new Imputer class:

>>> from sklearn.preprocessing import Imputer
>>> imp = Imputer(strategy="mean")
>>> a = np.random.random((5,5))
>>> a[(1,4,0,3),(2,4,2,0)] = np.nan
>>> a
array([[ 0.77473361,  0.62987193,         nan,  0.11367791,  0.17633671],
       [ 0.68555944,  0.54680378,         nan,  0.64186838,  0.15563309],
       [ 0.37784422,  0.59678177,  0.08103329,  0.60760487,  0.65288022],
       [        nan,  0.54097945,  0.30680838,  0.82303869,  0.22784574],
       [ 0.21223024,  0.06426663,  0.34254093,  0.22115931,         nan]])
>>> a = imp.fit_transform(a)
>>> a
array([[ 0.77473361,  0.62987193,  0.24346087,  0.11367791,  0.17633671],
       [ 0.68555944,  0.54680378,  0.24346087,  0.64186838,  0.15563309],
       [ 0.37784422,  0.59678177,  0.08103329,  0.60760487,  0.65288022],
       [ 0.51259188,  0.54097945,  0.30680838,  0.82303869,  0.22784574],
       [ 0.21223024,  0.06426663,  0.34254093,  0.22115931,  0.30317394]])

After this, you can use imp.transform to apply the same transformation to other data, using the means that imp learned from a. Imputers tie into scikit-learn Pipeline objects, so you can use them in classification or regression pipelines.
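A note for readers on newer scikit-learn: `Imputer` was later removed from `sklearn.preprocessing` (in 0.22), and the equivalent class is `sklearn.impute.SimpleImputer`. The sketch below assumes such a newer release and uses small made-up arrays (`train`, `new`, `y`) to show both reusing the learned means on other data and placing the imputer in a regression pipeline:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Made-up data for illustration.
train = np.array([[1.0, 2.0],
                  [np.nan, 4.0],
                  [3.0, np.nan]])
new = np.array([[np.nan, np.nan],
                [5.0, 6.0]])

# Learn the column means on `train`, then reuse them on `new`.
imp = SimpleImputer(strategy="mean")
imp.fit(train)
filled_new = imp.transform(new)

# The same imputer as the first step of a regression pipeline.
y = np.array([1.0, 2.0, 3.0])
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(train, y)
pred = model.predict(new)
```

The learned per-column means are available afterwards as `imp.statistics_`.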

If you want to wait for a stable release, 0.14 should be out next week.

Full disclosure: I'm a scikit-learn core developer.

