.. _examples-miml-classification-knn:

*******************
K-Nearest Neighbor
*******************

The k-nearest neighbor algorithm (k-NN) classifies an object by a majority vote of its neighbors: the object is assigned to the class most common among its k nearest neighbors, where k is a typically small positive integer. k-NN is a type of instance-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification.

Nearest neighbor rules in effect compute the decision boundary implicitly. The following example shows the implicit decision boundary of k-NN on a 2-dimensional toy dataset. Try different values of k and observe how the decision boundary changes; in general, the larger k is, the smoother the boundary. ::

    import os

    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib.colors import ListedColormap

    from miml import datasets
    from miml.classification import KNearestNeighbor

    # Load the toy data; DataFrame.read_table is assumed available in this environment.
    fn = os.path.join(datasets.get_data_home(), 'classification', 'toy', 'toy-test.txt')
    df = DataFrame.read_table(fn, header=None, names=['x1', 'x2'], format='%2f', index_col=0)
    X = df.values
    y = np.array(df.index.data)

    n_neighbors = 3
    knn = KNearestNeighbor(n_neighbors)
    knn.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max] x [y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    n = 50  # number of grid points per axis
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, n),
                         np.linspace(y_min, y_max, n))
    data = np.vstack((xx.ravel(), yy.ravel())).T
    Z = knn.predict(data)

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)

    # Create color maps
    cmap_light = ListedColormap(['#FFAAAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#0000FF'])

    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points and the decision boundary contour
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=20)
    plt.contour(xx, yy, Z, levels=[0.5], colors='k')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("KNN example (k = %i)" % n_neighbors)
    plt.show()

.. image:: ../../../_static/miml/knn_1.png

3-class classification::

    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib.colors import ListedColormap

    from miml import datasets
    from miml.classification import KNearestNeighbor

    iris = datasets.load_iris()

    # We only take the first two features. We could avoid this ugly
    # slicing by using a two-dimensional dataset.
    X = iris.data[:, :2]
    y = iris.target

    n_neighbors = 15
    model = KNearestNeighbor(n_neighbors)
    model.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max] x [y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    h = .02  # step size in the mesh
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    data = np.vstack((xx.ravel(), yy.ravel())).T
    Z = model.predict(data)

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)

    # Create color maps
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i)" % n_neighbors)
    plt.show()

.. image:: ../../../_static/miml/knn_2.png
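
As a reference for how the majority vote works, here is a minimal pure-NumPy sketch of the k-NN decision rule described at the top of this page. It is not the miml implementation: the function name ``knn_predict``, the brute-force neighbor search, and the Euclidean distance metric are all assumptions made for illustration. ::

    import numpy as np

    def knn_predict(X_train, y_train, X_query, k=3):
        """Classify each query point by a majority vote of its k nearest
        training points under Euclidean distance (illustrative sketch)."""
        preds = []
        for q in X_query:
            # Squared Euclidean distance from the query to every training point.
            d2 = np.sum((X_train - q) ** 2, axis=1)
            # Indices of the k closest training points.
            nn = np.argsort(d2)[:k]
            # Majority vote among the k neighbors' labels.
            labels, counts = np.unique(y_train[nn], return_counts=True)
            preds.append(labels[np.argmax(counts)])
        return np.array(preds)

On the toy data above, ``knn_predict(X, y, data, k=3)`` should agree with ``knn.predict(data)`` except possibly where the vote is tied or where miml uses a different distance metric or tie-breaking rule.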
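
The examples above suggest trying different values of k. A simple holdout comparison, using only the ``fit``/``predict`` interface that those examples already rely on, might look like the following sketch; the 70/30 split, the random seed, and the candidate values of k are arbitrary choices for illustration. ::

    import numpy as np

    from miml.classification import KNearestNeighbor

    # X and y as loaded in either example above.
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    cut = int(0.7 * len(X))          # 70% train, 30% test
    train, test = idx[:cut], idx[cut:]

    for k in (1, 3, 5, 15):
        model = KNearestNeighbor(k)
        model.fit(X[train], y[train])
        acc = np.mean(np.asarray(model.predict(X[test])) == y[test])
        print("k = %2d  holdout accuracy = %.3f" % (k, acc))

A larger k typically yields a smoother boundary but can blur genuine class structure, so in practice the value of k is usually chosen empirically, as here.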