
Dimensionality Reduction and Visualization: t-SNE

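t-SNE is a dimensionality reduction technique: it maps high-dimensional data down to 2 or 3 dimensions so that the data can be displayed with ordinary visualization tools.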
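
For orientation, the quantities that the listing further down works with can be summarized as follows. This is a sketch in the usual t-SNE notation rather than something stated in the post: $x_i$ are the input points, $y_i$ the low-dimensional points, $\beta_i$ the per-point Gaussian precision (the `beta` array in the code), and $N$ the number of samples. Each $\beta_i$ is chosen by binary search so that the perplexity of the conditional distribution $p_{\cdot|i}$ matches the `perplexity` argument.

```latex
\begin{align}
p_{j|i} &= \frac{\exp\left(-\beta_i \lVert x_i - x_j \rVert^2\right)}
                {\sum_{k \neq i} \exp\left(-\beta_i \lVert x_i - x_k \rVert^2\right)},
&
p_{ij} &= \frac{p_{j|i} + p_{i|j}}{2N}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
&
C &= \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \\
\frac{\partial C}{\partial y_i} &= 4 \sum_{j} \left(p_{ij} - q_{ij}\right)
    \left(1 + \lVert y_i - y_j \rVert^2\right)^{-1} \left(y_i - y_j\right).
\end{align}
```

Note that the gradient loop in the listing computes this sum without the constant factor 4, which is effectively absorbed into the learning rate `eta`.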

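Key parameters:

def tsne(X=np.array([]), no_dims=2, initial_dims=50, perplexity=30.0):
  • X: $X \in \mathbb{R}^{N \times D}$, i.e. N samples, each a D-dimensional vector;
  • no_dims: dimensionality of the output embedding, default 2;
  • initial_dims: the data is first preprocessed with PCA, which reduces the original samples to initial_dims dimensions, default 50;
  • perplexity: perplexity of the Gaussian distributions, default 30.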
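
To make the perplexity parameter more concrete, here is a minimal sketch of how the perplexity of one point's Gaussian neighbour distribution can be computed. The helper name `row_perplexity` and the toy distance arrays are assumptions made for illustration; the computation is the same entropy that `Hbeta` in the listing evaluates, exponentiated.

```python
import numpy as np


def row_perplexity(sq_dists, beta):
    """Perplexity of the Gaussian conditional distribution for one point.

    sq_dists: squared distances from the point to every other point.
    beta: precision of the Gaussian kernel (larger beta = narrower kernel).
    """
    p = np.exp(-sq_dists * beta)
    p = p / np.sum(p)
    entropy = -np.sum(p * np.log(p))  # Shannon entropy in nats
    return np.exp(entropy)


# Ten neighbours at the same distance: the distribution is uniform,
# so the perplexity equals the number of neighbours.
equal = np.full(10, 2.0)
perp_equal = row_perplexity(equal, beta=1.0)

# One neighbour is far closer than the rest and dominates the distribution,
# so the perplexity drops towards 1.
skewed = np.array([0.0] + [50.0] * 9)
perp_skewed = row_perplexity(skewed, beta=1.0)

print(perp_equal, perp_skewed)
```

Read this way, perplexity behaves like a smooth count of effective neighbours: a value of 30 asks each point to spread its probability mass over roughly 30 neighbours.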

2D visualization of the MNIST data


#
#  tsne.py
#
# Implementation of t-SNE in Python. The implementation was tested on Python
# 2.7.10, and it requires a working installation of NumPy. The implementation
# comes with an example on the MNIST dataset. In order to plot the
# results of this example, a working installation of matplotlib is required.
#
# The example can be run by executing: `ipython tsne.py`
#
#
#  Created by Laurens van der Maaten on 20-12-08.
#  Copyright (c) 2008 Tilburg University. All rights reserved.

import numpy as np
import pylab


def Hbeta(D=np.array([]), beta=1.0):
    """
        Compute the perplexity and the P-row for a specific value of the
        precision of a Gaussian distribution.
    """

    # Compute P-row and corresponding perplexity
    P = np.exp(-D.copy() * beta)
    sumP = np.sum(P)
    H = np.log(sumP) + beta * np.sum(D * P) / sumP
    P = P / sumP
    return H, P


def x2p(X=np.array([]), tol=1e-5, perplexity=30.0):
    """
        Performs a binary search to get P-values in such a way that each
        conditional Gaussian has the same perplexity.
    """

    # Initialize some variables
    print("Computing pairwise distances...")
    (n, d) = X.shape
    sum_X = np.sum(np.square(X), 1)
    D = np.add(np.add(-2 * np.dot(X, X.T), sum_X).T, sum_X)
    P = np.zeros((n, n))
    beta = np.ones((n, 1))
    logU = np.log(perplexity)

    # Loop over all datapoints
    for i in range(n):

        # Print progress
        if i % 500 == 0:
            print("Computing P-values for point %d of %d..." % (i, n))

        # Compute the Gaussian kernel and entropy for the current precision
        betamin = -np.inf
        betamax = np.inf
        Di = D[i, np.concatenate((np.r_[0:i], np.r_[i+1:n]))]
        (H, thisP) = Hbeta(Di, beta[i])

        # Evaluate whether the perplexity is within tolerance
        Hdiff = H - logU
        tries = 0
        while np.abs(Hdiff) > tol and tries < 50:

            # If not, increase or decrease precision
            if Hdiff > 0:
                betamin = beta[i].copy()
                if betamax == np.inf or betamax == -np.inf:
                    beta[i] = beta[i] * 2.
                else:
                    beta[i] = (beta[i] + betamax) / 2.
            else:
                betamax = beta[i].copy()
                if betamin == np.inf or betamin == -np.inf:
                    beta[i] = beta[i] / 2.
                else:
                    beta[i] = (beta[i] + betamin) / 2.

            # Recompute the values
            (H, thisP) = Hbeta(Di, beta[i])
            Hdiff = H - logU
            tries += 1

        # Set the final row of P
        P[i, np.concatenate((np.r_[0:i], np.r_[i+1:n]))] = thisP

    # Return final P-matrix
    print("Mean value of sigma: %f" % np.mean(np.sqrt(1 / beta)))
    return P


def pca(X=np.array([]), no_dims=50):
    """
        Runs PCA on the NxD array X in order to reduce its dimensionality to
        no_dims dimensions.
    """

    print("Preprocessing the data using PCA...")
    (n, d) = X.shape
    X = X - np.tile(np.mean(X, 0), (n, 1))
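    # X.T @ X is symmetric, so eigh applies; it returns eigenvalues in
    # ascending order, so reverse the columns to put the leading
    # components first (eig gives no ordering guarantee)
    (l, M) = np.linalg.eigh(np.dot(X.T, X))
    Y = np.dot(X, M[:, ::-1][:, 0:no_dims])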
    return Y


def tsne(X=np.array([]), no_dims=2, initial_dims=50, perplexity=30.0):
    """
        Runs t-SNE on the dataset in the NxD array X to reduce its
        dimensionality to no_dims dimensions. The syntax of the function is
        `Y = tsne.tsne(X, no_dims, initial_dims, perplexity)`, where X is an
        NxD NumPy array.
    """

    # Check inputs
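    # The original test inspected no_dims, which did not match its message;
    # check the dtype of X instead
    if not np.issubdtype(X.dtype, np.floating):
        print("Error: array X should have type float.")
        return -1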
    if round(no_dims) != no_dims:
        print("Error: number of dimensions should be an integer.")
        return -1

    # Initialize variables
    X = pca(X, initial_dims).real
    (n, d) = X.shape
    max_iter = 1000
    initial_momentum = 0.5
    final_momentum = 0.8
    eta = 500
    min_gain = 0.01
    Y = np.random.randn(n, no_dims)
    dY = np.zeros((n, no_dims))
    iY = np.zeros((n, no_dims))
    gains = np.ones((n, no_dims))

    # Compute P-values
    P = x2p(X, 1e-5, perplexity)
    P = P + np.transpose(P)
    P = P / np.sum(P)
    P = P * 4.  # early exaggeration
    P = np.maximum(P, 1e-12)

    # Run iterations
    for iter in range(max_iter):

        # Compute pairwise affinities
        sum_Y = np.sum(np.square(Y), 1)
        num = -2. * np.dot(Y, Y.T)
        num = 1. / (1. + np.add(np.add(num, sum_Y).T, sum_Y))
        num[range(n), range(n)] = 0.
        Q = num / np.sum(num)
        Q = np.maximum(Q, 1e-12)

        # Compute gradient
        PQ = P - Q
        for i in range(n):
            dY[i, :] = np.sum(np.tile(PQ[:, i] * num[:, i], (no_dims, 1)).T * (Y[i, :] - Y), 0)

        # Perform the update
        if iter < 20:
            momentum = initial_momentum
        else:
            momentum = final_momentum
        gains = (gains + 0.2) * ((dY > 0.) != (iY > 0.)) + \
                (gains * 0.8) * ((dY > 0.) == (iY > 0.))
        gains[gains < min_gain] = min_gain
        iY = momentum * iY - eta * (gains * dY)
        Y = Y + iY
        Y = Y - np.tile(np.mean(Y, 0), (n, 1))

        # Compute current value of cost function
        if (iter + 1) % 10 == 0:
            C = np.sum(P * np.log(P / Q))
            print("Iteration %d: error is %f" % (iter + 1, C))

        # Stop lying about P-values
        if iter == 100:
            P = P / 4.

    # Return solution
    return Y


if __name__ == "__main__":
    print("Run Y = tsne.tsne(X, no_dims, initial_dims, perplexity) to perform t-SNE on your dataset.")
    print("Running example on 2,500 MNIST digits...")
    X = np.loadtxt("mnist2500_X.txt")
    labels = np.loadtxt("mnist2500_labels.txt")
    Y = tsne(X, 2, 50, 20.0)
    pylab.scatter(Y[:, 0], Y[:, 1], 20, labels)
    pylab.show()
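
The gradient step in `tsne` loops over the points one at a time, which is slow in pure Python for larger N. As a rough sketch, the same quantity can be computed with matrix operations. The function names `affinities`, `grad_loop` and `grad_vectorized` and the random test data are assumptions for illustration and are not part of the original script; `grad_loop` simply mirrors the loop from the listing so the two results can be compared.

```python
import numpy as np


def affinities(Y):
    """Unnormalized Student-t kernel `num` and normalized Q, as in the listing."""
    n = Y.shape[0]
    sum_Y = np.sum(np.square(Y), 1)
    num = 1.0 / (1.0 + np.add(np.add(-2.0 * np.dot(Y, Y.T), sum_Y).T, sum_Y))
    num[range(n), range(n)] = 0.0
    Q = np.maximum(num / np.sum(num), 1e-12)
    return num, Q


def grad_loop(P, Y):
    """Gradient computed point by point, mirroring the loop in the listing."""
    n, no_dims = Y.shape
    num, Q = affinities(Y)
    PQ = P - Q
    dY = np.zeros((n, no_dims))
    for i in range(n):
        dY[i, :] = np.sum(
            np.tile(PQ[:, i] * num[:, i], (no_dims, 1)).T * (Y[i, :] - Y), 0)
    return dY


def grad_vectorized(P, Y):
    """Same gradient without the Python loop.

    With W[j, i] = (P - Q)[j, i] * num[j, i], row i of the gradient is
    sum_j W[j, i] * (Y[i] - Y[j]) = Y[i] * sum_j W[j, i] - (W.T @ Y)[i].
    """
    num, Q = affinities(Y)
    W = (P - Q) * num
    return np.sum(W, axis=0)[:, None] * Y - np.dot(W.T, Y)


# Small random problem: a symmetric, normalized P with a zero diagonal
rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 2))
P = rng.random((6, 6))
P = P + P.T
np.fill_diagonal(P, 0.0)
P = P / np.sum(P)

max_diff = np.max(np.abs(grad_loop(P, Y) - grad_vectorized(P, Y)))
print("largest difference between the two gradients:", max_diff)
```

The vectorized form trades the N-iteration loop for one N×N elementwise product and one matrix product, so memory use stays at the N×N level the script already needs for `P` and `Q`.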
