Statistical Learning Methods, Chapter 4: Naive Bayes, Bayesian Estimation, and a Python Implementation

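Statistical Learning Methods, Chapter 2: The Perceptron Algorithm and a Python Implementation
Statistical Learning Methods, Chapter 3: k-Nearest Neighbors (k-NN), kd-Trees, and a Python Implementation
Statistical Learning Methods, Chapter 4: Naive Bayes, Bayesian Estimation, and a Python Implementation
Statistical Learning Methods, Chapter 5: Decision Trees, the CART Algorithm, Pruning, and a Python Implementation
Statistical Learning Methods, Chapter 5: Decision Trees, the ID3 and C4.5 Algorithms, and a Python Implementation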

Full code:
https://github.com/xjwhhh/LearningML/tree/master/StatisticalLearningMethod
Follows and stars are welcome.

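Naive Bayes is a classification method based on Bayes' theorem and the assumption that the features are conditionally independent given the class.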

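Given a training data set, it first learns the joint probability distribution of input and output under the conditional independence assumption. Then, for a given input x, it uses this model and Bayes' theorem to output the y with the largest posterior probability.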
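Written out in the usual notation (assumed here: an input x = (x^(1), ..., x^(n)) with n features and classes c_1, ..., c_K), the conditional independence assumption and the resulting classification rule are as follows. The posterior's denominator is the same for every class, so it can be dropped when taking the argmax.

```latex
% Conditional independence assumption
P(X = x \mid Y = c_k) = \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)

% Posterior by Bayes' theorem
P(Y = c_k \mid X = x) = \frac{P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)}
                             {\sum_{k'=1}^{K} P(Y = c_{k'}) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_{k'}\right)}

% Decision rule (the denominator does not depend on c_k)
y = \arg\max_{c_k} \; P(Y = c_k) \prod_{j=1}^{n} P\left(X^{(j)} = x^{(j)} \mid Y = c_k\right)
```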

Naive Bayes is simple to implement, and both learning and prediction are efficient, which makes it a commonly used method.

The figure below shows the naive Bayes algorithm:

[Figure: The naive Bayes algorithm]
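The figure itself is not reproduced here. The minimal Python sketch below follows the standard steps with maximum likelihood estimates: count to estimate P(Y = c_k) and P(X^(j) = a | Y = c_k), then return the class with the largest product. The toy dataset and the names fit_mle and classify are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy data: two discrete features (x1 in {1, 2, 3}, x2 in {S, M, L}), labels in {1, -1}
X = [(1, 'S'), (1, 'M'), (1, 'M'), (1, 'S'), (1, 'S'),
     (2, 'S'), (2, 'M'), (2, 'M'), (2, 'L'), (2, 'L'),
     (3, 'L'), (3, 'M'), (3, 'M'), (3, 'L'), (3, 'L')]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]


def fit_mle(X, y):
    """Maximum likelihood estimates of P(Y=c) and P(X^(j)=a | Y=c), obtained by counting."""
    class_counts = Counter(y)
    prior = {c: cnt / len(y) for c, cnt in class_counts.items()}
    # joint_counts[(j, a, c)]: number of samples with x^(j) == a and y == c
    joint_counts = Counter((j, a, c) for x, c in zip(X, y) for j, a in enumerate(x))

    def cond(j, a, c):
        return joint_counts[(j, a, c)] / class_counts[c]

    return prior, cond


def classify(x, prior, cond):
    """Return argmax_c P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c), plus every class score."""
    scores = {}
    for c, p in prior.items():
        for j, a in enumerate(x):
            p *= cond(j, a, c)
        scores[c] = p
    return max(scores, key=scores.get), scores


prior, cond = fit_mle(X, y)
label, scores = classify((2, 'S'), prior, cond)
print(label, scores)
```

For the input (2, 'S') the scores are 1/45 for class 1 and 1/15 for class -1, so the sketch predicts -1. Note that any feature value never seen with a class gets probability 0 under these estimates, which zeroes that class's whole product.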

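For detailed explanations and proofs, see the book Statistical Learning Methods (《統計學習方法》) or other blog posts; they are not repeated here.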

The Python implementation below uses the MNIST dataset. To avoid probability estimates of 0, it uses Bayesian estimation:
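With λ ≥ 0, the Bayesian estimates are P_λ(X^(j) = a_jl | Y = c_k) = (Σ_i I(x_i^(j) = a_jl, y_i = c_k) + λ) / (Σ_i I(y_i = c_k) + S_j·λ) and P_λ(Y = c_k) = (Σ_i I(y_i = c_k) + λ) / (N + K·λ), where S_j is the number of values feature j can take and K is the number of classes. λ = 0 gives maximum likelihood and λ = 1 is Laplace smoothing; train() below uses λ = 1 and S_j = 2. The small sketch shows the effect on a zero count; the helper name bayes_estimate and the counts are hypothetical.

```python
def bayes_estimate(count, total, num_values, lam=1.0):
    """Bayesian estimate (count + lam) / (total + num_values * lam).

    count:      how often the value occurs (within class c_k, for a conditional)
    total:      N_k for a conditional probability, N for the prior
    num_values: S_j for a conditional probability, K for the prior
    lam:        lambda >= 0; lam = 0 is maximum likelihood, lam = 1 is Laplace smoothing
    """
    return (count + lam) / (total + num_values * lam)


# Hypothetical case: a binary pixel that is never 1 in the 130 training images of one class
mle = bayes_estimate(0, 130, 2, lam=0.0)       # 0.0, which would zero the class's whole product
smoothed = bayes_estimate(0, 130, 2, lam=1.0)  # 1 / 132: small but positive
print(mle, smoothed)

# The smoothed estimates over both values of the pixel still sum to 1
print(bayes_estimate(0, 130, 2) + bayes_estimate(130, 130, 2))
```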

import cv2
import time
import logging
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def log(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        logging.debug('start %s()' % func.__name__)
        ret = func(*args, **kwargs)

        end_time = time.time()
        logging.debug('end %s(), cost %s seconds' % (func.__name__, end_time - start_time))

        return ret

    return wrapper


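# Binarization: map every pixel to 0 or 1 so that each feature has exactly two possible values,
# matching the last dimension (length 2) of conditional_probability in train()
def binaryzation(img):
    cv_img = img.astype(np.uint8)
    # THRESH_BINARY_INV with maxval 1: pixels > 50 become 0, all others become 1 (written in place)
    cv2.threshold(cv_img, 50, 1, cv2.THRESH_BINARY_INV, cv_img)
    return cv_img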


@log
def train(train_set, train_labels):
    class_num = len(set(train_labels))
    feature_num = len(train_set[0])
    prior_probability = np.zeros(class_num)  # prior probabilities (class counts for now)
    conditional_probability = np.zeros((class_num, feature_num, 2))  # conditional probabilities

    # Count how often each class occurs and how often feature j takes the value 0 or 1 within each class
    for i in range(len(train_labels)):
        img = binaryzation(train_set[i])  # binarize the image
        label = train_labels[i]

        prior_probability[label] += 1

        for j in range(feature_num):
            conditional_probability[label][j][img[j]] += 1

    # Bayesian estimation with lambda = 1 (Laplace smoothing).
    # The prior's denominator N + K*lambda is the same for every class, so the prior is left
    # unnormalized: dividing every class by the same constant does not change the argmax.
    prior_probability += 1
    for label in set(train_labels):
        # The conditional's denominator N_k + S_j*lambda depends on the class
        # (S_j = 2 because every feature is binary), so it must be divided out.
        class_count = np.sum(train_labels == label)
        conditional_probability[label] = (conditional_probability[label] + 1) / (class_count + 2 * 1)

    return prior_probability, conditional_probability


@log
def predict(test_features, prior_probability, conditional_probability):
    # Compare classes by log-probabilities: multiplying 784 probabilities below 1 can underflow
    # to 0.0, whereas a sum of logs stays finite. The prior may be unnormalized counts; that only
    # shifts every class score by the same constant.
    log_prior = np.log(prior_probability)
    log_conditional = np.log(conditional_probability)
    feature_index = np.arange(conditional_probability.shape[1])

    result = []
    for test in test_features:
        img = binaryzation(test)
        # log P(Y=c_k) + sum_j log P(X^(j) = x^(j) | Y=c_k), computed for every class k at once
        scores = log_prior + log_conditional[:, feature_index, img].sum(axis=1)
        result.append(int(np.argmax(scores)))
    return np.array(result)


if __name__ == '__main__':
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    raw_data = pd.read_csv('../data/train.csv', header=0)
    data = raw_data.values

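    # Use only the first 2000 samples: column 0 holds the label, the remaining columns hold the pixel values
    imgs = data[0:2000, 1:]
    labels = data[0:2000, 0]

    # Use 2/3 of the data as the training set and 1/3 as the test set
    train_features, test_features, train_labels, test_labels = train_test_split(
        imgs, labels, test_size=0.33, random_state=1)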

    prior_probability, conditional_probability = train(train_features, train_labels)
    test_predict = predict(test_features, prior_probability, conditional_probability)
    score = accuracy_score(test_labels, test_predict)
    print("The accuracy score is ", score)
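For comparison, here is a minimal vectorized sketch of the same model (binary features, Bayesian estimation with λ, log-space scoring) written in plain NumPy. It runs on synthetic data, so it needs neither MNIST nor OpenCV; the function names fit_bernoulli_nb and predict_bernoulli_nb and the synthetic data are hypothetical. The last line shows why scoring in log space is a common choice: 784 factors of 0.3 multiply to 0.0 in float64, while their log-sum is an ordinary finite number.

```python
import numpy as np


def fit_bernoulli_nb(X, y, num_classes, lam=1.0):
    """Smoothed log-prior (K,) and log-conditional (K, n, 2) tables.

    X: (N, n) array of 0/1 features; y: (N,) integer labels in [0, num_classes).
    lam is the Bayesian-estimation lambda (lam = 1 is Laplace smoothing).
    """
    class_counts = np.bincount(y, minlength=num_classes)                    # N_k
    ones = np.array([X[y == c].sum(axis=0) for c in range(num_classes)])    # count of x^(j) = 1
    counts = np.stack([class_counts[:, None] - ones, ones], axis=2)         # (K, n, 2)
    cond = (counts + lam) / (class_counts[:, None, None] + 2 * lam)
    prior = (class_counts + lam) / (len(y) + num_classes * lam)
    return np.log(prior), np.log(cond)


def predict_bernoulli_nb(X, log_prior, log_cond):
    """argmax_k of log P(Y=c_k) + sum_j log P(X^(j)=x^(j) | Y=c_k), for every row of X."""
    # For binary x_j: log P(x_j | c) = x_j * log P(1 | c) + (1 - x_j) * log P(0 | c)
    scores = X @ log_cond[:, :, 1].T + (1 - X) @ log_cond[:, :, 0].T + log_prior
    return scores.argmax(axis=1)


# Hypothetical synthetic data: 3 classes, 784 binary "pixels",
# each with a class-specific probability of being 1
rng = np.random.default_rng(0)
num_classes, n_features, n_per_class = 3, 784, 200
pixel_probs = rng.uniform(0.05, 0.95, size=(num_classes, n_features))
y = np.repeat(np.arange(num_classes), n_per_class)
X = (rng.random((len(y), n_features)) < pixel_probs[y]).astype(int)

log_prior, log_cond = fit_bernoulli_nb(X, y, num_classes)
accuracy = (predict_bernoulli_nb(X, log_prior, log_cond) == y).mean()
print("training accuracy:", accuracy)

# Raw product of 784 factors of 0.3 underflows to 0.0; the sum of logs does not
print(np.prod(np.full(784, 0.3)), np.log(np.full(784, 0.3)).sum())
```

scikit-learn's sklearn.naive_bayes.BernoulliNB, whose alpha parameter plays the role of λ, fits the same kind of model if a library version is preferred.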

Original article: https://blog.csdn.net/devil_bye/article/details/80723510
