Machine Learning for Cybersecurity Cookbook

上QQ阅读APP看书，第一时间看更新

How to do it...

In this section, we'll walk through a recipe showing how to use PCA on data:

Start by importing the necessary libraries and reading in the dataset:

from sklearn.decomposition import PCA
import pandas as pd

data = pd.read_csv("file_pe_headers.csv", sep=",")
X = data.drop(["Name", "Malware"], axis=1).to_numpy()

Standardize the dataset, as is necessary before applying PCA:

from sklearn.preprocessing import StandardScaler

X_standardized = StandardScaler().fit_transform(X)

Instantiate a PCA instance and use it to reduce the dimensionality of our data:

pca = PCA()
pca.fit_transform(X_standardized)

Assess the effectiveness of your dimensionality reduction:

print(pca.explained_variance_ratio_)

The following screenshot shows the output: