Machine Learning for Cybersecurity Cookbook
上QQ阅读APP看书,第一时间看更新

How to do it...

In this section, we'll walk through a recipe showing how to use PCA on data:

  1. Start by importing the necessary libraries and reading in the dataset:
from sklearn.decomposition import PCA
import pandas as pd

data = pd.read_csv("file_pe_headers.csv", sep=",")
X = data.drop(["Name", "Malware"], axis=1).to_numpy()
  1. Standardize the dataset, as is necessary before applying PCA:
from sklearn.preprocessing import StandardScaler

X_standardized = StandardScaler().fit_transform(X)
  1. Instantiate a PCA instance and use it to reduce the dimensionality of our data:
pca = PCA()
pca.fit_transform(X_standardized)
  1. Assess the effectiveness of your dimensionality reduction:
print(pca.explained_variance_ratio_)

The following screenshot shows the output: