Converting categorical variables
As you have already noticed, a data frame can contain columns with data of different types. To see the type of each column, we can check the dtypes attribute of the data frame. You can think of Python attributes as being similar to Swift properties:
In []:
df.dtypes
Out[]:
length    float64
color      object
fluffy       bool
label      object
dtype: object
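For readers who want to try this outside the notebook, here is a minimal sketch that builds a small data frame with the same four column names and inspects its dtypes. The rows and the label names (class_a, class_b) are invented for illustration and are not the chapter's dataset. How string columns are reported may depend on your pandas version.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the chapter.
df = pd.DataFrame({
    'length': [27.5, 30.1, 18.2],
    'color': ['pink gold', 'space gray', 'light black'],
    'fluffy': [True, False, True],
    'label': ['class_a', 'class_b', 'class_a'],
})

# dtypes is an attribute, so there are no parentheses.
# It returns a Series that maps each column name to its dtype.
print(df.dtypes)

# A single column's dtype can be looked up by name.
print(df.dtypes['length'])
```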
While the length and fluffy columns contain the expected data types, the types of color and label are less transparent. What are those objects? The object dtype means that a column can hold any kind of Python object. At the moment these columns contain strings, but what we really want them to be are categorical variables. In case you don't remember from the previous chapter, categorical variables are like Swift enums. Fortunately for us, the data frame has handy methods for converting columns from one type to another:
In []:
df.color = df.color.astype('category')
df.label = df.label.astype('category')
That's it. Let's check:
In []:
df.dtypes
Out[]:
length     float64
color     category
fluffy        bool
label     category
dtype: object
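To make the enum analogy more concrete, the following sketch converts two string columns and then looks at how pandas stores them: a list of distinct categories plus an integer code per row, loosely like an enum's cases and raw values. The rows and label names are invented for illustration.

```python
import pandas as pd

# Hypothetical toy rows, not the chapter's dataset.
df = pd.DataFrame({
    'color': ['pink gold', 'space gray', 'pink gold', 'light black'],
    'label': ['class_a', 'class_b', 'class_a', 'class_b'],
})

# Convert each string column to the category dtype.
for column in ['color', 'label']:
    df[column] = df[column].astype('category')

print(df.dtypes)

# The distinct values, sorted alphabetically by default.
print(list(df['color'].cat.categories))

# Each row holds an integer index into the categories list.
print(df['color'].cat.codes.tolist())
```

Bracket indexing (df['color']) is used here instead of attribute access because it also works for column names that are not valid Python identifiers.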
The color and label columns are categorical now. To see all the categories in the color column, execute:
In []:
colors = df.color.cat.categories.get_values().astype('string')
colors
Out[]:
array(['light black', 'pink gold', 'purple polka-dot', 'space gray'],
      dtype='|S16')
As expected, we have four colors. The dtype '|S16' stands for fixed-width byte strings of 16 bytes each, which is the length of the longest color name, 'purple polka-dot'.
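The get_values() method has been removed from newer pandas releases, so the listing above may fail on a current installation. A minimal sketch of one way to get the same list of colors with calls that exist in recent versions is shown below. The sample values are an assumption, chosen so that the categories match the four colors printed above.

```python
import pandas as pd

# Hypothetical sample values that cover the four colors from the output above.
color_column = pd.Series(
    ['space gray', 'pink gold', 'purple polka-dot', 'light black', 'pink gold']
).astype('category')

# cat.categories is an Index of the distinct values;
# tolist() turns it into a plain Python list of strings.
colors = color_column.cat.categories.tolist()
print(colors)

# The longest color name determines the fixed string width (16 here).
longest = max(len(name) for name in colors)
print(longest)
```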