Categorical and Numerical Features in a Deep Neural Network

Note: The author created all the images and tables by hand. If you use them, please cite this blog to support the hard work. 🙂 Thanks.


In this post we assume that you know the basic structure of a deep neural network and how the inputs are transformed. Now you are curious about what exactly goes into the network for each data point.

We will work through an example data set, split into several cases, to help you understand how a neural network handles each one. We assume that the response variable is numerical for easy demonstration. A categorical response variable, which makes the problem a classification rather than a regression problem, will be covered in a separate article. The question also remains: what happens for an ordinal response variable? All these exciting questions will be answered on our blog and YouTube channel soon.


A Neural Network

This is what a neural network looks like. The question is what exactly goes into that input layer for a given data frame. Let's start with the easy case of a single numerical feature. In this example there are biases at every node; they are not drawn in the picture to avoid clutter, but the bias is an important parameter (weight).


Single Numerical Feature

In this data set we have one predictor variable $X$ and one response variable $Y$. One important thing to remember is that the neural network diagram you see is for a single data point, namely $(1.2, 2.3)$.

The output of the neural network is the loss of a single data point. Adding up these per-point losses gives the overall loss function, which we then minimize with respect to the parameters (weights and biases).
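To make this concrete, here is a minimal sketch of the forward pass and the per-point loss, assuming a tiny network with one hidden layer of two ReLU units. All the weight and bias values below are made-up illustrative numbers, not trained ones.

import numpy as np

# Single data point (x, y) = (1.2, 2.3).
x = np.array([1.2])          # single numerical feature
y = 2.3                      # observed response

W1 = np.array([[0.5], [-0.3]])   # hidden-layer weights (2 x 1), illustrative
b1 = np.array([0.1, 0.2])        # hidden-layer biases
W2 = np.array([0.7, 0.4])        # output-layer weights
b2 = 0.05                        # output-layer bias

h = np.maximum(0, W1 @ x + b1)   # hidden activations (ReLU)
y_hat = W2 @ h + b2              # network prediction for this point

loss = (y - y_hat) ** 2          # squared-error loss for this one point
print(y_hat, loss)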


Simple Linear Regression

For simple linear regression you can see how the whole process works for a single data point; these micro losses are then added up to form the macro loss function.
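For concreteness, assuming a squared-error loss for each data point, the macro loss is just the sum of the micro losses:

$$
L(\text{weights}, \text{biases}) \;=\; \sum_{i=1}^{N} \big( y_i - \hat{y}_i \big)^2,
$$

where $\hat{y}_i$ is the network output for the $i^{th}$ data point and $N$ is the number of data points.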


Multiple Numerical Features

In this example, we take three predictor variables, all numerical in nature. This is an easy extension of the single numerical feature case.

Let's see what the neural network looks like in this case. We have taken the first data point as the example shown in the network.


Single Categorical Feature

Things will become spicy now, as we introduce a categorical variable into the picture. One method used for this is called one-hot encoding. This is quite natural for a categorical data set, given the huge literature on linear models in statistics.

In this case, the categorical predictor is transformed into a data frame like the following.

This is now used as a new data frame, which is taken as the input to the neural network. This is quite an interesting way to deal with a categorical variable, as it is exactly how ANOVA, ANCOVA, and other linear models in the statistics literature are used for supervised learning. Let me explain a bit more how the one-hot encoding is done.


One Hot Encoding

If a certain categorical predictor $X$ has $n$ factors $\{x_1, x_2, x_3, \cdots, x_n\}$, then every data point whose feature value is $x_k$ is replaced by the vector $(0, 0, 0, \cdots, 1, \cdots, 0, 0)$, where the $1$ is at the $k^{th}$ position.

  • $x_1 \rightarrow (1, 0, 0, \cdots, 0, 0)$
  • $x_2 \rightarrow (0, 1, 0, \cdots, 0, 0)$
  • $x_3 \rightarrow (0, 0, 1, \cdots, 0, 0)$
  • and so on...

Thus, you can see that the following is the expansion of the given data set in terms of the encoding.
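Here is a minimal sketch of this encoding in plain Python, assuming a hypothetical factor set {Red, Green, Blue}:

# One-hot encoding by hand for an illustrative factor set.
factors = ['Red', 'Green', 'Blue']

def one_hot(value, factors):
    # Place a 1 at the position of the factor, 0 everywhere else.
    vec = [0] * len(factors)
    vec[factors.index(value)] = 1
    return vec

data = ['Green', 'Red', 'Blue', 'Green']
encoded = [one_hot(v, factors) for v in data]
# encoded -> [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]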

Now let me show you the neural network version of the input for this data set.

Now, if you ask why this helps, the functional form of the ANOVA model makes it clear.
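As a sketch, assuming a purely linear network (no hidden layer) with the one-hot vector $(z_1, z_2, \ldots, z_n)$ as input, the output is

$$
\hat{y} \;=\; b + \sum_{k=1}^{n} w_k\, z_k, \qquad z_k = \mathbf{1}[X = x_k],
$$

so a data point with factor $x_k$ gets the prediction $\hat{y} = b + w_k$, mirroring the one-way ANOVA mean structure $\mu + \alpha_k$.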

This may not look like the ANOVA functional form at first, but it is exactly that: each factor has a real-valued weight associated with it, which impacts the response variable, and the weights combine linearly, exactly as in ANOVA.


Multiple Categorical Features

Let's see what the one hot encoding does to the data set above.

The data is taken as input by the deep neural network in exactly the same fashion as in the previous examples shown above.


Interesting! Now, let's look into a mixed data set.

Numerical and Categorical Features

For the data set above, let's also one-hot encode the categorical predictor $X_3$ and see how it looks.

This gives you the complete representation of the data points as input to a neural network, and it covers any data set containing categorical and numerical features.

Essentially, the categorical variables are one-hot encoded and then concatenated with the unchanged numerical variables to create a new data frame, as shown above.
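As a quick sketch, one simple way to do this in pandas (an alternative to the scikit-learn pipeline shown below) is get_dummies. The data frame and column names here are hypothetical, for illustration only.

import pandas as pd

# Hypothetical mixed data frame: X1, X2 numerical, X3 categorical.
df = pd.DataFrame({
    'X1': [1.2, 0.7, 3.1],
    'X2': [4.5, 2.2, 0.9],
    'X3': ['Red', 'Blue', 'Red']
})

# get_dummies one-hot encodes X3 and leaves X1, X2 unchanged,
# concatenating everything into a single new data frame.
df_encoded = pd.get_dummies(df, columns=['X3'])
print(df_encoded)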

This data preprocessing is done explicitly before fitting a neural network to a data set. I will show it using a demo Python code block, as it is an important preprocessing step for neural networks.

The numerical pipeline: We first define the steps in a list, where each step is a tuple containing a name and an instance of the transformer, StandardScaler in this case. StandardScaler standardizes features by removing the mean and scaling to unit variance.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline(steps=[
    ('std_scaler', StandardScaler())
])

The categorical pipeline: Similar to the numerical pipeline, we define the steps in a list where each step is a tuple containing a name and an instance of the transformer, OneHotEncoder in this case. OneHotEncoder converts categorical variable(s) into dummy/indicator variables.

from sklearn.preprocessing import OneHotEncoder

cat_pipeline = Pipeline(steps=[
    ('one_hot', OneHotEncoder())
])

ColumnTransformer allows different columns or column subsets of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space. This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.

from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer(
    transformers=[
        # Tuples specifying the transformer objects to be applied to subsets of the data.
        # 'num' is a name for this step, num_pipeline is the transformer object, and num_features is the list of column(s) to be processed.
        ('num', num_pipeline, num_features),
        # 'cat' is a name for this step, cat_pipeline is the transformer object, and cat_features is the list of column(s) to be processed.
        ('cat', cat_pipeline, cat_features)
    ])
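A minimal usage sketch follows. The data frame and feature lists are hypothetical, and in a full script num_features and cat_features would be defined before building the preprocessor above.

import pandas as pd

# Hypothetical mixed data frame.
df = pd.DataFrame({
    'X1': [1.2, 0.7, 3.1],
    'X2': [4.5, 2.2, 0.9],
    'X3': ['Red', 'Blue', 'Red']
})

num_features = ['X1', 'X2']   # numerical columns
cat_features = ['X3']         # categorical column(s)

# fit_transform learns the scaling/encoding from the data and returns
# the concatenated (scaled numerical + one-hot) feature matrix.
X = preprocessor.fit_transform(df)
print(X)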

Ordinal Features

For ordinal features like Temperature, with factors {Cold, Warm, Hot}, you can replace them by {1, 2, 3} respectively, so that the encoding preserves the ordering of the factors.
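A sketch using scikit-learn's OrdinalEncoder, assuming a hypothetical Temperature column. Note that OrdinalEncoder maps the categories to 0, 1, 2; we add 1 below to match the {1, 2, 3} encoding described above.

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data; the categories argument fixes the order Cold < Warm < Hot.
temps = np.array([['Cold'], ['Hot'], ['Warm'], ['Cold']])
encoder = OrdinalEncoder(categories=[['Cold', 'Warm', 'Hot']])

# fit_transform gives 0, 1, 2; shift by 1 to get 1, 2, 3.
encoded = encoder.fit_transform(temps) + 1
print(encoded.ravel())   # [1. 3. 2. 1.]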

We will explain the StandardScaler and its use in a future article. This article is devoted to handling different types of data input.
