Pandas: Convert list to dummies

TL, DR

Sometimes you may have a list in a Pandas DataFrame column and you are looking to get individual dummy columns for further elaborations. Here a very short guide about how to convert a list to individual dummies.

From list to dummies

In many cases during data exploration and wrangling you will find that you have a list of elements inside a Pandas DataFrame column. For instance, you have a column with the grapes blend used in certain wines like this:

pandas convert list to dummies - pandas dataframe with list in column
Pandas DataFrame with list in column

For further analysis you want to convert this list in a series of dummies. But simple list splitting may not work, as the grapes may be in different order and each blend may have different lengths.

Luckily in the Python ecosystem you find plenty of tools for data wrangling. In this case, one of the easiest and most straightforward tool is the MultiLabelBinarizer from scikit-learn.

Let’s see how can we make use of it. I will first provide you the full code, and then I will discuss the main blocks.

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

data_df = pd.DataFrame(
    {'grapes':
        [['Sangiovese','Merlot','Cabernet Franc'],
        ['Sangiovese'],
        ['Merlot','Cabernet Franc','Cabernet Sauvignon'],
        ['Merlot','Shiraz'],
        ['Sangiovese','Cerasuolo']]
    }, columns=['grapes'])

mlb = MultiLabelBinarizer()

binarized_df = pd.DataFrame(
    mlb.fit_transform(data_df["grapes"]),
    columns=mlb.classes_,
    index=data_df.index)

The result from the above code is the following. You can see that our Pandas DataFrame column list has been converted to dummies:

pandas convert list to dummies - pandas dataframe with dummies from list
Pandas DataFrame with dummies from list

Now let’s look at the various passages. Here we import the necessary packages:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

Here we create our Pandas DataFrame (you may have it from other sources):

data_df = pd.DataFrame(
    {'grapes':
        [['Sangiovese','Merlot','Cabernet Franc'],
        ['Sangiovese'],
        ['Merlot','Cabernet Franc','Cabernet Sauvignon'],
        ['Merlot','Shiraz'],
        ['Sangiovese','Cerasuolo']]
    }, columns=['grapes'])

This line create an instance of the MultiLabelBinarizer:

mlb = MultiLabelBinarizer()

Finally, we fit the MultiLabelBinarizer on our column with the list and create a new DataFrame:

binarized_df = pd.DataFrame(
    mlb.fit_transform(data_df["grapes"]),
    columns=mlb.classes_,
    index=data_df.index)

And that’s it! Hope you found this guide useful. BTW, I created the images on this pages using the dataframe-image package, you can find a complete guide here.

Related links

  • Visualize and export full Pandas DataFrames as images link
  • Scikit-Learn MultiLabelBinarizer link

Do you like our content? Check more of our posts in our blog!