Categorical variables are a type of data whose values fall into a limited set of groups or categories with shared characteristics. They are also referred to as qualitative variables and are typically represented by non-numerical labels.
Examples of categorical variables include the names of people or cities, eye or hair color, and favorite car brand. In each case a value is a qualitative label rather than a numeric measurement.
How they work
Categorical variables assign each observation to one of a set of distinct groups. They can be further classified into two types: nominal and ordinal. Nominal categorical variables have no logical order, so their categories cannot be meaningfully ranked. For instance, when considering "eye color" as a categorical variable, "blue", "green", and "brown" are just different categories without any inherent hierarchy or numerical value.
On the other hand, ordinal categorical variables have a logical or meaningful order. An example is the variable "education level", where categories such as "high school", "bachelor's degree", "master's degree", and "PhD" reflect a certain order and can be ranked.
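The difference between the two types can be illustrated with a minimal Python sketch. The names `EDUCATION_ORDER` and `education_rank` and the sample data are assumptions made for illustration, not part of any particular library.

```python
from collections import Counter

# Assumed ranking for the "education level" example; list position encodes rank.
EDUCATION_ORDER = ["high school", "bachelor's degree", "master's degree", "PhD"]


def education_rank(level: str) -> int:
    """Return the position of a level in the assumed ordering (0 = lowest)."""
    return EDUCATION_ORDER.index(level)


# Ordinal: sorting by rank is meaningful.
responses = ["PhD", "high school", "master's degree", "bachelor's degree"]
sorted_by_level = sorted(responses, key=education_rank)

# Nominal: categories can be counted or tested for equality, but not ranked.
eye_colors = ["blue", "brown", "green", "brown"]
eye_color_counts = Counter(eye_colors)
```

Here the ordinal variable supports sorting and comparisons such as "at least a master's degree", while the nominal variable only supports counting and equality checks.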
In data analysis, categorical variables are often encoded into numerical form so that mathematical operations can be applied and machine learning algorithms can use them as input. Common strategies include one-hot encoding, where each category is mapped to a binary vector with a single 1 in the position for that category, and ordinal encoding for ordinal variables, where categories are assigned integer values according to their relative ordering.
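As a rough sketch of the two encodings described above, the following Python code implements them with the standard library only. The helper names `one_hot_encode` and `ordinal_encode` and the sample data are hypothetical; in practice a data analysis library would usually provide equivalent functionality.

```python
def one_hot_encode(values, categories):
    """Map each value to a binary vector with a single 1 at its category's index."""
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return vectors


def ordinal_encode(values, ordered_categories):
    """Map each value to its integer position in the given ordering."""
    index = {category: i for i, category in enumerate(ordered_categories)}
    return [index[value] for value in values]


# Nominal variable: the column order is arbitrary and carries no ranking.
eye_categories = ["blue", "brown", "green"]
one_hot = one_hot_encode(["blue", "green", "brown", "blue"], eye_categories)

# Ordinal variable: the integers preserve the assumed ranking of the categories.
education_order = ["high school", "bachelor's degree", "master's degree", "PhD"]
ordinal = ordinal_encode(["bachelor's degree", "PhD", "high school"], education_order)
```

One-hot encoding avoids implying an order among nominal categories, at the cost of one column per category. Ordinal encoding keeps a single column but is only appropriate when the integer order matches a real ranking.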