categoricalInformationValue

Introduced in: v20.1.0
Calculates the information value (IV) for categorical features in relation to a binary target variable. For each category, the function computes:
(P(tag = 1) - P(tag = 0)) × (log(P(tag = 1)) - log(P(tag = 0)))
where:
  • P(tag = 1) is the probability that the target equals 1 for the given category
  • P(tag = 0) is the probability that the target equals 0 for the given category
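As a rough illustration of the formula above, the following Python sketch computes the value for a single 0/1 feature. It assumes that P(tag = 1) is read as the share of tag = 1 rows in which the feature equals 1 (and likewise for P(tag = 0)), and that log is the natural logarithm. Both readings are assumptions of this sketch rather than statements from this page, and the function name information_value and the sample data are hypothetical.

```python
import math


def information_value(feature, tag):
    """Information value of one 0/1 feature against a 0/1 tag.

    Assumed reading of the formula (not confirmed by the reference text):
      p1 = share of tag == 1 rows where feature == 1
      p0 = share of tag == 0 rows where feature == 1
    No guard is included for empty classes or zero counts; those raise
    ZeroDivisionError or ValueError in this sketch.
    """
    total_1 = sum(1 for t in tag if t == 1)
    total_0 = sum(1 for t in tag if t == 0)
    hit_1 = sum(1 for f, t in zip(feature, tag) if f == 1 and t == 1)
    hit_0 = sum(1 for f, t in zip(feature, tag) if f == 1 and t == 0)

    p1 = hit_1 / total_1
    p0 = hit_0 / total_0
    return (p1 - p0) * (math.log(p1) - math.log(p0))


# Hypothetical sample: the feature is set in 3 of 4 positive rows
# and in 1 of 4 negative rows, so p1 = 0.75 and p0 = 0.25.
feature = [1, 1, 1, 0, 1, 0, 0, 0]
tag = [1, 1, 1, 1, 0, 0, 0, 0]

iv = information_value(feature, tag)
print(round(iv, 4))  # 0.5493, i.e. 0.5 * ln(3)
```

Under this reading the two factors always share a sign, so the product is never negative, and it is exactly 0 when the feature is equally common in both classes.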
Information value is a statistic used in predictive modeling to measure the strength of a categorical feature’s relationship with a binary target variable. Higher values indicate stronger predictive power. The result indicates how much each discrete (categorical) feature [category1, category2, ...] contributes to a learning model that predicts the value of tag.
Syntax
categoricalInformationValue(category1[, category2, ...], tag)
Arguments
  • category1, category2, ... — One or more categorical features to analyze. Each category should contain discrete values. UInt8
  • tag — Binary target variable for prediction. Should contain values 0 and 1. UInt8
Returned value
Returns an array of Float64 values containing one information value per categorical feature passed to the function. Each value indicates the predictive strength of that feature for the target variable. Array(Float64)
Examples
Basic usage analyzing age groups vs mobile usage
Query
-- Using the metrica.hits dataset (available on https://sql.clickhouse.com/) to analyze age-mobile relationship
SELECT categoricalInformationValue(Age < 15, IsMobile)
FROM metrica.hits;
Response
[0.0014814694805292418]
Multiple categorical features with user demographics
Query
SELECT categoricalInformationValue(
    Sex,                 -- 0=male, 1=female
    toUInt8(Age < 25),   -- 0=25+, 1=under 25
    toUInt8(IsMobile)    -- 0=desktop, 1=mobile
) AS iv_values
FROM metrica.hits
WHERE Sex IN (0, 1);
Response
[0.00018965785460692887,0.004973668839403392]
Last modified on June 8, 2026