

f1_score <- 2 * (precision * recall) / (precision + recall)

This is the harmonic mean of precision and recall. Its output range is [0, 1]. It works for both multi-class and multi-label classification.
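As a quick illustration of the harmonic-mean behaviour, the base R sketch below evaluates the formula for a few precision/recall pairs. The helper `f1_from_pr` is hypothetical (it is not part of keras3), and returning 0 when both inputs are 0 is a convention assumed here to avoid dividing by zero.

```r
# Hypothetical helper, not a keras3 function: F1 as the harmonic mean
# of precision and recall.
f1_from_pr <- function(precision, recall) {
  # Assumed convention: define F1 as 0 when precision + recall is 0.
  if (precision + recall == 0) return(0)
  2 * (precision * recall) / (precision + recall)
}

f1_from_pr(1.0, 1.0)  # perfect precision and recall
f1_from_pr(0.5, 0.5)  # equal inputs give back that same value
f1_from_pr(0.9, 0.1)  # 0.18, well below the arithmetic mean of 0.5
```

Because the harmonic mean is dominated by the smaller input, a high F1 requires both precision and recall to be high.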


metric_f1_score(
  ...,
  average = NULL,
  threshold = NULL,
  name = "f1_score",
  dtype = NULL
)



...: For forward/backward compatibility.


average: Type of averaging to be performed on data. Acceptable values are NULL, "micro", "macro" and "weighted". Defaults to NULL. If NULL, no averaging is performed and result() will return the score for each class. If "micro", compute metrics globally by counting the total true positives, false negatives and false positives. If "macro", compute metrics for each label, and return their unweighted mean. This does not take label imbalance into account. If "weighted", compute metrics for each label, and return their average weighted by support (the number of true instances for each label). This alters "macro" to account for label imbalance. It can result in a score that is not between precision and recall.


threshold: Elements of y_pred greater than threshold are converted to 1, and the rest to 0. If threshold is NULL, the argmax of y_pred is converted to 1, and the rest to 0.


name: Optional. String name of the metric instance.


dtype: Optional. Data type of the metric result.
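The binarisation and averaging rules described above can be sketched in base R without keras3. Everything in this block is illustrative: `binarize` and `f1_average` are hypothetical helpers, the data are made up, ties in the argmax case are broken by taking the first maximum (a simplification), and a class with no true or predicted positives is assigned an F1 of 0 by assumption. The library's own handling of these edge cases may differ.

```r
# Illustrative base R only; these helpers are NOT keras3 functions.

# Convert scores to 0/1. With a threshold, scores strictly greater than
# it become 1. With threshold = NULL, the row-wise argmax becomes 1.
binarize <- function(y_pred, threshold = NULL) {
  if (is.null(threshold)) {
    out <- matrix(0, nrow(y_pred), ncol(y_pred))
    top <- max.col(y_pred, ties.method = "first")  # assumed tie rule
    out[cbind(seq_len(nrow(y_pred)), top)] <- 1
    out
  } else {
    (y_pred > threshold) * 1
  }
}

# F1 from 0/1 matrices (rows = samples, columns = classes).
f1_average <- function(y_true, y_bin, average = NULL) {
  tp <- colSums(y_true * y_bin)        # true positives per class
  fp <- colSums((1 - y_true) * y_bin)  # false positives per class
  fn <- colSums(y_true * (1 - y_bin))  # false negatives per class
  support <- colSums(y_true)           # true instances per class
  if (identical(average, "micro")) {
    # Pool the counts over all classes before computing the score.
    tp <- sum(tp); fp <- sum(fp); fn <- sum(fn)
  }
  denom <- 2 * tp + fp + fn
  # 2TP / (2TP + FP + FN) is algebraically the harmonic mean of
  # precision and recall; 0 is assumed when the denominator is 0.
  f1 <- ifelse(denom == 0, 0, 2 * tp / denom)
  if (is.null(average) || identical(average, "micro")) return(f1)
  if (identical(average, "macro")) return(mean(f1))
  if (identical(average, "weighted")) return(sum(f1 * support) / sum(support))
  stop("average must be NULL, \"micro\", \"macro\" or \"weighted\"")
}

# Made-up multi-class data: 4 samples, 3 classes, one-hot labels.
y_true <- rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(1, 0, 0))
y_pred <- rbind(c(0.7, 0.2, 0.1), c(0.4, 0.5, 0.1),
                c(0.6, 0.3, 0.1), c(0.2, 0.7, 0.1))

y_bin <- binarize(y_pred)             # argmax per row
f1_average(y_true, y_bin)             # per class: 0.5, 2/3, 0
f1_average(y_true, y_bin, "micro")    # 0.5
f1_average(y_true, y_bin, "macro")    # about 0.389
f1_average(y_true, y_bin, "weighted") # about 0.417
```

In this toy data the third class is never predicted, so its per-class score of 0 drags the macro average down; the weighted average is affected less because that class has a support of only 1 out of 4. Calling `binarize(y_pred, threshold = 0.5)` instead leaves the second row with no predicted label at all, since 0.5 is not strictly greater than the threshold.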


A Metric instance is returned. The Metric instance can be passed directly to compile(metrics = ), or used as a standalone object. See ?Metric for example usage.


metric <- metric_f1_score(threshold = 0.5)
y_true <- rbind(c(1, 1, 1),
                c(1, 0, 0),
                c(1, 1, 0))
y_pred <- rbind(c(0.2, 0.6, 0.7),
                c(0.2, 0.6, 0.6),
                c(0.6, 0.8, 0.0))
metric$update_state(y_true, y_pred)
result <- metric$result()

## tf.Tensor([0.49999997 0.79999995 0.66666657], shape=(3), dtype=float32)
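The per-class values in that output can be cross-checked by hand. The base R sketch below repeats the computation for the same y_true and y_pred with a threshold of 0.5, without using keras3. It is only a plain-arithmetic check, not the library's implementation, and the variable names are illustrative.

```r
# Same data as the example above (rows = samples, columns = classes).
y_true <- rbind(c(1, 1, 1),
                c(1, 0, 0),
                c(1, 1, 0))
y_pred <- rbind(c(0.2, 0.6, 0.7),
                c(0.2, 0.6, 0.6),
                c(0.6, 0.8, 0.0))

y_bin <- (y_pred > 0.5) * 1          # scores above the threshold become 1

tp <- colSums(y_true * y_bin)        # true positives per class:  1, 2, 1
fp <- colSums((1 - y_true) * y_bin)  # false positives per class: 0, 1, 1
fn <- colSums(y_true * (1 - y_bin))  # false negatives per class: 2, 0, 0

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1 <- 2 * (precision * recall) / (precision + recall)
f1  # approximately 0.5, 0.8, 0.6667
```

These agree with the tensor above to about six decimal places; the small remaining differences are presumably due to float32 arithmetic and numerical safeguards inside the library.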


F-1 Score: float.