Good question and thanks for reading. It would be something more like a clustering algorithm (for example, one metric that might be used for the splitting metric is the Gini Coefficient) — the decision tree doesn’t care about how the split between the two groups (50%/50% vs. 90%/10%). It only cares about how similar (in terms of features) the members within each cluster are to each other and how different the average member of one cluster is to the other cluster.
So the algorithm doesn’t consider the resulting split, it purely tries to maximize within group similarity and minimize across group similarity. Hope this addresses your question. Thanks again!