CoCA: Concept Class Analysis



Concept Class Analysis (CoCA) is a method for grouping documents based on the schematic similarities in their engagement with multiple semantic directions (as measured in the previous section). This is a generalization of Correlational Class Analysis for survey data. We outline this method in more detail in our Sociological Science paper, “Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts.”

The first step to use CoCA is build two or more semantic directions. For example, here are three semantic directions related to socio-economic status:

    # build juxtaposed pairs for each semantic directions
    pairs.01 <- data.frame(additions  = c("rich", "richer", "affluence", "wealthy"),
                           substracts = c("poor", "poorer", "poverty", "impoverished") )

    pairs.02 <- data.frame(additions  = c("skilled", "competent", "proficient", "adept"),
                          substracts = c("unskilled", "incompetent", "inproficient", "inept") )
    pairs.03 <- data.frame(additions  = c("educated", "learned", "trained", "literate"),
                           substracts = c("uneducated", "unlearned", "untrained", "illiterate") )
    # get the vectors for each direction
    sd.01 <- get_direction(pairs.01, my.wv)
    sd.02 <- get_direction(pairs.02, my.wv)
    sd.03 <- get_direction(pairs.03, my.wv)

    # row bind each direction
    sem.dirs <- rbind(sd.01, sd.02, sd.03)

Next, we feed our document-term matrix, word embeddings matrix, and our semantic direction data.frame from above to the CoCA function:

  classes <- CoCA(my.dtm, wv = my.wv, directions = sem.dirs)

Finally, using the plot() function, we can generate simple visualizations of the schematic classes found:

  # this is a quick plot. 
  # designate which module to plot with `module = `
  plot(classes, module=1)