The graph above shows the median income for each major against its share of women, colored by major category. The original dataset divides majors into 16 categories; these were collapsed into 10 categories that more closely resemble those used by CollegeBoard. All 173 majors were kept and plotted. Median income for each major is shown on the y-axis and the share of women on the x-axis. The share of women variable was calculated from the number of women and the total number of individuals in each major (note: this dataset coded gender as a binary).
The confidence interval added to the graph above shows a roughly linear, negative relationship between the share of women in a major and that major's median income. For example, petroleum engineering has the highest median income, $110,000, while being only 12% women. In contrast, early childhood education is 97% women and has a median income of $28,000.
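The share-of-women calculation described above amounts to a single division per major. A minimal sketch, assuming columns named `women` and `total` (both names are assumptions, and the counts are invented so that the shares match the 12% and 97% quoted in the text):

```r
library(dplyr)

# Toy rows: the counts are invented; only the resulting shares (12% and 97%)
# come from the text. `women` and `total` are assumed column names.
majors <- tibble(
  major = c("Petroleum Engineering", "Early Childhood Education"),
  women = c(300, 970),
  total = c(2500, 1000)
)

# Share of women = number of women / total number of individuals
majors <- majors %>%
  mutate(share_women = women / total)
```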
I started this visual interested in examining the share of women in each of the majors included in the dataset. I began by faceting by major category, but that made it difficult to compare across groups or identify any larger-scale trends. I also wanted to add color, but coloring by major category potentially made the visual misleading.
From there, I wondered how the majors would look together on the same graph, colored by major category. With 16 unique categories in the original dataset, there were too many groups to graph in a way that is readable for the viewer (there simply were not 16 distinguishable colors available to encode them). However, I did feel that this second iteration got me closer to the graph I wanted to eventually produce, as I could see what was potentially a moderate negative relationship between share of women and median income across majors.
Using `fct_collapse`, the major categories were regrouped with help from CollegeBoard guidelines, and `geom_smooth` was added to explore the relationship between x and y. Since I still wanted the focus of the graph to be the individual majors and major groups rather than the fitted line, I removed the estimated line and kept the confidence interval as a background guide to the relationship seen by eye. Finally, I used `geom_label` to clarify the interpretation of the graph by labeling some examples.
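The three steps above (collapsing categories, keeping only the confidence band, labeling examples) could be sketched roughly as follows. This is an illustration under assumptions, not the original code: the column names, the collapse mapping, and the placeholder rows are hypothetical, and `linetype = "blank"` is just one way to hide the fitted line while keeping the band. Only the two labeled majors use figures quoted in the text.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Toy data: the two named majors use figures quoted in the text; every other
# row, and every column name, is a made-up placeholder.
majors <- tibble(
  major = c("Petroleum Engineering", "Early Childhood Education",
            "Placeholder A", "Placeholder B", "Placeholder C", "Placeholder D"),
  major_category = c("Engineering", "Education", "Arts",
                     "Humanities & Liberal Arts", "Business", "Business"),
  share_women = c(0.12, 0.97, 0.60, 0.65, 0.45, 0.50),
  median_income = c(110000, 28000, 32000, 33000, 45000, 42000)
)

# Merge fine-grained categories into broader ones (hypothetical mapping).
majors <- majors %>%
  mutate(category = fct_collapse(
    major_category,
    "Arts & Humanities" = c("Arts", "Humanities & Liberal Arts")
  ))

# Majors to call out with labels
examples <- majors %>%
  filter(major %in% c("Petroleum Engineering", "Early Childhood Education"))

p <- ggplot(majors, aes(x = share_women, y = median_income)) +
  # linetype = "blank" hides the fitted line; the confidence band is still drawn
  geom_smooth(method = "lm", formula = y ~ x, linetype = "blank") +
  geom_point(aes(colour = category)) +
  geom_label(data = examples, aes(label = major), vjust = -0.5)
```

Mapping `colour` only inside `geom_point()` keeps a single band fitted across all majors instead of one band per category.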
For each major category in the dataset, the graph above shows the percentage of individuals who were employed full-time, employed part-time, or unemployed. Arts and humanities majors have the lowest full-time employment rate at 63%, while business majors have the highest full-time employment rate after graduation, at 78%. While unemployment hovers at 5-7% for all major categories, the split between full-time and part-time employment differs from category to category.
For the sake of consistency and brevity across visualizations, the 16 original major categories were collapsed into 10 categories that more closely resemble those used by CollegeBoard.
I started off curious to understand the unemployment rates for each major, since some majors are often marketed as more or less employable than others. After calculating the rates for each employment type, I stacked the columns since the proportions add up to 1.
To reduce cognitive load, the x-axis was changed to percentages and `geom_text` was used to add the percentage for each employment type. I was surprised to see the unemployment rates hover at 5-8% across the board; the variation came instead from the level of employment. However, I thought the narrative could still be a bit clearer.
The palette was changed to be colorblind-friendly, and the bars were then ordered by full-time employment rates, low to high. The percentages were kept and adjusted for readability, and the collapsed major category variable was used for consistency.
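The stacked-bar steps described above (computing rates, labeling percentages, ordering by full-time rate, switching palettes) might look something like this. It is a sketch under assumptions: the column names are hypothetical, the counts are invented so the full-time rates match the 63% and 78% quoted in the text, and viridis stands in as one colorblind-friendly option since the palette actually used is not stated.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data: counts are invented so that full-time rates come out at 63% and 78%;
# the part-time/unemployed split and all column names are assumptions.
employment <- tibble(
  category = c("Arts & Humanities", "Business"),
  full_time = c(630, 780),
  part_time = c(310, 160),
  unemployed = c(60, 60)
)

# Long format, then the proportion of each employment type within a category
rates <- employment %>%
  pivot_longer(c(full_time, part_time, unemployed),
               names_to = "status", values_to = "n") %>%
  group_by(category) %>%
  mutate(rate = n / sum(n)) %>%
  ungroup()

# Order categories by full-time rate, low to high
category_order <- rates %>%
  filter(status == "full_time") %>%
  arrange(rate) %>%
  pull(category)

rates <- rates %>%
  mutate(category = factor(category, levels = category_order))

p <- ggplot(rates, aes(x = rate, y = category, fill = status)) +
  geom_col() +
  # Centre each percentage label inside its stacked segment
  geom_text(aes(label = scales::label_percent(accuracy = 1)(rate)),
            position = position_stack(vjust = 0.5)) +
  scale_x_continuous(labels = scales::label_percent()) +
  scale_fill_viridis_d()
```

Because the rates within each category sum to 1, every stacked bar spans the full 0-100% range, which makes the segments directly comparable across categories.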
This graph shows how the share of women compares across major categories, sorted by income. For example, the highest-income category, engineering, is 76% men and 24% women, while the lowest-income category, education, is 19% men and 81% women (gender was coded as binary in this data). For this plot, even though I went through more complicated iterations, I wanted to keep it purposefully simple, drawing straightforward attention to gender by major category.
I wanted to look at the share of women for each major category more explicitly, since the earlier scatterplot plotted each major individually. Each of these iterations shows the gender distribution for each major category, starting with dodged bar graphs. While I liked the bars lined up next to each other, the alternating colors made it difficult to know which columns to compare.
Since I still wanted them to be side by side, I used `facet_wrap` to place them next to each other with a common y-axis, so the column sizes would still be easy to compare. To reduce cognitive load, the percentages for each major category were included on each column.
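A faceted version along these lines could be sketched as below. The choice to facet by gender (one panel per gender, categories along the x-axis) is an assumption, as are the column names; the shares are the engineering and education figures quoted in the text.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Shares for engineering and education are the ones quoted in the text;
# column names are assumptions.
gender <- tibble(
  category = c("Engineering", "Education"),
  men = c(0.76, 0.19),
  women = c(0.24, 0.81)
) %>%
  pivot_longer(c(men, women), names_to = "gender", values_to = "share")

p <- ggplot(gender, aes(x = category, y = share, fill = gender)) +
  geom_col() +
  # Print the percentage just above each column
  geom_text(aes(label = scales::label_percent(accuracy = 1)(share)),
            vjust = -0.3) +
  # facet_wrap() defaults to scales = "fixed", so both panels share one y-axis
  facet_wrap(~ gender, nrow = 1) +
  scale_y_continuous(labels = scales::label_percent(), limits = c(0, 1))
```

With fixed scales, column heights remain comparable across panels, which is the property the dodged version made harder to read.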
The last version of the visual focused on making it as simple as possible, continuing to reduce cognitive load and avoid visual artifacts, while adding the layer of median income.