Chi-Square Test in Python: Analyzing Enabling Factors of Mental Health at Work

Aisyah Rahvy
5 min read · Mar 16, 2022
Photo by Marcel Strauß on Unsplash

Mental health has been under the spotlight for many years in various settings, including the workplace. Since the World Health Organization (WHO) defines health as “a state of complete physical, mental and social well-being”, we can conclude that mental health plays an important role in our productive potential.

Green and Kreuter introduced the PRECEDE-PROCEED model, which explains the causal and action theory behind an outcome (Gielen et al., 2008). PRECEDE, which describes the causal part, includes four phases: 1) social assessment; 2) epidemiological, behavioral, and environmental assessment; 3) educational and ecological assessment; 4) administrative and policy assessment and intervention alignment. Note that the third phase covers the predisposing, reinforcing, and enabling factors of action.

Well, I guess the title of this article has already given you a little ‘spoiler’ about what I will explain next. Enabling factors are “antecedents to behavioral or environmental change that allow a motivation or environmental policy to be realized” (Green and Kreuter, 2005). They therefore include the programs, resources, and services needed to carry out the desired actions.

Now let’s get into our dataset. I use the Mental Health in Tech Survey (2014) dataset for this analysis. It measures attitudes towards mental health and the frequency of mental health disorders in the tech workplace. The purpose of this analysis is to look for insights related to mental health enabling factors in tech companies.

Dataset Features

The dataset has 26 variables. During data cleaning, I removed the comments variable because more than 50% of its values were missing. I also filled the missing values of state, self_employed, and work_interfere with each variable's mode.
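The cleaning steps described above could look roughly like the sketch below. It is not the original notebook code: the tiny DataFrame is a hypothetical stand-in for the survey file, and only the column names come from the article.

```python
import pandas as pd

# Hypothetical miniature of the survey data (made-up rows, real column names).
df = pd.DataFrame({
    "state": ["CA", None, "CA", "NY"],
    "self_employed": ["No", "No", None, "Yes"],
    "work_interfere": ["Often", None, "Often", "Never"],
    "comments": [None, None, None, "n/a"],
})

# Drop the column with more than 50% missing values.
df = df.drop(columns=["comments"])

# Fill the remaining gaps with each column's most frequent value (the mode).
for col in ["state", "self_employed", "work_interfere"]:
    df[col] = df[col].fillna(df[col].mode()[0])
```

`mode()` skips missing values by default, so the fill value is always an answer that actually appears in the column.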

For the categorical analysis, I wanted to look at the distribution of the seek_help, wellness_program, and treatment variables by benefits. The benefits variable records whether the workplace provides mental health benefits, while seek_help records whether resources for learning more about mental health are available. The wellness_program variable records whether the employer has discussed mental health as part of an employee wellness program, and treatment records whether the employee has sought treatment. All four variables are enabling factors of mental health in the workplace.

Data distribution of seek_help, treatment, and wellness_program based on benefits
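A minimal sketch of how three such count plots might be drawn with sns.countplot is given here. The answers in the DataFrame are hypothetical, and the layout (one row of three panels) is an assumption rather than the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical answers standing in for the cleaned survey data.
df = pd.DataFrame({
    "benefits":         ["Yes", "Yes", "No", "No", "Yes", "No"],
    "seek_help":        ["Yes", "Yes", "No", "No", "No", "No"],
    "treatment":        ["Yes", "No", "No", "Yes", "Yes", "No"],
    "wellness_program": ["Yes", "No", "No", "No", "Yes", "No"],
})

targets = ["seek_help", "treatment", "wellness_program"]
fig, axes = plt.subplots(1, len(targets), figsize=(15, 4))

# One panel per variable: bars grouped by benefits, colored by the answer.
for ax, col in zip(axes, targets):
    sns.countplot(data=df, x="benefits", hue=col, ax=ax)
    ax.set_title(f"{col} by benefits")

fig.tight_layout()
```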

I made the three graphs shown above with sns.countplot. They show that most respondents who answered ‘yes’ for benefits also answered ‘yes’ for seek_help, and the reverse holds for ‘no’. The same pattern appears for treatment. Meanwhile, most respondents who answered ‘no’ indicated that their employer discussed mental health as part of a wellness program. These distributions suggest an initial hypothesis: the availability of mental health benefits in the workplace may be associated with seek_help, treatment, and wellness_program. In statistical terms, we can write our hypotheses as:

  1. H0: There is no association between seek_help and benefits; Ha: there is an association between seek_help and benefits
  2. H0: There is no association between treatment and benefits; Ha: there is an association between treatment and benefits
  3. H0: There is no association between wellness_program and benefits; Ha: there is an association between wellness_program and benefits

Since all of our variables are categorical, we can use the chi-square test to examine the relationship between two variables. For this analysis, I dropped every answer other than ‘yes’ and ‘no’, so the final data contains only those two answers for both the independent and dependent variables. I also used a 95% confidence level (a significance level of 0.05), as most research in the health field does.

Drop code adapted from pythonfordatascience.org
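Such a drop function might look like the sketch below. The helper name keep_yes_no is hypothetical, the rows are made up, and the capitalized answers "Yes" and "No" are an assumption about how the values are stored.

```python
import pandas as pd

def keep_yes_no(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return only the rows whose answer in `column` is 'Yes' or 'No'."""
    return df[df[column].isin(["Yes", "No"])].copy()

# Hypothetical rows that include one "other" answer per column.
df = pd.DataFrame({
    "benefits":  ["Yes", "No", "Don't know", "Yes"],
    "seek_help": ["Yes", "No", "No", "Don't know"],
})

# Apply the filter to every column that has answers besides Yes and No.
for col in ["benefits", "seek_help"]:
    df = keep_yes_no(df, col)
```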

I applied this function to every variable that has answers other than ‘yes’ and ‘no’. Then I proceeded to build the cross-tabulations and calculate the p-value of each chi-square test.

Cross-tabulation for each variable
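One way to build these tables is pd.crosstab, sketched here on hypothetical rows. The names crosstab, crosstab1, and crosstab2 follow the ones used later in this article; everything else is an assumption.

```python
import pandas as pd

# Hypothetical answers after dropping everything except Yes and No.
df = pd.DataFrame({
    "benefits":         ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "wellness_program": ["Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
    "seek_help":        ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"],
    "treatment":        ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No"],
})

# Rows are the benefits answers, columns are the other variable's answers.
crosstab = pd.crosstab(df["benefits"], df["wellness_program"])
crosstab1 = pd.crosstab(df["benefits"], df["seek_help"])
crosstab2 = pd.crosstab(df["benefits"], df["treatment"])

print(crosstab)
```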

After building the cross-tabulations, I calculated the p-value of each chi-square test with chi2_contingency from scipy.stats. Note that I had already imported stats from scipy.
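To make clear what the test computes, here is a hand calculation of the Pearson chi-square statistic on a hypothetical 2x2 table. The counts are made up for illustration and are not survey results.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed counts (rows: benefits, columns: seek_help).
observed = np.array([[30, 10],
                     [20, 40]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
total = observed.sum()

# Expected count per cell if the two variables were independent.
expected = row_totals @ col_totals / total

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells.
statistic = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-square distribution.
p_value = chi2.sf(statistic, dof)
```

For 2x2 tables, chi2_contingency applies Yates' continuity correction by default, so its statistic will be slightly smaller than this uncorrected one unless correction=False is passed.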

And here we go: the chi-square test analysis. I used the code below.

Chi-square test for each variable
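A sketch of running the test on each cross-tabulation follows. The counts are hypothetical placeholders, not the survey's actual numbers, and the helper make_table exists only to build them.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def make_table(counts, column_name):
    """Build a hypothetical 2x2 cross-tabulation with benefits as rows."""
    return pd.DataFrame(
        counts,
        index=pd.Index(["No", "Yes"], name="benefits"),
        columns=pd.Index(["No", "Yes"], name=column_name),
    )

# Made-up counts for illustration only.
tables = {
    "crosstab": make_table([[50, 10], [15, 25]], "wellness_program"),
    "crosstab1": make_table([[40, 20], [10, 30]], "seek_help"),
    "crosstab2": make_table([[35, 25], [12, 28]], "treatment"),
}

alpha = 0.05  # significance level for a 95% confidence level
results = {}
for name, table in tables.items():
    chi2_stat, p_value, dof, expected = chi2_contingency(table)
    results[name] = (chi2_stat, p_value, dof)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"{name}: chi2={chi2_stat:.3f}, p={p_value:.4f}, dof={dof} -> {decision}")
```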

From the output, we know that benefits and wellness_program (crosstab) are associated, with a p-value less than 0.05. The same is true for benefits and seek_help (crosstab1), as well as benefits and treatment (crosstab2). Since every cross-tabulation has a p-value below 0.05, we can reject H0 in all three cases. In conclusion, each pair of variables we analyzed shows a statistically significant association.

This initial analysis gives us some insight into how mental health issues at work are associated with many factors, including the ‘enabling factors’ described by Green and Kreuter (2005). To confirm the associations between more factors or variables, we would need to conduct more advanced analysis. Since the survey did not provide a score for mental health problems, we can only perform categorical data analysis with the chi-square or Fisher test. Therefore, I highly recommend the use of scoring in mental health surveys so that we can conduct logistic regression analysis when the independent variables are categorical, or even multiple regression analysis if the researchers choose to treat all variables as numeric.

Because mental health is not a given, it is something that we must tend to, nurture, and hold sacred.

References

https://www.researchgate.net/profile/Daniel-Montano-2/publication/233894824_Theory_of_reasoned_action_theory_of_planned_behavior_and_the_integrated_behavior_model/links/0a85e53b67d742bc29000000/Theory-of-reasoned-action-theory-of-planned-behavior-and-the-integrated-behavior-model.pdf#page=445

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5121696/pdf/jcdr-10-LC01.pdf

https://www.geeksforgeeks.org/python-pearsons-chi-square-test/


Aisyah Rahvy

Health Economics enthusiast. A master-degree student in Universitas Indonesia. A research assistant of ACeHAP (the Airlangga Centre for Health Policy)