SELECTING SOURCES

To investigate educational inequality through a geographic lens, we adopted a multi-layered approach that prioritizes statistical and contextual breadth. The backbone of our research is the Public School Characteristics 2022-23 dataset from the National Center for Education Statistics (NCES), selected for its standardized variables, including student demographics, student-to-teacher ratios, and the percentage of students eligible for free and reduced-price lunch.

However, relying solely on institutional datasets can perpetuate what D’Ignazio and Klein describe as “silences” in the data, where structural inequities are hidden by a lack of context (2018). To address these silences and provide context for our geospatial findings, we integrated qualitative sources from the UCLA Library databases, conducting keyword searches such as “geographic education,” “school funding inequality,” and “segregation and education.” Gathering these peer-reviewed articles and news archives allowed us to adhere to the Data Feminism principle of “Consider Context,” ensuring that data points are treated as reflections of historical realities (D’Ignazio and Klein 152).

PROCESSING DATA

The data processing pipeline was designed not only to clean the data but to explore it for patterns, viewing the dataset as raw ingredients that needed to be understood before consumption (Yau 142). Processing began in Google Sheets, where we removed columns unrelated to our research questions to improve manageability. From there, we used OpenRefine for more granular cleaning, standardizing entries and handling missing data. Negative values, which NCES often uses to indicate missing or inapplicable data, were replaced with null values to prevent skewing. Rows related to adult education were removed because our project focuses specifically on K-12. We also filtered out improbable outliers, such as schools reporting 0 students or 0 teachers and student-teacher ratios exceeding 70:1; these were likely data entry errors that would have distorted our visual scales and obscured the true distribution of the data (Yau 139). Finally, members of the group with experience in R and Python drew on those languages for advanced data cleaning and in-depth visualizations.
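The filtering steps above can be sketched in a few lines of pandas. This is a minimal illustration, not our actual script: the column names (`level`, `students`, `teachers`, `free_reduced_lunch_pct`) and the tiny inline sample are hypothetical stand-ins for the NCES fields, and the thresholds mirror the rules described in the text.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned NCES export; column names are illustrative.
df = pd.DataFrame({
    "school_name": ["A", "B", "C", "D"],
    "level": ["Elementary", "Adult Education", "High", "Middle"],
    "students": [350, 120, 0, 900],
    "teachers": [20, 10, 5, 12],
    "free_reduced_lunch_pct": [-1.0, 40.0, 55.0, 62.5],
})

# NCES uses negative codes for missing or inapplicable data;
# replace them with NaN so they do not skew averages or visual scales.
num_cols = ["students", "teachers", "free_reduced_lunch_pct"]
df[num_cols] = df[num_cols].mask(df[num_cols] < 0)

# Keep only K-12 schools.
df = df[df["level"] != "Adult Education"]

# Drop improbable outliers: zero students or teachers, ratios above 70:1.
df = df[(df["students"] > 0) & (df["teachers"] > 0)]
df = df[df["students"] / df["teachers"] <= 70]
```

In practice we performed the equivalent operations interactively in OpenRefine; a scripted version like this simply makes the cleaning rules reproducible.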

PRESENTING THE NARRATIVE

The presentation strategy focused on constructing a clear, evidence-based argument. Following Kate Turabian’s guidelines for academic arguments, we designed the website to state our main “claim” up front, establishing a “social contract” with the reader. This ensured they understood the argument we intended to prove before engaging with the evidence (51).

Our website was created using WordPress, hosted by the Digital Humanities department at UCLA and chosen for its simplicity and accessibility; this allowed us to focus our labor on the narrative and visualizations rather than on complex coding. We specifically chose a green and blue color palette not just for aesthetic appeal, but to reinforce the geographic nature of our inquiry, helping the viewer visually connect the abstract data back to the physical world. For our visualizations, we utilized Tableau and Palladio, as these tools were introduced to us during class. In line with Yau’s advice on “Designing for an Audience,” we chose interactivity to allow users to explore the data at their own pace, making the evidence more personal and convincing (Yau 165).