Data Critique

In creating the metadata standard, what are the ontological choices made by the government? What are the implications of these bureaucratic imaginations?

1. Data Generation and Original Sources

This dataset was generated through the Common Core of Data (CCD) program, which is managed by the National Center for Education Statistics (NCES). The CCD annually collects administrative and fiscal data on all public schools, school districts, and state education agencies in the U.S. State education agency officials supply this data, including basic directory information for schools and their districts, student demographics, number of teachers, and school grade span. The dataset also includes the longitude and latitude of each public school.

The CCD is made up of five main collections and a few supplemental ones, which are divided into two categories: nonfiscal and fiscal. The nonfiscal side focuses on school directories, enrollment numbers, staffing, and special programs, while the fiscal side looks at funding and expenditures. The nonfiscal collections include the Public Elementary/Secondary School Universe, Local Education Agency Universe, and State Nonfiscal Public Elementary/Secondary Education Data.

All nonfiscal data (enrollment, staffing, lunch eligibility, and special education participation) begins with State Education Agencies (SEAs). These agencies gather and submit data through EDFacts, the Department of Education’s centralized data system. EDFacts enforces consistent file specifications such as variables, formats, and categories. The data is verified and organized, then released at three levels: state, local education agency (LEA), and school. Finally, all content, including the release notes and documentation, is made public through the CCD Reference Library and Elementary/Secondary Information System (ElSi).

2. Dataset Funding

NCES is the federal entity responsible for collecting, analyzing, and reporting data related to education in the United States and other nations. It is located within the Institute of Education Sciences (IES), which is part of the U.S. Department of Education, and the dataset’s creation is funded through IES with federal money allocated to the Department. NCES enables the Department of Education to fulfill its mandate to provide full reports and complete statistics on education in the U.S., so that audiences such as Congress, the states, educational policymakers, practitioners, and general data users have access to high-quality data and can address educational needs. NCES recently received an allocation of $35 billion in federal aid to continue collecting data on U.S. education. This federal funding matters because the data collection, variables, and categories are directly tied to national policy goals, federal law, and congressional mandates.

Yet this dependence on federal priorities makes the dataset vulnerable to ideological shifts across administrations. Each administration arrives with a distinct educational philosophy that reshapes not merely how data is interpreted, but what counts as data worth collecting in the first place. A reform-oriented administration might mandate detailed tracking of charter school performance and standardized test scores, embedding market-based education models into the data infrastructure itself. Conversely, an equity-focused administration could require disaggregated reporting on LGBTQ+ students, socioeconomic mobility indicators, or restorative justice practices, creating new categories of visibility for previously unmarked populations. These shifts transform the fundamental structure of knowledge production: federal funding determines which educational realities become measurable and which remain invisible. The dataset may thus function not as neutral record-keeping, but as an expression of a particular administration’s vision of what educational “problems” exist and which populations warrant intervention.

3. Excluded Information

The dataset’s administrative nature is a limitation: it provides a “skeleton” of each school but reveals little about its effectiveness. The most significant omission is student outcomes, as there is no information on academic performance (test scores, graduation rates, college matriculation, attendance). The data measures inputs such as staff and students, but not outputs such as learning or achievement. A second omission is financial data, which would allow analysis of per-pupil spending, teacher salaries, investment in technology, or the correlation between funding and student demographics. NCES collects detailed financial information as an entirely separate dataset. Lastly, the data is entirely quantitative and lacks qualitative context. A low student-teacher ratio may imply small class sizes, but there is no information on teacher quality or morale. A high total enrollment says nothing about school culture, curriculum, extracurricular activities, or building conditions.

4. Ideological Effects of Data’s Structure

In the location column, the dataset attaches subjective-sounding labels to each public school’s location. These labels include Remote, Distant, Fringe, Small, Large, and Medium. One problem with this classification is that it fits two different categorical variables, the size of the area and the school’s distance from an urban center, into a single column. Because terms such as “Remote” and “Fringe” carry negative connotations, the column gives the impression that it is highlighting the “negative attributes” of a school’s location and thereby stigmatizing certain geographical features. The classification can thus directly influence policy and funding assumptions.
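To make the conflation concrete, the combined label can be separated into its two underlying dimensions. The following is a minimal Python sketch, not the dataset’s own method: it assumes labels are formatted like “Rural: Remote,” and the function name split_locale and the term lists are hypothetical choices made for illustration.

```python
# Hypothetical qualifier lists, based on the labels named in the critique.
# The actual column values in the dataset may be formatted differently.
SIZE_TERMS = {"Large", "Midsize", "Medium", "Small"}   # describe the size of the area
PROXIMITY_TERMS = {"Fringe", "Distant", "Remote"}      # describe distance from an urban center


def split_locale(label):
    """Split a combined locale label into separate dimensions.

    Returns a dict with the area type plus either a size or a proximity
    value; the dimension that does not apply is set to None.
    """
    area, _, qualifier = label.partition(":")
    area, qualifier = area.strip(), qualifier.strip()
    return {
        "area": area,
        "size": qualifier if qualifier in SIZE_TERMS else None,
        "proximity": qualifier if qualifier in PROXIMITY_TERMS else None,
    }


if __name__ == "__main__":
    # Assumed example labels, not rows taken from the dataset.
    for label in ["City: Large", "Suburb: Small", "Town: Distant", "Rural: Remote"]:
        print(label, "->", split_locale(label))
```

Under these assumptions, every label fills only one of the two fields, which shows that the single column switches between measuring size and measuring distance depending on the kind of area.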

The “VIRTUAL” column documents whether the school is virtual or in-person. Its inclusion reflects increased attention to virtual education after the global pandemic. However, the column only records a school’s current mode of instruction; it does not capture the effects of virtual education or how it has shaped student learning.

In the “Type” column, the Department of Education classifies schools into four exhaustive categories: Alternative education, Career and technical education, Regular school, and Special education. The first thing to note is that the dataset frames schools that do not follow alternative, career and technical, or special education curricula as “Regular.” This practice could be criticized as marginalizing the other, more diverse pathways to an education. In addition, the dataset ignores other forms of education, including private education and homeschooling.

The “Race and Ethnicity” columns represent a powerful ideological choice, as their categories are not natural realities but are codified by the federal Office of Management and Budget. By requiring all states to collect and report data using the exact categories of American Indian, Asian, Hispanic, White, etc., the federal government creates and reifies this specific racial worldview. This makes these groups “statistically real” and shapes subsequent policy and research on equity, while rendering other groups invisible (such as those of Middle Eastern descent, who are classified as “white”).
