Wednesday, January 28, 2015

Student Data Privacy At Risk: Powerschool, OASIS, KITE, Metadata, and the P-20W






One thing that emerged from Dr. Paramo's presentation and from subsequent conversations with other school superintendents in Alaska is that they have not looked at how data flows out of their schools through OASIS, KITE, the P-20W, and the digital platforms in use in schools, such as Math 180 and APEX. I would have thought that Commissioner Hanley would have been crystal clear with his front-line people, the superintendents, about data flows. Apparently, this has not been the case, or they are feigning ignorance. This article is intended to help public officials and the general public in Alaska understand how students' data is at risk. I have tried to keep this information at an elementary level. In that effort, I may not have conveyed all the problems as clearly as I intended. It may be of interest to those in other states, because Common Education Data protocols have been followed in the other 48 states, and similar challenges may exist, particularly in SBAC member states.

Sometimes people ask, "If you aren't doing anything wrong, why are you worried about others having your data?" Anyone who has had their identity stolen, or has been the victim of phishing (fake websites made to look real) or other nefarious activity, understands that the concern has nothing to do with doing anything wrong. Privacy is a right, and in Alaska the state constitution has a strong statement on the matter in Section 22, which states:

The right of the people to privacy is recognized and shall not be infringed. The legislature shall implement this section. [Amended 1972]


The Common Core Standards, and the Alaska variation of these standards, are at the heart of the matter. Through the uniform electronic numbering of the standards and the tagging of the words in the content, some strange things are taking place that could pose a serious threat to student privacy. Permanent Fund Dividend records are being used for purposes to which the people did not give their consent, and that is the tip of the iceberg.

Many public officials believe that the data is secure, and they seem to be either unaware of, or in denial about, where that data goes and what is done with it. There are two fundamental types of data addressed here. The first is the traditional data that one would expect schools to hold for keeping transcripts and providing other traditional K-12 services. The second is psychometric data generated from student responses in digital learning platforms, also known as "fine grain" data, which is more detailed than the metadata the NSA tracks.

Traditional Data. When a parent in Alaska takes their child to school, the child's data is entered into the school computer along with a variety of other information. The Online Alaska Student Identification System, or OASIS, began as a mechanism for the State of Alaska to assign a unique number to a student in an attempt to keep the funding with the student. Over time, the system has become more than a student number system. It is now a comprehensive database that contains medical information in addition to information needed for federal funding of programs (e.g. school lunch), as well as traditional information needed for transcripts.

In some schools, some of this data is viewable in an interface called Powerschool by Pearson, which poses other interesting problems that will be discussed in the section on metadata. Because of the way that Powerschool has been set up, the metatags actually attach themselves to the student. This may not be visible on the Powerschool or OASIS side of the application, but it does appear on the Pearson side.

Today, OASIS is far more than a database of students as they progress through the K-12 system. It now includes staff data and other district, or Local Education Agency (LEA) data.

screen shot of the top level screen in OASIS

As shown above, there is a main screen, and underneath this screen there are subsequent screens of data, sometimes as many as four screens deep. Medical information can range from vaccine data to the date of the mother's first prenatal visit and the weight gain of the mother before the student was born, as shown below. It does not matter if the child was not born in Alaska: the database is interoperable with other states and agencies, so if the data is electronically held anywhere, it can be located and attached.





Originally, school enrollment was the basis of OASIS. That architecture was modified in 2004 for reasons beyond the scope of this blog entry, and modified again after the creation of the P-20W. The data is in ASCII delimited format and now resides in the enterprise system. The inclusion of the clearinghouse and other areas ensures that students can be tracked once they leave the state.
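
For readers unfamiliar with the term, an ASCII delimited file is plain text in which each line is one record and a separator character splits the fields. The sketch below reads such a file with Python's standard csv module. The pipe delimiter, the field names, and the sample row are all made up for illustration; the actual OASIS layout is not shown in this article.

```python
import csv
import io

# Hypothetical sample: the "|" delimiter and the field names are assumptions,
# not the real OASIS file layout.
SAMPLE = (
    "student_id|last_name|first_name|district\n"
    "0000000001|Doe|Jane|Example District\n"
)

def read_delimited(text, delimiter="|"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]

records = read_delimited(SAMPLE)
print(records[0]["last_name"])
```

The point of the example is that a format this simple can be read by almost any program, which is part of what makes the data easy to move between systems.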




Once that data goes into the P-20W database from OASIS, it has to be validated. Because OASIS data does not typically contain the Social Security Number, other data attributes (name of student, maiden name of mother, address, etc.) are used to match the record to the PFD database for validation and matching to Social Security Numbers. To do this, the state makes use of the Master Person Index from the Alaska Permanent Fund Dividend (PFD) and the Department of Labor wage database. A P20W-ID number is generated based on the Social Security Number (SSN). This is shown in the diagram below.



Those who understand SSNs will quickly realize that other data is embedded in the number itself, and that data can be added to the file.
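
As a rough illustration of what is embedded: a Social Security Number has three parts, a three-digit area number, a two-digit group number, and a four-digit serial number. For numbers issued before the Social Security Administration began randomizing assignments in 2011, the area number corresponded to a geographic region. The sketch below only splits a number into those parts. The function name is my own, the sample number is a placeholder, and no area-to-state lookup is included.

```python
import re

def split_ssn(ssn):
    """Split an SSN into its three parts: area, group, and serial."""
    digits = re.sub(r"\D", "", ssn)  # drop dashes, spaces, and other non-digits
    if len(digits) != 9:
        raise ValueError("an SSN has exactly nine digits")
    return {"area": digits[:3], "group": digits[3:5], "serial": digits[5:]}

# Placeholder number for illustration only, not a real person's SSN.
parts = split_ssn("123-45-6789")
print(parts["area"], parts["group"], parts["serial"])
```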

The data is then matched with data from other state agencies, such as the Department of Labor, the University of Alaska, and other agencies at the state and federal level. Once the verification process occurs, similar matching is done on staff members and their alumni institutions. In this way, a student's test not only reflects the performance of the teacher but also counts toward the value of the school and program that the teacher attended. This forms a system of information that flows in a continuous loop among state agencies and the federal government. While the federal role is not displayed in this diagram, the DOL wage data goes to the federal government, the PFD data goes to the federal government, and the federal government pays for the construction of the P-20 in exchange for data that is at the individual level. Therefore, even if school records are not directly handed over to the federal government, they get attached to data that is.
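
The attribute-based matching described above (student name, mother's maiden name, address) can be sketched in a few lines. This is a minimal exact-match illustration, not the state's actual procedure, which is not documented here and may well be more elaborate. The field names, the person_id values, and the sample rows are all hypothetical.

```python
def normalize(value):
    """Lowercase the value and keep only letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def match_key(record):
    """Build a comparison key from the attributes named in the text."""
    return (
        normalize(record["student_name"]),
        normalize(record["mother_maiden_name"]),
        normalize(record["address"]),
    )

def link_records(school_rows, index_rows):
    """Attach the index identifier to each school row whose key matches exactly."""
    index = {match_key(row): row["person_id"] for row in index_rows}
    linked = []
    for row in school_rows:
        person_id = index.get(match_key(row))  # None when no match is found
        linked.append({**row, "person_id": person_id})
    return linked

# Hypothetical rows standing in for a school file and a master person index.
school_rows = [
    {"student_name": "Jane Doe", "mother_maiden_name": "Smith", "address": "12 Spruce St."},
    {"student_name": "John Roe", "mother_maiden_name": "Jones", "address": "4 Birch Ave"},
]
index_rows = [
    {"person_id": "P-001", "student_name": "JANE DOE",
     "mother_maiden_name": "smith", "address": "12 Spruce St"},
]

linked = link_records(school_rows, index_rows)
for row in linked:
    print(row["student_name"], row["person_id"])
```

Note that no Social Security Number is needed on the school side for the link to be made. Once the key matches, everything stored under the matching identifier can be joined to the school record.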





KITE Data: When these drawings were conceptualized, Alaska's consortium had not been finalized. The data set was also designed to receive the consortium data from the Smarter Balanced Assessment Consortium (SBAC). For Alaskans, the KITE client, the software application from the Kansas AAI testing folks that delivers the test to students, sends their data to Questar and returns the information to the P-20W. KITE appears to be based on an open source application from an India-based firm known as Agile Technology Solutions (see custom application development). Like SBAC's delivery system (see pages 34-38 at the link), KITE appears to be open source, which means the source code is openly available. While it is unclear at what point in the loop the test data and the psychometric data enter, it seems likely that they enter the loop through "external researchers" in the diagram below.





The data from OASIS has been preloaded into KITE in Kansas and Questar in Minnesota, per the contract timetable shown below. It also seems likely the data could have been backed up at the Institute for Education Sciences at the National Center for Testing at the U.S. Department of Education. This data exchange should have happened in November of 2014, before the December test run in Fairbanks. From KITE, the data will join the data from the other consortia at Questar, the grading company in Minnesota. Questar then relays the results into KITE and then into the P-20W or OASIS.






In addition to test scores, various psychometric metrics are gleaned from the test. KU has hired psychometricians for this purpose. Each verb on the test is most likely tagged to one of the "Depth of Knowledge" (DOK) levels used with Common Core. Parents who have students in CTE should also consider the verb list at this link, drawn from test questions, and how it is coded to the Common Core DOK; a similar activity is undertaken in the AMP test.

My guess on the next part is that the data is downloaded to the P-20W and into OASIS and then becomes available on the Enterprise system server for school district personnel.


Metadata, fine grain data, and psychometric data: Metadata is data generated by your student's online interactions, and it comes from a variety of sources. Various software packages, like Powerschool, contain some of that data and offer a portal for the metadata to be matched to the student and to enter OASIS and the P-20W. The CEO of Pearson, the publisher of Powerschool used throughout Alaska, specifically names Powerschool as one of the software products from which metadata is gleaned, because all of their digital learning platforms are connected to it.


Metadata is generated by online digital learning platforms. Since the questions are tagged in the learning platforms, the responses can be analyzed to produce a whole new set of variables. Further, because the standards each have a uniform electronic number (UEN), the data from digital learning platforms, combined with item response theory, can be used to determine how your child processes information. In this way, your child's mental processes can be mapped and identified. The data is housed behind an open application programming interface (API) that anyone can access, change, or modify.
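
To give a sense of how tagged responses turn into an estimate about a student, here is a minimal sketch of one common item response theory model, the two-parameter logistic (2PL) model. Which model any particular vendor uses is not stated in the material I have seen, so treat this as an assumption. The item parameters and the responses below are made up.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer.

    theta is the student's ability, a is the item's discrimination,
    and b is the item's difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of theta between lo and hi."""
    best_theta, best_ll = lo, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        ll = 0.0  # log-likelihood of the observed responses at this theta
        for correct, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Made-up items as (discrimination, difficulty) pairs.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
stronger = estimate_ability([True, True, False], items)
weaker = estimate_ability([True, False, False], items)
print(stronger, weaker)
```

Three answers already produce a number that ranks one student above another. A platform that logs thousands of tagged responses per student can fit far more detailed models than this.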





When combined with psychometric theory such as item response theory, this information can be a powerful tool for misuse and nefarious activity. This data is being captured in a variety of digital learning platforms and maintained in an open environment that is accessible to anyone. The data can also be altered by anyone and written over the prior data. Sound like a conspiracy theory? Below are some excerpts from a conference called Datapalooza held by the White House in 2012. Notice that the Common Core Standards are what hold it all together.




Does your superintendent even know that this data is being generated? Does your superintendent know that it is being linked to Powerschool? Did parents consent? Did the borough mayors consent? If Dr. Paramo did not know about the state P-20W and thought the grant had been given back, how much does she know about the other metadata being generated? How much less would the other superintendents know?

There is also a question of data ownership here. In some cases, borough taxes pay for some of these digital learning platforms; is the data the property of the software company, the local government, or the parent? Further, as is indicated at the end of the video, more databases will be combined with this digital data over the long term. To what use will that data be put? One can only imagine the usefulness of this data for those in the field of marketing: it is the ultimate in inside information. Millions will be made from this data that no one consented to giving, except perhaps in some obscure EULA that parents most likely did not agree to and may not even know about. Who is entitled to these revenues?

Security also seems to be problematic. The notion that anyone can get the data out and plug it back in sounds like a system that people interested in identity theft or other mischief might find useful, particularly if the data can be mapped to the brain to determine likely courses of action. Imagine this information in the hands of a vengeful lover, a stalker, or an ex-spouse. Or imagine it used by a future government that has gone rogue, or by a corrupt public official.

Can you imagine how this information compromises the nation's defensive posture? Imagine an enemy in a state of war having a future general's psychometric data and predicting their next move.

As a side note, parents who are using K-12 login would be wise to ask about the tagged data and the analysis that emerges.

State Senator Gary Stevens has a bill to encrypt the data and expand the data network under the guise of data security. The bill woefully misses the mark and fundamentally misunderstands the Alaskan concept of privacy. It codifies the P-20W and permits expansion into new areas. The bill goes in the wrong direction and is aimed at the wrong level of government. It isn't worth a reading by the rules committee. OASIS needs to be taken off the enterprise system, retained at the district, and in some way walled off from the rest of the P-20W. There need to be real teeth in the rules on allowable uses of metadata, and open APIs are prima facie unacceptable.

It is time for borough mayors and school districts to unplug from OASIS and give serious review to the metadata generated in those platforms. That may be where parents and citizens need to make their first line of inquiry. While there is certainly a "Wow" factor in the presentations by these vendors and the potential for good things, there is also the potential for extreme abuse. As the vanguards of privacy and freedom, local officials would do well to start asking questions and deciding for themselves if what these vendors have to offer districts is worth the risk. Parents would be wise to ask the same.

My children are grown. But if they were still young, they would not be taking the AMP or be engaged on tagged digital platforms. I won't tell others what to do, but I can say what I would do today. I can tell you that years ago I moved to the edge of nowhere to reduce the temptation of online gaming for my own children, before this potential was anywhere near the state of development it is in today. How much more would I do so today.


1 comment:

  1. Thank you so much for all of your research and for sharing it. My son's private school in Wisconsin uses PowerSchool by Pearson. I have been told that PowerSchool does not share data outside of the school system. I have found it difficult to believe that one of Pearson's data collection programs does not contribute its data (for certain schools) to databases outside of the school system.
