March 25, 2016

How Much Is Your Data Worth?


Data is currency; it has an economic worth that can be purchased, traded and leveraged. Individuals and entities have always leveraged information to create opportunity. OpenData seeks to equalize the balance of power by increasing accessibility to municipal data.

OpenData has grown from a term into an international movement, prompting nations, provinces and localities to improve efficiency and effectiveness by unlocking administrative data. Federally, the White House launched the Opportunity Project to build on this momentum and “improve economic mobility” for all citizens. The package of tools and data seeks to leverage existing federal and local data portals to help residents explore affordable housing, employment opportunities, transportation, education and other resources.

One of the new tools merges several datasets to help New Yorkers create visualizations around quality of life. You can create maps and dashboards or download raw data to gain further insight into a neighborhood.

Using Data2Go, UNHP was able to create the above dashboard, which outlines the demographic features of Community Board 7. In District 7 in the Bronx, just under 30% of households live below the poverty line. This Bedford Park community has a median income of $30,541, with the majority of residents working in the service sector. Over 63% of renters are rent burdened, with housing costs that exceed 30% of their income; that is an increase of 14% from 2007 to 2013.

These initiatives can foster an environment where more people rely on OpenData to inform policy, reinforce arguments and promote economic and social opportunity. A government that commits to providing relevant, credible, quality data is taking a real step toward transparency, and that is a great way to promote civic engagement and address inequity. However, civic engagement isn’t a one-sided relationship. We can’t settle for institutions simply publishing data; we have to be able to provide feedback and ensure the integrity of the data.

Locally, New York City has a relatively robust OpenData policy. Access to data-sets is increasing rapidly: over 1,350 data-sets are available, with 160 releases planned for 2016. New York City is even taking steps to strengthen the law; the City Council recently passed several bills to improve the OpenData policy at large, though there is still no law that monitors agency compliance with the underlying law.

Concerns and Opportunities

The quality of the data is paramount. Published data-sets need to be accurate, consistent, and descriptive. Each data-set should have a detailed description of what it shows and a definition of each structural element within it. Too often data-sets are released with one-sentence descriptions or no description at all. It is unreasonable to rely on vague data-sets; you can’t make conclusive statements with unknown variables.
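One way to picture the documentation standard described above is a small check that flags data-sets with thin descriptions or undefined columns. This is only a sketch: the record layout (`name`, `description`, `columns`, `column_definitions`), the 10-word threshold, and the sample entries are all invented for illustration and do not reflect the portal's actual metadata format.

```python
def find_underdocumented(datasets, min_words=10):
    """Flag data-sets whose description is too short or whose columns lack definitions.

    Returns a list of (name, description_too_short, undefined_columns) tuples.
    The metadata layout used here is hypothetical.
    """
    flagged = []
    for ds in datasets:
        description = ds.get("description") or ""
        too_short = len(description.split()) < min_words
        definitions = ds.get("column_definitions", {})
        undefined = [col for col in ds.get("columns", []) if col not in definitions]
        if too_short or undefined:
            flagged.append((ds["name"], too_short, undefined))
    return flagged


# Invented sample metadata, for illustration only.
datasets = [
    {
        "name": "violations",
        "description": "Open violations.",
        "columns": ["bin", "status"],
        "column_definitions": {"status": "Current state of the violation"},
    },
    {
        "name": "inspections",
        "description": "One row per inspection, updated weekly, with the outcome "
                       "recorded by the inspector.",
        "columns": ["id"],
        "column_definitions": {"id": "Unique row identifier"},
    },
]

flagged = find_underdocumented(datasets)
for name, too_short, undefined in flagged:
    print(name, "| short description:", too_short, "| undefined columns:", undefined)
```

A check like this cannot judge whether a description is accurate, only whether one exists at a useful length, so it would complement a human review rather than replace it.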

If you aren’t familiar with the jargon of a specific agency, it can be difficult to properly analyze the data. Definitions for data types and the terms within the spreadsheet are limited. As an example, the Department of Buildings (DOB) Violations table has a “BIN” column that holds a 7-digit number. A quick Google search will help you find that the Building Identification Number (BIN) is a unique identifier for a specific building. This key information lets the user group violations by specific building or by lot, which may combine data for multiple dwellings. Data dictionaries are integral in ensuring that this information is both accessible and functional.
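To make the building-versus-lot distinction concrete, here is a minimal sketch of grouping violation rows by BIN and, separately, by lot. The field names (`bin`, `block`, `lot`, `violation_number`) and every value in the sample rows are made up for illustration; the real table's column names may differ.

```python
from collections import defaultdict


def group_by(rows, key_fields):
    """Group rows into lists keyed by a tuple of the given field values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row)
    return dict(groups)


# Invented rows: two buildings (two BINs) share one lot, a third sits on its own lot.
violations = [
    {"bin": "1000001", "block": "00010", "lot": "0005", "violation_number": "V-1"},
    {"bin": "1000001", "block": "00010", "lot": "0005", "violation_number": "V-2"},
    {"bin": "1000002", "block": "00010", "lot": "0005", "violation_number": "V-3"},
    {"bin": "1000003", "block": "00020", "lot": "0001", "violation_number": "V-4"},
]

by_building = group_by(violations, ["bin"])
by_lot = group_by(violations, ["block", "lot"])

for key, rows in by_building.items():
    print("building", key, "->", len(rows), "violation(s)")
for key, rows in by_lot.items():
    print("lot", key, "->", len(rows), "violation(s)")
```

In this sample the lot-level count for block 00010, lot 0005 is higher than either building's count, which is the kind of difference a reader can only interpret once they know what BIN means.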

The data on the portal should be aligned with that of the individual agency’s repository. Sometimes the OpenData does not match that of the city website. When the same source yields different results, any analysis becomes questionable and unreliable. Looking at 123-01 Roosevelt Avenue, or BIN # 4536844, on the OpenData portal, there appear to be 3 open violations. However, using the DOB Building Information System platform, there are no open violations. At least one of these violations appears to have been dismissed in 1990. If there were a full audit of each data-set on the OpenData platform, these discrepancies would diminish, and the information would be more accurate and reliable.
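An audit of this kind could start with a simple comparison of the two sources. The sketch below assumes both exports can be reduced to rows with a `violation_number` and a `status` field; those names, the `OPEN`/`DISMISSED` values, and the sample rows are assumptions for illustration and do not reproduce the actual DOB records.

```python
def open_ids(rows):
    """Return the set of violation numbers whose status is OPEN (case-insensitive)."""
    return {
        row["violation_number"]
        for row in rows
        if row["status"].strip().upper() == "OPEN"
    }


def find_discrepancies(portal_rows, agency_rows):
    """List violations that are open in one source but not in the other."""
    portal_open = open_ids(portal_rows)
    agency_open = open_ids(agency_rows)
    return {
        "open_only_on_portal": sorted(portal_open - agency_open),
        "open_only_at_agency": sorted(agency_open - portal_open),
    }


# Invented rows for one building: the portal shows three open violations,
# while the agency system shows all of them as dismissed.
portal_rows = [
    {"violation_number": "V-1", "status": "Open"},
    {"violation_number": "V-2", "status": "Open"},
    {"violation_number": "V-3", "status": "Open"},
]
agency_rows = [
    {"violation_number": "V-1", "status": "Dismissed"},
    {"violation_number": "V-2", "status": "Dismissed"},
    {"violation_number": "V-3", "status": "Dismissed"},
]

report = find_discrepancies(portal_rows, agency_rows)
print(report)
```

Running the same comparison across every building in a data-set would give a rough measure of how far the portal has drifted from the agency's own system.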


If the goal is to serve as a catalyst for a more engaged populace, it would be helpful to provide a map of how data-sets are connected. Many data-sets are related to one another. They are often compartmentalized into different categories, and only together do they tell a complete story. For instance, the Automated City Register Information System (ACRIS) has a table with demographic information on buildings, a separate table that contains the type of property document and another table with the parties on a specific mortgage document. The ONLY constant in these tables is the Document ID, a unique identifier that allows you to join all of this information. Without all three tables, the information is somewhat useless. Yet there is nothing on the OpenData portal to illustrate the relationship between these tables; you are left to trial and error.
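The join described above can be sketched in a few lines once the shared key is known. Everything here other than the idea of joining on the Document ID is an assumption: the field names (`document_id`, `doc_type`, `address`, `name`, `role`) and the sample rows are invented and are not the actual ACRIS column names or records.

```python
from collections import defaultdict


def index_by_document_id(rows):
    """Build a lookup from document ID to every row that carries it."""
    index = defaultdict(list)
    for row in rows:
        index[row["document_id"]].append(row)
    return index


def join_documents(master_rows, property_rows, party_rows):
    """Combine the three tables into one record per document ID."""
    properties = index_by_document_id(property_rows)
    parties = index_by_document_id(party_rows)
    joined = {}
    for doc in master_rows:
        doc_id = doc["document_id"]
        joined[doc_id] = {
            "doc_type": doc["doc_type"],
            "properties": properties.get(doc_id, []),
            "parties": parties.get(doc_id, []),
        }
    return joined


# Invented sample rows, one list per table.
master_rows = [{"document_id": "DOC-1", "doc_type": "MORTGAGE"}]
property_rows = [{"document_id": "DOC-1", "address": "1 Example Street"}]
party_rows = [
    {"document_id": "DOC-1", "name": "Example Borrower", "role": "borrower"},
    {"document_id": "DOC-1", "name": "Example Lender", "role": "lender"},
]

joined = join_documents(master_rows, property_rows, party_rows)
print(joined["DOC-1"]["doc_type"], len(joined["DOC-1"]["parties"]), "parties")
```

The code itself is short; the hard part for a new user is discovering that the three tables belong together at all, which is what a published relationship map would solve.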

If organizations, policy makers, community stakeholders and individuals are to use this powerful effort in a meaningful way, there needs to be clarity around who publishes the data and the rate at which it is updated. OpenData should provide a point of contact to navigate these challenges. There is a comment section on the NYC OpenData platform, but that feature doesn’t seem to elicit an official response. The ‘contact database owner’ feature allows you to send an email and receive a generic response upon receipt. We haven’t had luck getting follow-up emails that answer the original inquiry. Responses should be provided in a timely manner, so as not to discourage users in their data journey.

The value of OpenData is undeniable. If the focus can shift from volume to integrity, it has the potential to stimulate innovation and improve living conditions. Like any exchange, it’s only as good as the level at which each side participates.