Data journalist Nate Silver is perhaps best known by the general public for his success at predicting U.S. election results based on statistical data. But his background is in sabermetrics—the statistical analysis of Major League Baseball. Baseball provides a fertile field for such analytics. The box score, invented in the late 1800s, records every statistical point of every single at-bat, and is a format ripe for digitizing. Silver leveraged this vast pool of information to create PECOTA, an empirical system for forecasting the performance of pro baseball players. Silver sold PECOTA to Baseball Prospectus, which now markets the product for fantasy baseball leagues, and his 538 blog is licensed by the New York Times.
As Silver demonstrated by correctly forecasting the results in 49 of 50 states in the 2008 U.S. presidential election, baseball isn’t the only field of endeavor that can be served by access to volumes of data and the tools to interpret it. Data and tools together are fundamental to evidence-based decision-making. And while government agencies might not have a literal play-by-play of every event that has an impact on their mandates, the public sector comes close to matching baseball’s ability to generate data and its appetite for consuming it.
The information the government collects is public property, and it has huge potential to incubate value-added services and products when the data and tools to manipulate it are exposed to public users. This is at the heart of the open movement.
The word “open” in a public sector context can lead down different trails of discussion. When someone uses the word “open,” what does it mean? “Open” generally refers to one of three discrete but interrelated concepts. Think of it as a stack, with each layer predicated on the one beneath it.
Open Source
At the base of the stack is open source software (OSS). Open source evolved from the free software movement begun in the 1980s. Open source is a more accurate description than “free,” which, to its advocates, meant freedom to distribute, not freedom from cost (free speech, not free beer). Much of this usually non-commercial software is released under a copyleft license such as the GNU General Public License (GPL): users are free to tinker with or develop on the source code, provided the fruits of that labour are returned to the pool for others to use. Developers can even compile software into commercial applications for sale.
Open source products were often positioned in direct competition to commercial options, notably Sun Microsystems Inc.’s OpenOffice, distributed as an alternative to Microsoft Corp.’s dominant Office suite, and Linux, an open source alternative to the UNIX and Windows operating systems and development environments. This friction developed into something of a religious war in the 2000s, especially in the public sector. Many jurisdictions mandated that open source software be used or at least considered for acquisitions, citing licensing costs, vendor lock-in, and forced upgrades, among other factors, as drawbacks of commercial software. Software vendors countered with campaigns pointing out the hidden costs of OSS, including support, maintenance, upgrades and application development.
Many commercial software vendors, including our company, have embraced the need to integrate such open frameworks instead of competing with them. Two-thirds of Web servers worldwide run on open source Apache software, and open source tools—like Hadoop, MapReduce, and NoSQL—are central to applications leveraging the huge volume of data that is generated and collected in our connected world.
Open Data
The public sector thrives on data. It is collected through a number of sources: surveys, transactions, applications, sensors, case notes … the list goes on. Daily, the three Vs of data collection—velocity, volume, and variety—increase. Data is no longer confined to field and record in database entries; it’s increasingly unstructured and gathered from automated sources and social media.
Canada has an enviable record and reputation on the data collection and analysis fronts. Statistics Canada is hailed worldwide for its support of evidence-based decision-making; Canada currently chairs an international working group on open data through the Open Government Partnership, with 65 participating countries.
Open data, a cornerstone of open government, reflects the principles of the open source software movement. The Canadian government defines open data as “machine-readable, freely shared, used and built on without restrictions.” Data must be freely available at no more than a modest reproduction fee, open to redistribution and reuse, and accessible to all, with no discrimination against people or groups and no distinction between commercial and non-commercial uses. Members of the public even help prioritize datasets for release: on the Open Data Inventory Web site, they can give a “thumbs-up” to any of the almost 10,000 government holdings identified so far.
The underlying principle of open data is that government-held information is provided by the public (in the form of censuses, surveys, application forms, tax documentation, etc.), or its collection is paid for by the public, and thus it is the property of the public and must be shared with the public. The public, under the Open Government License (OGL), may “copy, modify, publish, translate, adapt, distribute or otherwise use the information in any medium, mode or format for any lawful purpose,” provided licensees identify the source of the data on their derived products, according to the federal government.
Exactly how that data is to be shared has been open to some interpretation. Some jurisdictions simply use their existing publishing paradigms—spreadsheets and electronic versions of reports—to expose data for public consumption. At the other end of the scale are raw data feeds, extremely granular transaction data that require considerable technical expertise to interpret and consume. Given a template of bus routes, schedule data, and live-streamed vehicle-mounted GPS data, one can offer an application that predicts the next arrival of a city bus at any given stop, but that’s beyond the sophistication of most citizens.
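To make the bus example concrete, here is a minimal sketch in Python of how such an arrival estimate might be computed. The feed format, the stop names, the distances and the function name are all assumptions made for illustration; a real transit feed would also need GPS points matched to the route and schedule data to smooth the estimate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VehiclePosition:
    """One GPS report, already matched to the route (hypothetical feed format)."""
    vehicle_id: str
    distance_along_route_m: float  # metres travelled from the route's first stop
    speed_m_per_s: float           # recent average speed of the vehicle


# Hypothetical route template: stop name -> distance from the first stop, in metres.
ROUTE_STOPS_M = {"Main St": 0.0, "City Hall": 1200.0, "Library": 2600.0, "Terminal": 4000.0}


def seconds_until_next_arrival(stop: str, vehicles: List[VehiclePosition]) -> Optional[float]:
    """Return the soonest estimated arrival at `stop`, or None if no bus is approaching."""
    stop_distance = ROUTE_STOPS_M[stop]
    estimates = []
    for v in vehicles:
        remaining = stop_distance - v.distance_along_route_m
        # Skip buses that have already passed the stop or are not moving.
        if remaining < 0 or v.speed_m_per_s <= 0:
            continue
        estimates.append(remaining / v.speed_m_per_s)
    return min(estimates) if estimates else None


# Made-up sample of the live feed.
feed = [
    VehiclePosition("bus-12", distance_along_route_m=600.0, speed_m_per_s=6.0),
    VehiclePosition("bus-07", distance_along_route_m=2000.0, speed_m_per_s=8.0),
]
eta = seconds_until_next_arrival("Library", feed)
print(f"Next bus at Library in about {eta:.0f} seconds")
```

Even this toy version shows why raw feeds sit at the expert end of the scale: the arithmetic is simple, but only once someone has done the work of joining the route template to the live positions.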
Increasingly, though, public sector organizations are incorporating visualization tools to give users a window into the data that makes it easier for them to grasp and manipulate. Born in the corporate world, data visualization tools are designed to democratize data consumption in the enterprise. Rather than having a business user define an analytical problem and pass it on to a technological specialist to design iterations of a query, visual analytics tools provide an intuitive interface to the data that puts the power in the hands of the user. In the enterprise world, this eliminates time and money wasted on misinterpretation and faulty queries, allowing the subject matter expert to hone their understanding of the data on the fly.
What does that look like in the open data world? Increasingly, jurisdictions are exposing data to public users in the form of a number of template views, but with intuitive tools to change parameters to create their own windows into the data. In conjunction with freely available open-source tools, users can generate, for example, geographic heat maps of wireless communications density, historical progressions of property crime rates by region, public safety dashboards, and so on. The applications are limited only by the availability of data sets and the imagination of the user. Visualization tools inspire curiosity; the ability to intuitively substitute, manipulate and synthesize data sets and views is engrossing and encourages exploration and exploitation of public information assets.
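Behind a view such as a historical progression of property crime rates by region sits a simple aggregation. The sketch below, in Python, shows one way it might be done; the records, region names and figures are invented for illustration and do not come from any real data set.

```python
from collections import defaultdict

# Hypothetical open data extract: (region, year, reported property crimes, population).
records = [
    ("North", 2013, 380, 102_000),
    ("North", 2012, 410, 100_000),
    ("South", 2012, 950, 250_000),
    ("South", 2013, 1_010, 252_500),
]


def rates_per_100k(rows):
    """Return {region: [(year, rate), ...]} with years in ascending order."""
    series = defaultdict(list)
    for region, year, crimes, population in rows:
        rate = round(crimes * 100_000 / population, 1)
        series[region].append((year, rate))
    return {region: sorted(points) for region, points in series.items()}


trend = rates_per_100k(records)
for region, points in trend.items():
    print(region, points)
```

A visualization tool would plot each region's list of points as a line; changing a parameter in the tool amounts to changing the grouping or the denominator in a calculation like this one.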
As a corollary, datasets that can be shared with the public at large can also be shared among various government departments.
Government suffers an unfortunate reputation for bureaucratic wrangling. In fact, any corporate body with more than one department shares this issue. Historically, corporate entities have developed along siloed lines; accounts receivable deals with money that’s due, accounts payable with money that’s owed, and if the twain shall meet, ad hoc consultation between departments is required.
So it is with the government. Departments are driven by mandates that can be, by necessity, fairly narrow—it’s not efficient for them to overlap. Take, for example, a restaurant owner who wants to expand by building a patio. This could involve several departments with different remits (building standards, health standards, licensing, public safety, land use) at different levels of government (municipal, regional, provincial).
In an open data scenario, it is relatively easy to automate a process to alleviate such a wrangle. Much of the information required by one department is often already held by others. In theory (though not necessarily in practice), any one service could act as a single point of contact to submit a proposal.
Open Government
Sharing information and allowing stakeholders outside government to be part of the conversation becomes critical as a more holistic approach to program delivery becomes ascendant. One of today’s hot-button issues is that of safe injection sites for intravenous drug users. It touches on the mandates of public health, health care, public safety, social services, policing, and more. Effective delivery depends on co-operative (or at least informed) action from all departments. Program delivery in the areas of post-millennial concern—homelessness, elder care, accessibility, etc.—is increasingly cross-departmental.
Open source and open data, both technologically and philosophically, enable open government. But open government isn’t simply the sum of the two. Open government is a cultural change. According to the federal government, it is “a governing culture that fosters greater openness and accountability, enhances citizen participation in policymaking and service design, and creates a more efficient and responsive government.”
It’s a well-worn aphorism that knowledge is power. Knowledge hoarding has been an organizational bane in both the public and private sectors, as employees try to preserve their jobs and departments to defend their mandates. The open government reality—that shared knowledge is shared power—can be intimidating. But that is exactly the goal of open government—to harness the knowledge, insight and talent of citizens, community groups, academics and corporations to solve real-world problems and address real-world priorities.
An open government model augurs a brave new world for participative democracy:
- Social media channels can be used for public consultation and assessment of the impact of programs. It’s a matter of reaching the public where they live; they needn’t travel to a literal town hall to give input. It’s important to measure social media engagement—sentiment analysis can be a valuable tool to gauge the attitude of citizens toward programs.
- Departments can create collaborative spaces online that allow stakeholders to work together as an integrated team rather than simply sharing data.
- Virtual and real-world gatherings like hackathons and competitions can be used to crowdsource solutions. These events are often organized around themes, looking for citizen contributions to problems in areas such as accessibility, user experience, refining the collaborative process, etc.
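The sentiment analysis mentioned in the first point above can be illustrated with a deliberately simple word-counting approach in Python. The word lists and sample posts are invented for illustration, and production tools use far larger lexicons or trained models; this is only a sketch of the idea.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"great", "helpful", "love", "improved", "easy"}
NEGATIVE = {"slow", "confusing", "late", "broken", "hate"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words; above zero leans favourable."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Made-up posts standing in for a social media feed about public programs.
posts = [
    "Love the new transit app, so easy to use",
    "Permit office is slow and the form is confusing",
    "Council meeting is on Tuesday",
]
scores = [sentiment_score(p) for p in posts]
share_positive = sum(s > 0 for s in scores) / len(scores)
print(scores, f"{share_positive:.0%} of posts lean positive")
```

Tracked over time, a measure like the share of positive posts gives a department a rough gauge of how a program is being received, though it should be read alongside other forms of consultation rather than in place of them.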
Getting the Most Out of Open Data
Open data is not open government. But open government is clearly hampered by a lack of the former, or ineffective exposure of it. It is important to bear certain characteristics in mind in order to reap the most benefit from an open data regime.
- The open data journey begins with an inventory. Many jurisdictions have assembled such an inventory of their data assets and made it accessible to the public. With many competing priorities, not all data sets can be exposed at once. The federal government allows members of the public to vote to determine the priority of the almost 10,000 data sets in its inventory.
- Open source software is an important element of the open data regime. One of Canada’s open data principles is the use of “commonly owned standards” with respect to file formats; data released by the government should be accessible to all, not only those with access to an expensive program that can read data in a proprietary format. Open source software is also an important contributor to the management of unstructured data that doesn’t fit the traditional field-record format, like government contracts, libraries, drafts of working documents, etc. (open information).
- Data provided by the public is owned by the public. Data sets should be considered freely accessible unless there is a compelling reason not to disclose them, for example, that they contain personally identifiable private information. Care should be taken that data sets are anonymized or aggregated before release.
- Visualization tools give technically unsophisticated users a window into the data. The ability to substitute, synthesize and manipulate parameters inspires curiosity and the development of original applications of the data. Exposing data with visualization tools should be considered a best practice.
- Open data principles apply within government as well as to public-facing applications. Open data sets should allow exchange of information among departments for a more horizontal service delivery model, and should enable citizens, community groups and corporate entities to participate actively in forming policy or delivering services.
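The advice above about aggregating data sets before release can be sketched in a few lines of Python. The case records, district codes and the minimum cell size of five are assumptions made for illustration; the appropriate threshold is a policy decision, and aggregation alone does not guarantee anonymity.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed threshold; the right value is a policy decision

# Hypothetical case-level records: (postal district, service requested).
cases = (
    [("K1A", "housing")] * 12
    + [("K1A", "elder care")] * 3
    + [("K2B", "housing")] * 7
)


def publishable_counts(rows, min_cell=MIN_CELL_SIZE):
    """Aggregate to counts per (district, service), suppressing small cells."""
    counts = Counter(rows)
    # A count below the threshold is replaced with None so that a handful
    # of individuals cannot be singled out in the published table.
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


table = publishable_counts(cases)
for (district, service), n in table.items():
    print(district, service, "suppressed" if n is None else n)
```

The published table keeps the totals that are useful for planning while withholding the cells small enough to point to specific people.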
TARA HOLLAND currently leads the Canadian Federal Government Specialist team for SAS Canada, spearheading Public Sector efforts by providing data and analytic expertise and business guidance to Public Sector (Federal, Provincial, Public Safety and Healthcare) organizations across the country. She has spent over 19 years supporting the Predictive Analytics, Business Intelligence and Visualization solutions portfolio; her passion is bridging the gap between Business and IT and enabling organizations to deliver on the value of SAS Business Analytics.
SAS is the leader in analytics. Through innovative analytics, business intelligence and data management software and services, SAS helps customers at more than 83,000 sites make better decisions faster.