While many of the metrics provided by OII are collected from external sources, much of this metric data will need to be stored. One reason is to provide caching for the data so that excessive API calls to external resources are avoided. These external resources may also only make point-in-time data available, and some measures require knowing how this data has changed. Metric sources may also provide data covering several measures or may need to be combined with data from other sources in order to be useful. For all of these reasons, a Metric Data Store is required.
Metric data may be highly structured or completely unstructured, and a record could contain a single value, a large block of text, or other data. These requirements narrow the field of potential solutions to either a document datastore such as CouchDB or a file-based store such as git. While git offers built-in history and mature interfaces and libraries for accessing it, CouchDB can be considerably faster and is more easily queryable. CouchDB is also easier to work with from code.
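As a rough illustration of the range described above, a metric record could be modelled as a document whose payload is a tagged union. The field names (`measure`, `project`, `source`, `collectedAt`, `data`) and the `describe` helper are assumptions made for this sketch, not the actual OII schema; only `_id` is a CouchDB convention.

```typescript
// Hypothetical metric document; every field name except _id is an assumption.
type MetricValue =
  | { kind: "number"; value: number }
  | { kind: "text"; value: string }
  | { kind: "structured"; value: Record<string, unknown> };

interface MetricDocument {
  _id: string;         // CouchDB documents are keyed by _id
  measure: string;     // which measure this data point feeds
  project: string;     // project being reported on
  source: string;      // where the data came from
  collectedAt: string; // ISO 8601 timestamp
  data: MetricValue;   // single value, block of text, or structured data
}

// Summarise a document without assuming anything about its payload type.
function describe(doc: MetricDocument): string {
  const d = doc.data;
  switch (d.kind) {
    case "number":
      return `${doc.measure} = ${d.value}`;
    case "text":
      return `${doc.measure}: ${d.value.length} characters of text`;
    case "structured":
      return `${doc.measure}: object with ${Object.keys(d.value).length} keys`;
  }
}
```

A schemaless document store accepts all three payload shapes side by side, which is the property the paragraph above relies on.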
The Metric Data Store can be populated directly by Data Collection Agents deployed by OII, through calls to the Core Data Services in the case of externally managed agents, or by hand through the Web Front End.
We'll use Event Sourcing, so that the main way data is created in the system is through Events. This approach is similar to append-only storage and to Git; it provides increased transparency and accountability, as well as more flexibility to evolve the data model progressively.
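A minimal in-memory sketch of the event-sourcing idea follows. The event types (`MetricRecorded`, `MetricRetracted`), the `EventLog` class, and the `currentValues` function are hypothetical names chosen for illustration; the real OII events are not specified here.

```typescript
// Hypothetical event shape; names are assumptions, not the OII event schema.
interface MetricEvent {
  seq: number; // position in the log, assigned on append
  type: "MetricRecorded" | "MetricRetracted";
  metricId: string;
  value?: number;
  recordedAt: string;
}

class EventLog {
  private readonly events: MetricEvent[] = [];

  // Events are only ever appended; existing entries are never modified.
  append(e: Omit<MetricEvent, "seq">): MetricEvent {
    const stored: MetricEvent = { ...e, seq: this.events.length + 1 };
    this.events.push(stored);
    return stored;
  }

  all(): readonly MetricEvent[] {
    return this.events;
  }
}

// Current state is not stored; it is derived by replaying the log in order.
function currentValues(log: EventLog): Map<string, number> {
  const state = new Map<string, number>();
  for (const e of log.all()) {
    if (e.type === "MetricRecorded" && e.value !== undefined) {
      state.set(e.metricId, e.value);
    } else if (e.type === "MetricRetracted") {
      state.delete(e.metricId);
    }
  }
  return state;
}
```

Because a correction is itself a new event, the full history of how a value came to be remains available, which is where the transparency and accountability come from.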
The implementation will also provide separate write and read channels, which will help the system scale. We're currently using CouchDB as an event database, with a data structure close to Datomic's.
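To make the two ideas concrete, here is a sketch that combines a Datomic-style fact (entity, attribute, value, transaction, added or retracted) with separate write and read objects over a shared log. The class and method names (`WriteChannel`, `ReadChannel`, `transact`, `entityAsOf`) are assumptions for illustration, and the structure actually stored in CouchDB may differ.

```typescript
// A fact in the Datomic sense: entity, attribute, value, transaction, added/retracted.
interface Datom {
  e: string;
  a: string;
  v: string | number;
  tx: number;
  added: boolean;
}

// Write side: the only operation is appending a transaction of facts.
class WriteChannel {
  private tx = 0;
  private readonly log: Datom[];

  constructor(log: Datom[]) {
    this.log = log;
  }

  transact(facts: Array<Omit<Datom, "tx">>): number {
    this.tx += 1;
    for (const f of facts) this.log.push({ ...f, tx: this.tx });
    return this.tx;
  }
}

// Read side: never mutates the log; answers queries as of a given transaction.
class ReadChannel {
  private readonly log: readonly Datom[];

  constructor(log: readonly Datom[]) {
    this.log = log;
  }

  entityAsOf(e: string, tx: number): Record<string, string | number> {
    const view: Record<string, string | number> = {};
    for (const d of this.log) {
      if (d.e !== e || d.tx > tx) continue;
      if (d.added) view[d.a] = d.v;
      else delete view[d.a];
    }
    return view;
  }
}
```

Keeping the read path free of writes means it can be served from replicas or precomputed views without coordinating with the writer.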
One of the biggest technical challenges for OII is the collection of data to develop metrics with. For a given measure, there may be alternate sources for data depending on the type of project being reported on, the platforms supported by the project, and whether the project is open or closed source. There are also varying forms of data, from simple numbers to blocks of text to whole files. Finally, the data collected might be useful on its own, but it may need to be transformed or compared with previous measurements to provide value.
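As one example of comparing data with previous measurements, the sketch below turns a series of point-in-time values into changes between consecutive measurements. The `Measurement` shape and the `deltas` function are hypothetical and only illustrate the kind of transformation meant here.

```typescript
// Hypothetical point-in-time measurement, as an external source might report it.
interface Measurement {
  takenAt: string; // ISO 8601 timestamp
  value: number;
}

interface Delta {
  takenAt: string;
  change: number;
}

// Returns the change between each measurement and the one before it.
function deltas(points: readonly Measurement[]): Delta[] {
  // Sort a copy by time so the input order does not matter.
  const sorted = [...points].sort(
    (a, b) => Date.parse(a.takenAt) - Date.parse(b.takenAt),
  );
  const out: Delta[] = [];
  for (let i = 1; i < sorted.length; i++) {
    out.push({
      takenAt: sorted[i].takenAt,
      change: sorted[i].value - sorted[i - 1].value,
    });
  }
  return out;
}
```

A source that only exposes the current value cannot answer "how has this changed?", so this kind of derived measure is only possible once earlier readings have been stored.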
In order to accelerate development of metrics and allow inclusion of metrics that simply could not be collected by the OII team (as well as metrics whose value lies partly in the organization that collected and verified them), a number of data partnerships are being formed. The technical implication is that OII needs a way both to collect data and to receive data collected by others. This capability is provided by Metric Data Collection Agents.
Metric Data Collection Agents can be written using whichever language and frameworks are favored by the developer and appropriate for the task. The high-level requirements for these agents are:
In addition, considerations should be made for the following:
Data Projections allow the creation of data representations from the Event Store that help access the data and query it in useful ways. In our case, the data is inherently structured as a graph, and should also be represented as nested objects for frontend access.
We currently have projections in CouchDB again (particularly for cascading reduces) and in LevelDB (with levelgraph and levelgraph-jsonld for object-document mapping). We expect more data projection facilities to become necessary at some point, in particular to enable snapshotting when the data in the event store grows and projections become expensive to recreate from scratch.
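The sketch below shows a projection as a fold over events into nested objects, with an optional snapshot to resume from instead of replaying everything. The event shape, the `Snapshot` type, and the function name `projectEvents` are assumptions for illustration; they do not describe the CouchDB or LevelDB projections themselves.

```typescript
// Hypothetical event carrying one metric value for one project.
interface ProjectEvent {
  seq: number;
  project: string;
  metric: string;
  value: number;
}

// Nested-object view: project -> metric -> latest value.
type ProjectView = Record<string, Record<string, number>>;

// A snapshot is a view plus the last event sequence number it includes.
interface Snapshot {
  lastSeq: number;
  view: ProjectView;
}

function projectEvents(
  events: readonly ProjectEvent[],
  from: Snapshot = { lastSeq: 0, view: {} },
): Snapshot {
  // Copy the snapshot's view so the snapshot itself is never mutated.
  const view: ProjectView = {};
  for (const [name, metrics] of Object.entries(from.view)) {
    view[name] = { ...metrics };
  }
  let lastSeq = from.lastSeq;
  for (const e of events) {
    if (e.seq <= from.lastSeq) continue; // already reflected in the snapshot
    const metrics = view[e.project] ?? {};
    metrics[e.metric] = e.value;
    view[e.project] = metrics;
    lastSeq = Math.max(lastSeq, e.seq);
  }
  return { lastSeq, view };
}
```

Resuming from a snapshot should give the same view as a full replay, which is what makes snapshots safe to discard and rebuild.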
The API layer provides unified access to the various data stores and services. This includes:
Currently some services are implemented as CouchApps and others as Node-based microservices, and the API layer is migrating from a simple Hapi-based proxy to a GraphQL setup. GraphQL (which does not really provide a graph interface so much as a tree interface to data) is an approach built around a strong type system; it helps streamline access to multiple data backends and supports building modern, scalable, rich frontend apps.
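The "tree interface over graph data" remark can be illustrated without any GraphQL library. In the sketch below, which is not GraphQL itself, a nested selection decides which links of an in-memory graph are followed, so the result is always a finite tree even when the graph contains cycles. All names (`GraphNode`, `Selection`, `select`) are hypothetical.

```typescript
type NodeId = string;

// Hypothetical graph node: named links point at other nodes by id.
interface GraphNode {
  id: NodeId;
  name: string;
  links: Record<string, NodeId[]>;
}

// A selection is a tree: which links to follow, and what to select under each.
interface Selection {
  [link: string]: Selection;
}

interface Tree {
  id: NodeId;
  name: string;
  children: Record<string, Tree[]>;
}

// Walks the graph only as deep as the selection asks, producing a tree.
function select(graph: Map<NodeId, GraphNode>, id: NodeId, sel: Selection): Tree {
  const node = graph.get(id);
  if (!node) throw new Error(`unknown node ${id}`);
  const children: Record<string, Tree[]> = {};
  for (const [link, sub] of Object.entries(sel)) {
    children[link] = (node.links[link] ?? []).map((child) =>
      select(graph, child, sub),
    );
  }
  return { id: node.id, name: node.name, children };
}
```

The shape of the response mirrors the shape of the request, which is the property that lets a frontend ask for exactly the nested objects it needs.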
The web front end is the face of OII and allows a user to interact with the API without any programming knowledge. This site has two goals: to make data on tracked projects available for viewing, and to educate users on how the data is collected and how it might be interpreted.
Three potential implementations of the front end were considered: static site generation using Content As Code (http://iilab.github.io/contentascode/), an off-the-shelf CMS, and a fully custom web app. All have advantages and drawbacks. The table below summarizes these pros and cons in the context of the requirements for the OII front end.
| Criterion | Static Site Generator (Content As Code) | CMS (e.g. Drupal) | Custom Web App (React) |
| --- | --- | --- | --- |
| Maintenance of existing functionality | ⭑⭑⭑ Once it’s deployed, it’s fairly maintenance free | ⭑ Security and compatibility updates are common and necessary | ⭑⭑ Can be built with minimal developer-maintained dependencies |
| Creation of new functionality | ⭑ Focus is on text-based content | ⭑⭑ Plugins may be created but are limited by the platform | ⭑⭑⭑ Free rein to expand the app |
| Ease of deployment | ⭑⭑⭑ | ⭑ | ⭑⭑ |
| Text content creation flexibility | ⭑⭑⭑ Content can easily be divided and expanded in whichever way is seen fit | ⭑⭑⭑ | ⭑ Content form is limited by the structure of the web app |
| Content creation ease of use | ⭑⭑⭑ Prose-based editor is simple and distraction free | ⭑⭑ The CMS must be learned before use | ⭑ |
| Content editing workflow | ⭑⭑ | ⭑⭑⭑ | ⭑ |
| Dynamic content loading | ⭑ | ⭑ | ⭑⭑⭑ Progressive loading of content is simple in a web app |
| Advanced visualizations | ⭑ | ⭑ | ⭑⭑⭑ Structured content is easy to work with and can be backed by an API or db |
| Ease of template editing | ⭑⭑ | ⭑ | ⭑⭑⭑ |
Since the site has two distinct parts with drastically different content, editing workflows, and levels of interactivity, a hybrid approach is used to build it. For the list of projects, search, and the viewing of metrics, rendering is done through a React web app. We use redux-elm, which enables composition and is also based on an event-sourced pattern. We're currently migrating to the Apollo Stack, which enables connection to a GraphQL backend.
This allows data to be pulled easily into the site through the API, either for partial page renders or for full server-side renders (for search engine visibility). For supporting content that explains the mission of OII, what each metric means, and the methodology for collecting and interpreting data, Content As Code with a GitLab backend will be used.