Organic Data Science Framework


Over the last hundred years, science has become an increasingly collaborative endeavor. Scientific collaborations, sometimes referred to as “collaboratories” or “virtual organizations”, range from those whose members work closely together to others that are more loosely coordinated. Some scientific collaborations revolve around sharing instruments (e.g., the Large Hadron Collider), others focus on a shared database (e.g., the Sloan Digital Sky Survey), others form around a shared software base (e.g., SciPy), and still others around a shared scientific question (e.g., the Human Genome Project).

Our work focuses on scientific collaborations that are driven by a shared scientific question that requires the integration of ideas, models, software, data, and other resources from different disciplines. These projects are particularly challenging because they require:

  • significant organization and coordination, as people with diverse backgrounds must first discover one another and then find common ground to collaborate
  • retaining users over the long term, since people need clear incentives to remain involved during the long periods in which such projects are active
  • incrementally growing the community with unanticipated participants, who bring in skills or resources that become necessary as the project is fleshed out

For all these reasons, even though such scientific collaborations do occur, they are not very common. Yet they are needed to address the major engineering and science challenges of the future.

This project is developing the Organic Data Science Framework (ODSF) to support scientific collaborations that revolve around complex science questions. Such collaborations require significant coordination to synthesize multi-disciplinary findings, incentives that keep contributors engaged for extended periods of time, and continuous growth to accommodate new contributors as the work evolves.

ODSF addresses these challenges with a collaborative user interface that supports:

  1. self-organization of the community through user-driven dynamic task decomposition,
  2. on-line community support by incorporating social design principles and best practices,
  3. an open science process by capturing new kinds of metadata about the collaboration that provide invaluable context to newcomers.

With ODSF, users formulate science tasks that describe the what, who, when, and how of the smaller activities pursued within the collaboration. The interface is designed to entice contributors to participate and to remain involved in the specific tasks they are interested in. The framework is in the early stages of development and continues to evolve to accommodate user feedback and to incorporate new collaboration features.
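To make the what/who/when/how structure and user-driven task decomposition more concrete, here is a minimal Python sketch. The `Task` class, its field names, and the `decompose`, `walk`, and `overdue` methods are hypothetical illustrations rather than ODSF code; the sketch only assumes that a task can be split into subtasks and may carry a target date.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Iterator


@dataclass
class Task:
    """Hypothetical model of a science task: its what, who, when, and how."""

    what: str                                     # the activity being pursued
    who: list[str] = field(default_factory=list)  # participants working on it
    when: date | None = None                      # target date, if any
    how: str = ""                                 # approach, e.g. a model or dataset
    subtasks: list[Task] = field(default_factory=list)

    def decompose(self, *subtasks: Task) -> Task:
        """Split this task into smaller subtasks and return self for chaining."""
        self.subtasks.extend(subtasks)
        return self

    def walk(self, depth: int = 0) -> Iterator[tuple[int, Task]]:
        """Yield (depth, task) pairs for this subtree in depth-first order."""
        yield depth, self
        for sub in self.subtasks:
            yield from sub.walk(depth + 1)

    def overdue(self, today: date) -> list[Task]:
        """Return tasks in this subtree whose target date is before today."""
        return [t for _, t in self.walk() if t.when is not None and t.when < today]
```

For example, `Task("Estimate lake water temperature").decompose(Task("Collect sensor data"), Task("Calibrate model"))` builds a two-level tree, and any participant can later decompose a subtask further in the same way.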


Publications

  • "Organic Data Science: A Task-Centered Interface to On-Line Collaboration in Science." Yolanda Gil, Felix Michel, Varun Ratnakar, and Matheus Hauder. Proceedings of the ACM International Conference on Intelligent User Interfaces, Atlanta, GA, 2015.
  • "A Task-Centered Framework for Computationally-Grounded Science Collaborations." Yolanda Gil, Felix Michel, Varun Ratnakar, Matheus Hauder, Christopher Duffy, and Paul Hanson. Proceedings of the Eleventh IEEE International Conference on eScience, Munich, Germany, 2015.
  • "A Virtual Crowdsourcing Community for Open Collaboration in Science Processes." Felix Michel, Yolanda Gil, and Matheus Hauder. Proceedings of the Americas Conference on Information Systems (AMCIS), August 2015.
  • "Supporting Open Collaboration in Science through Explicit and Linked Semantic Description of Processes." Yolanda Gil, Felix Michel, Varun Ratnakar, Jordan Read, Matheus Hauder, Christopher Duffy, Paul Hanson, and Hilary Dugan. Proceedings of the Twelfth European Semantic Web Conference (ESWC), Portoroz, Slovenia, 2015.
  • "A Semantic, Task-Centered Collaborative Framework for Science." Yolanda Gil, Felix Michel, Varun Ratnakar, and Matheus Hauder. Proceedings of the Twelfth European Semantic Web Conference (ESWC), Portoroz, Slovenia, 2015.
  • "Organic Data Sharing: A Novel Approach to Scientific Data Sharing." Yolanda Gil, Varun Ratnakar, and Paul Hanson. Second International Workshop on Linked Science: Tackling Big Data (LISC), held in conjunction with the International Semantic Web Conference (ISWC), Boston, MA, 2012.

The Organic Data Science Framework software is open source and is released on GitHub under an Apache 2.0 license (https://github.com/IKCAP/organicdatascience).

The Organic Data Science Framework is built on top of MediaWiki and Semantic MediaWiki.

We also use the Page Object Model (POM) extension of MediaWiki, which supports programmatic manipulation of wiki page content. These three existing components, which provide the underlying infrastructure, are shown in dark grey at the bottom of the figure. The remaining components in the figure are the extensions that make up the Organic Data Science Framework.

We developed the Facts API, an extension for making and retrieving assertions in the wiki. It provides easy access to semantic properties regardless of how Semantic MediaWiki handles specific properties.
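The idea behind a facts layer can be sketched in a few lines, assuming facts are (subject, property, value) triples. `FactStore`, `assert_fact`, and `get_facts` are hypothetical names; the in-memory dictionaries stand in for Semantic MediaWiki storage and do not reflect the actual Facts API signatures.

```python
from __future__ import annotations

from collections import defaultdict


class FactStore:
    """Hypothetical in-memory facts layer over (subject, property, value) triples.

    Callers assert and query facts without knowing how each property is stored
    underneath; here the "storage" is just nested dictionaries.
    """

    def __init__(self) -> None:
        # subject -> property -> values (a property may hold several values)
        self._facts: defaultdict[str, defaultdict[str, list[str]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def assert_fact(self, subject: str, prop: str, value: str) -> None:
        """Record that subject has value for prop; repeated assertions are ignored."""
        values = self._facts[subject][prop]
        if value not in values:
            values.append(value)

    def get_facts(self, subject: str, prop: str | None = None) -> dict[str, list[str]]:
        """Return every property of subject, or only prop when it is given."""
        props = self._facts.get(subject, {})
        if prop is None:
            return {p: list(v) for p, v in props.items()}
        return {prop: list(props[prop])} if prop in props else {}
```

Keeping callers behind two methods is the point of the design: the storage underneath could change without touching the code that asserts or reads facts.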

The Provenance extension handles attribution for each assertion in the system: each semantic property is annotated with the user who asserted it. This provenance information can be queried to generate the credit shown on the different pages.
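Assuming each assertion is stored together with its author, computing credit for a page reduces to counting assertions per user. `Assertion` and `credit_for_page` are hypothetical names, and ODSF's Provenance extension may compute and display credit differently.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One semantic property value, annotated with the user who asserted it."""

    page: str
    prop: str
    value: str
    user: str


def credit_for_page(assertions: list[Assertion], page: str) -> list[tuple[str, int]]:
    """Count assertions per user on one page, most active contributors first.

    Users with equal counts keep the order in which they first contributed.
    """
    counts = Counter(a.user for a in assertions if a.page == page)
    return counts.most_common()
```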

The Completion API extension offers users completions of property names as they type, based on the properties that already exist in the system. This encourages users to adopt properties that others have already created, fostering agreement on and normalization of property names. The Task API extension handles information about tasks. It manages the task-subtask tree, generates the status icons, and tracks task deadlines to generate user alerts.
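The completion behavior of the Completion API can be approximated by case-insensitive prefix matching over existing property names, ranked so that widely used names are suggested first. This is a sketch under assumptions: `complete_property`, the usage-count ranking, and the `limit` parameter are illustrative choices, not the extension's documented behavior.

```python
from __future__ import annotations

from collections import Counter


def complete_property(prefix: str, existing: list[str], limit: int = 5) -> list[str]:
    """Suggest existing property names that start with what the user has typed.

    Matching is case-insensitive. Names used more often are ranked first, so
    widely adopted properties are suggested before rare variants; ties are
    broken alphabetically.
    """
    usage = Counter(existing)  # how many times each property name is in use
    needle = prefix.casefold()
    matches = [name for name in usage if name.casefold().startswith(needle)]
    matches.sort(key=lambda name: (-usage[name], name))
    return matches[:limit]
```

Ranking by usage is what nudges users toward names others already chose: a rarely used spelling still appears, but below the common one.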

Finally, the Category Handling extension generates the different pages displayed to the user, depending on each page's category.
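Category-dependent page generation can be pictured as a dispatch from category to view. In this sketch the category names `Task` and `Person` and the renderer functions are hypothetical placeholders; the actual extension produces wiki page layouts rather than strings.

```python
from __future__ import annotations

from typing import Callable


# Hypothetical renderers standing in for category-specific page layouts.
def render_task(title: str) -> str:
    return f"[task view] {title}"


def render_person(title: str) -> str:
    return f"[person view] {title}"


def render_default(title: str) -> str:
    return f"[plain page] {title}"


# Hypothetical mapping from page category to the view that displays it.
VIEWS: dict[str, Callable[[str], str]] = {
    "Task": render_task,
    "Person": render_person,
}


def render_page(title: str, categories: list[str]) -> str:
    """Use the view of the first category that has one, else a plain page."""
    for category in categories:
        view = VIEWS.get(category)
        if view is not None:
            return view(title)
    return render_default(title)
```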

The Organic Data Science Framework can interact with external systems through Semantic Web representations. External systems that we plan to integrate into the Organic Data Science Framework include workflow systems, data repositories, software repositories, collaboration networks, and publication repositories.
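The text does not say which Semantic Web serialization is used with external systems. As one possibility, this sketch serializes (page, property, value) facts as RDF N-Triples using only the standard library; the base URI `http://example.org/wiki/` and the helper names `iri`, `literal`, and `to_ntriples` are assumptions for illustration.

```python
from __future__ import annotations

from urllib.parse import quote

BASE = "http://example.org/wiki/"  # hypothetical base URI for pages and properties


def iri(name: str) -> str:
    """Turn a wiki page or property name into an absolute IRI in angle brackets."""
    # Spaces become underscores; other unsafe characters are percent-encoded.
    return "<" + BASE + quote(name.replace(" ", "_"), safe=":") + ">"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(facts: list[tuple[str, str, str]]) -> str:
    """Serialize (page, property, value) facts as one N-Triples line each."""
    return "".join(f"{iri(s)} {iri(p)} {literal(o)} .\n" for s, p, o in facts)
```

N-Triples is chosen here only because it is line-based and easy to produce without dependencies; an RDF library would be the more robust choice in practice.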


To cite this project, please use:

"A Task-Centered Framework for Computationally-Grounded Science Collaborations." Yolanda Gil, Felix Michel, Varun Ratnakar, Matheus Hauder, Christopher Duffy, and Paul Hanson. Proceedings of the Eleventh IEEE International Conference on eScience, Munich, Germany, 2015.


This work is supported by the National Science Foundation through the INSPIRE program with grant number IIS-1344272.