David Ribes

“Prospecting (in) the data sciences”

University of Washington

David Ribes explaining the infrastructuring of the data sciences in the US

Data scientists need to prospect for information before they can extract it and use it within their systems. Prospecting, as a concept, refers to searching a place, whether land or data, with the final aim of extracting a resource from it. The idea is to find resources (data, expertise, software and tools) that exist within different types of domains.

This is critical because data science draws on domain expertise, communication science and mathematics, but has no data of its own. Data scientists' job is to extract data from other fields and then use the infrastructures they create to address specific issues or needs. They treat data as their object of study: not the data of a particular field, but data as data. They create domains that can be filled with real-world data so that the data can then be interpreted or analyzed. Data science is thus relevant to everything, but has nothing of its own.

Data scientists create structures using mathematical assemblages and computer science knowledge, and then load them with a specific type of knowledge, known as the domain. These structures can be described as empty shells: formed by mathematics and computer science, but independent of domain knowledge. This is why data scientists do not need a specific domain of their own. They do not produce data; they need to find it, search for it, prospect for it.

Prospecting is the activity that normally comes before extraction: the searching of a place, whether land or data, with the final aim of extracting a resource.


The Big Data Hubs and Spokes (BD Hubs) is a big data science umbrella project in the US that aims to pull together everything that is going on within big data at a national scale. The goal is to address grand social challenges and, in this way, serve the public good.

What is a domain? It is a structural propensity for data science. It is any activity and/or knowledge about the world that is grouped by a common idea, principle, etc. The idea of domain is used in contrast with the other key elements of data science, such as knowledge of mathematics and computer science. These other elements have been named differently over time (cyberinfrastructure, domain independence, domain generality, sometimes a tool's name, or just data science), but the name and concept of domain (knowledge domain) have been constant since the 1970s, when symbolic artificial intelligence began.


Go back to Workshop “EXTRACTIONS” main page
