Javier D. Fernández – Green Big Data

I have an MSc and a PhD in Computer Science, and it’s sad (but honest) to say that throughout my academic and professional career the word “privacy” was hardly ever mentioned. We do learn about “security”, but only as a mere “non-functional requirement”, as it is called. Don’t get me wrong: I do care about privacy, and I envision a future where “ethical systems” are the rule and no longer the exception. But when people suggest, promote or ask for privacy-by-design systems, they should also understand that we engineers (at least my generation) are mostly not yet educated in privacy by design.

That’s why, caring about privacy, I so enjoy reading the diverse theories and manifestos that provide general principles for ethical, responsible and sustainable system design, in particular where personal Big Data (and all its variants, e.g. Data Science) is involved. The Copenhagen Letter (promoting open, humanity-centered designs that serve society), the Responsible Data Science principles (fairness, accuracy, confidentiality, and transparency) and the Ethical Design Manifesto (focused on maximizing human rights and human experience and respecting human effort) are good examples, to name but a few.

While these are inspiring works, an engineer might find the aforementioned principles a bit too general to serve as an everyday reference guide. In fact, one could argue that they are deliberately open to interpretation so that they can be adapted to each particular use case: they point to the goal(s) and some intermediate stepping stones (e.g. openness or decentralization), while the work of filling in all the gaps is by no means trivial.

Digging a bit to find more fine-grained principles, I thought of the concept of Green Big Data, to refer to Big Data made and used in a “green”, healthy fashion, i.e. human-centered, ethical, sustainable and valuable for society. Interestingly, the closest reference for such a term was a highly cited article from 2003 on “green engineering” [1]. In this article, Anastas and Zimmerman presented 12 principles to serve as a “framework for scientists and engineers to engage in when designing new materials, products, processes, and systems that are benign to human health and the environment”.

Inspired by the 12 principles of green engineering, I started an exercise to map these principles to my idea of Green Big Data. This mapping is by no means complete, and is still subject to interpretation and discussion. Ben Wagner and my colleagues at the Privacy & Sustainable Computing Lab provided valuable feedback and encouraged me to share these principles with the community in order to start an open and wide discussion. As an example, Axel Polleres already pointed out that “green” is interpreted here as mostly covering the privacy-aware aspect of sustainable computing, but other concepts such as “transparency-aware” (make data easy to consume) or “environmentally-aware” (avoid wasting energy by letting people run the same stuff over and over again) could be further developed.

You can find the Green Big Data principles below. I look forward to your thoughts!

12 Principles of Green Engineering

12 Principles of Green Big Data

Related topics

Principle 1

Designers need to strive to ensure that all material and energy inputs and outputs are as inherently non-hazardous as possible.

Big Data inputs, outputs and algorithms should be designed to minimize exposing persons to risk.

Security, privacy, data leaks, fairness, confidentiality, human-centric

Principle 2

It is better to prevent waste than to treat or clean up waste after it is formed.

Design proactive strategies to minimize, prevent, detect and contain personal data leaks and misuse.

Security, privacy, accountability, transparency

Principle 3

Separation and purification operations should be designed to minimize energy consumption and materials use.

Design distributed and energy-efficient systems and algorithms that require as little personal data as possible, favoring anonymous and personal-independent processing.

Distribution, anonymity, sustainability

Principle 4

Products, processes, and systems should be designed to maximize mass, energy, space, and time efficiency.

Use the full capabilities of existing resources and monitor that they serve the needs of individuals and of society in general.

Sustainability, human-centric, societal challenges, accuracy

Principle 5

Products, processes, and systems should be “output pulled” rather than “input pushed” through the use of energy and materials.

Design systems and algorithms to be versatile, flexible and extensible, independently of the scale of the personal data input.

Sustainability, scalability

Principle 6

Embedded entropy and complexity must be viewed as an investment when making design choices on recycle, reuse, or beneficial disposition.

Treat personal data as a first-class but hazardous citizen, with extreme precautions in third-party personal data reuse, sharing and disposal.

Privacy, confidentiality, human-centric

Principle 7

Targeted durability, not immortality, should be a design goal.

Define the “intended lifespan” of the system, algorithms and involved data, and design them to be transparent to data subjects, who control their data.

Transparency, openness, right to amend and to be forgotten, human-centric

Principle 8

Design for unnecessary capacity or capability (e.g., “one size fits all”) solutions should be considered a design flaw.

Analyze the expected system/algorithm load and design it to meet the needs and minimize the excess.

Sustainability, scalability, data leaks

Principle 9

Material diversity in multicomponent products should be minimized to promote disassembly and value retention.

Data and system integration must be carefully designed to avoid further personal data risks.

Integration, confidentiality, cross-correlation of personal data

Principle 10

Design of products, processes, and systems must include integration and interconnectivity with available energy and materials flows.

Design open and interoperable systems to leverage the full potential of existing systems and data, while maximizing transparency for data subjects.

Integration, openness, interoperability, transparency

Principle 11

Products, processes, and systems should be designed for performance in a commercial “afterlife”.

Design modularly for potential system and data obsolescence, maximizing reuse.

Sustainability, obsolescence

Principle 12

Material and energy inputs should be renewable rather than depleting.

Prefer data, systems and algorithms that are open, well-maintained and sustainable in the long term.

Integration, openness, interoperability, sustainability
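
To make the mapping a bit more tangible, here is a minimal sketch of how Principle 3 (require as little personal data as possible) and Principle 7 (define an intended lifespan for the data) might look in code. It is only one possible reading, not a reference implementation: the field names, the `NEEDED_FIELDS` allow-list, the `StoredRecord` type and all helper functions are hypothetical and were chosen purely for illustration. Note also that a keyed hash only pseudonymizes an identifier; it does not by itself make the data anonymous.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical allow-list: the only fields this analysis is assumed to need.
NEEDED_FIELDS = {"age_band", "region"}


@dataclass(frozen=True)
class StoredRecord:
    pseudonym: str        # keyed hash of the user id, never the id itself
    attributes: dict      # only the allow-listed fields
    expires_at: datetime  # end of the intended lifespan of this record


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(raw: dict, secret_key: bytes, lifespan: timedelta, now: datetime) -> StoredRecord:
    """Keep only the needed fields and attach an expiry date (Principles 3 and 7)."""
    attributes = {k: v for k, v in raw.items() if k in NEEDED_FIELDS}
    return StoredRecord(pseudonymize(raw["user_id"], secret_key), attributes, now + lifespan)


def drop_expired(records: list, now: datetime) -> list:
    """Return only the records whose intended lifespan has not yet ended."""
    return [r for r in records if r.expires_at > now]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    raw = {"user_id": "alice", "email": "alice@example.org",
           "age_band": "30-39", "region": "AT"}
    record = minimize(raw, secret_key=b"demo-key-only", lifespan=timedelta(days=30), now=now)
    print(record.attributes)  # the email address is never stored
    print(len(drop_expired([record], now + timedelta(days=31))))  # record has expired
```

The design choice here is to discard unneeded fields at ingestion time rather than filter them later, which also fits Principle 2: what is never stored cannot leak.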

 

[1] Anastas, P. & Zimmerman, J. 2003. Design through the 12 principles of green engineering. Environmental Science and Technology 37(5):94A–101A
