There is an old Chinese saying that goes, "to believe everything that is written in books is even worse than to have no books at all." If the saying were coined in the 21st century, it might read: "to believe everything in data is worse than to have no data at all." What I'm trying to highlight is that the misuse or inappropriate use of data may actually be worse than the complete absence of data.
Let me explain. If data is collected from only one or a few aspects of something, it will never be adequate, much like using low-dimensional data to describe a high-dimensional object. More importantly, more data can lead to more disagreement, because plenty of data can be gathered from any single aspect to support a particular point of view, even one that conflicts with others. In this case, the misuse of data is, to some extent, worse than the absence of data. This is just one of the many traps of big data. In this two-part article series, I will explore these data traps further and delve into the pitfalls of improper data use.
Mathematical Philosophy of Big Data
Today, data in every shape and form is being captured and recorded. Through the data that users emit, a big data system can record and track every move a user makes (such as clicks, browsing time and comments). It can also record data sent from sensors, such as temperature, humidity, speed, pressure and so on. Data describes the present world and helps us push boundaries, predict the future and analyze incidents.
Some regard the big data era as a totally different world, in which the attributes and rules of anything can be transferred, using an appropriate code (a digital medium), to another isomorphic thing, where those attributes and rules are expressed without loss. Proponents of this school of thought believe that big data is equivalent to the world and that the two are isomorphic to each other [1] (see Figure 1).
Figure 1 Mathematical philosophy of big data: isomorphic relationship
By quantifying everything, big data converts the whole world into data. This will probably change the way we see and understand the world, bringing us a brand-new world view based on big data. In other words, big data helps us see a bigger picture of the world.
Undoubtedly, big data is a precious resource and a powerful tool. However, it would be idealistic to consider big data isomorphic to the world and wholly representative of it. Big data gives us information, but it does not interpret that information for us; understanding it is still up to people. If we misuse data, we may misunderstand it. Big data has its bright side, but it also has a dark side.
Plato's "Allegory of the Cave"
The famous ancient Greek philosopher Plato presented the "Allegory of the Cave" in Book VII of his Republic [2].
The allegory asks readers to imagine a very deep cave holding prisoners who have lived there since birth, chained so that they cannot move their legs, arms or heads. They face a wall, and behind them is a fire. Between the prisoners and the fire is a walkway where people carry objects and puppets, casting shadows on the only wall the prisoners can see. Because these shadows are the only things the prisoners ever see, they grow up believing that the shadows are real. The shadows become their entire reality. (See Figure 2.)
Figure 2 Plato's "Allegory of the Cave" (Picture source: Wikipedia; created by: Markus Maurer)
Using the prisoners' perceived reality as the main metaphor, Plato created this allegory to illustrate the effect that one's surroundings have on one's perception of the world.
In the same way, limited by the measurement and learning methods available to us, we can only sense one or a few aspects of any given object, just like the prisoners in Plato's allegory. Chained so that they can only face the wall in front of them, they come to believe that the two-dimensional shadows they see are the true, three-dimensional world. If those shadows are converted into data, the data may fail to show us the full picture, regardless of what technology is used or how much data is collected. This is one of the biggest data traps out there.
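To make the shadow analogy concrete, here is a small Python sketch that is not part of the original article. It models a "shadow" as a projection that simply discards depth, under the illustrative assumption that light travels parallel to the z-axis onto a flat wall. The names `shadow` and `observe` and both example objects are hypothetical. A flat square and a tall box standing on the same footprint cast exactly the same shadow, and observing that shadow a thousand times instead of once adds no information that could tell them apart.

```python
# A minimal sketch of the cave analogy: a "shadow" is a 3D -> 2D projection.
# Illustrative assumption: the wall is the x-y plane and light travels
# parallel to the z-axis, so casting a shadow just drops the z coordinate.
from typing import List, Set, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def shadow(points: List[Point3D]) -> List[Point2D]:
    """Project each 3D point onto the wall by discarding its depth (z)."""
    return [(x, y) for x, y, _z in points]


# Two different hypothetical objects sharing the same square footprint.
flat_square: List[Point3D] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                              (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
box: List[Point3D] = flat_square + [(x, y, 5.0) for x, y, _z in flat_square]


def observe(obj: List[Point3D], repeats: int) -> Set[Point2D]:
    """Record the object's shadow `repeats` times and collect what was seen."""
    seen: Set[Point2D] = set()
    for _ in range(repeats):
        seen.update(shadow(obj))
    return seen


if __name__ == "__main__":
    for n in (1, 1000):
        identical = observe(flat_square, n) == observe(box, n)
        print(f"{n} observation(s): shadows identical = {identical}")
```

Both lines of output report identical shadows: collecting more data in the lower dimension does not recover the depth that the projection threw away.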
A world without adequate dimensions is a false world. Likewise, a fact without adequate data to back it up is a false fact. The worst-case scenario is that each person holding inadequate data becomes obstinate about their own partial view, so that everyone ends up in mutually exclusive data traps.
Surely more data helps you get closer to the truth, right?
As data is collected from an increasing number of things, more and more people rely solely on data when making decisions. This attitude is summed up by W. Edwards Deming's famous quote: "In God we trust. All others must bring data."
Trusting in data is fine.
However, there is a counter-argument to this stance. In the context of big data, the Chinese saying "to believe everything in books is worse than to have no books at all" translates roughly to: to believe everything data tells you is worse than to have no data at all. The misuse or inappropriate use of data may actually be worse than the complete absence of data.
This is because there is only one physical world, but it can be described from countless aspects. In many cases, the data we can collect, the data available to us and the data we choose to trust cover only one or a few aspects of the facts.
Using such data to interpret facts is like describing a high-dimensional world at a low-dimensional level: the interpretation will not be correct, no matter how much data is collected. Worse, disagreements may multiply as the amount of data grows, because there is always data available to support each point of view drawn from a single aspect of a fact. These points of view conflict with each other, creating an endless loop.
For example, suppose we want to test the claim that education quality is deteriorating, and the only data we have is standardized exam scores. Can that data fully reveal students' potential? To what extent can such exams show students' creativity? Does education aim for scores or for abilities? Standardized exams are disputed precisely because scores fail to show students' full potential.
Here is another example. If I say that Li Hongzhang is one of the three most outstanding diplomats in modern Chinese history (the other two being Zhou Enlai and Wellington Koo), I may be met with a wave of heated protests arguing that he should be called a traitor because of his involvement in all of the 30 most unjust treaties China signed in its modern history.
Everyone has reasons for their point of view, and everyone is keen to use one aspect of a fact to deny another.
Mr. Tu Zipei, a big data expert, pointed out in his article Why the Truth Gets Further Away As Data Aggregates [3] that "just like the man of Chu in the Chinese fable 'Carve on Gunwale of a Moving Boat' (about a man who acted without regard to changing circumstances), human beings have access only to the facts within a limited space and time."
Even some of the most successful big data processing companies in the world, such as Alibaba, still face difficulties with big data.
Mr. Tu provided an example. Before he joined Alibaba, a senior Alibaba manager in charge of business operations once turned to him for advice. At the time, Alibaba had nine business departments predicting consumers' product needs and wants. The departments' opinions often conflicted, and each believed that its own prediction was the most reasonable and accurate.
Mr. Tu believes that this case illustrates a major potential risk of the big data era. Huge amounts of data lead to a situation where "everyone has a valid reason." A person can always reach a conclusion different from everyone else's and still have data to support it.
According to Thomas Crump, an anthropologist who studies numbers [4], all data is collected by people, and no one is always rational.
As a result, we often see many conflicting opinions and little consensus.
To some extent, this outcome may be worse than having no data at all. This is one of the main data traps that must be considered when using big data.
Ways to eliminate the data trap
Mr. Tu suspected that the Alibaba case arose because each department drew conclusions from its own data, which was collected from a different area. His suggestion in such situations is to consolidate the departments and integrate their data into multi-dimensional data, which comes closer to the truth than any single department's prediction.
Mr. Tu's suggestion can be summed up by another Chinese idiom: "listen to both sides and you will be enlightened." Learning about facts from more aspects brings people closer to the actual reality. Otherwise, they see only one aspect of a multi-dimensional issue.
Even in this modern age of technological advancement and big data, we would all do well to remember some ancient wisdom that can keep us from falling into data traps.
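As a rough illustration of what integrating departmental data might look like, here is a small Python sketch that is not from the original article. The departments, customer IDs and field names are all hypothetical, and the merge is a plain outer join on customer ID, not a method that Mr. Tu or Alibaba describes.

```python
# A minimal sketch of integrating single-aspect data into multi-dimensional
# records. All departments, customer IDs and fields are hypothetical and do
# not describe Alibaba's actual systems.
from typing import Dict

Records = Dict[str, Dict[str, int]]

# Each department sees only one aspect ("dimension") of each customer.
browsing_dept: Records = {"c1": {"page_views": 120}, "c2": {"page_views": 3}}
sales_dept: Records = {"c1": {"purchases": 0}, "c2": {"purchases": 4}}
returns_dept: Records = {"c1": {"returns": 0}, "c2": {"returns": 3}}


def integrate(*sources: Records) -> Records:
    """Outer-join department views on customer ID into one record each.

    If two departments use the same field name, the later source wins;
    a real integration would need to reconcile such conflicts explicitly.
    """
    merged: Records = {}
    for source in sources:
        for customer_id, fields in source.items():
            # setdefault creates a fresh dict, so the sources are not mutated.
            merged.setdefault(customer_id, {}).update(fields)
    return merged


if __name__ == "__main__":
    combined = integrate(browsing_dept, sales_dept, returns_dept)
    for customer_id, record in sorted(combined.items()):
        print(customer_id, record)
```

Looking only at page views, customer c1 seems the more promising; looking only at purchases, c2 does. The merged records put both signals, plus c2's high return count, side by side, which is the kind of multi-dimensional view the suggestion above calls for.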
References:
[1] Change By Big Data [M] by Li Dewei; published by Publishing House of Electronics Industry in October 2013
[2] The Republic [M] by Plato; translated by Huang Yin and published by The Chinese Overseas Publishing House in June 2012
[3] Why the Truth Gets Further Away As Data Aggregates by Tu Zipei; published by Logical Thinking in April 2016
[4] The Anthropology of Numbers [M] by Thomas Crump; translated by Zheng Yuanzhe and published by Central Compilation & Translation Press in August 2007