Monday, April 18, 2016

Data Shamans

The benefit of attending meet-ups and conferences, compared to reading papers and watching webinars, is that you can hear the questions from the audience and talk to the speakers and other attendees during the breaks. This matters especially at conferences: the formal presentation gives you the scientific report, while in the breaks you can learn about all the false turns the researchers took in their endeavors, which have no place in a short scientific communication.

As I mentioned last October regarding the ACM Data Science Camp Silicon Valley, the field of advanced analytics is full of hype. Data scientists are perceived as demigods, but in reality their employment can be insecure and harsh.

Indeed, I often hear from data scientists that they are treated like shamans, i.e., people regarded as having access to, and influence in, the world of benevolent and malevolent spirits, who typically enter a trance state during a ritual and practice divination and healing.

When an organization has a problem it cannot solve and its scientists or engineers are at their wits' end, they collect big data and deposit it at the feet of their data scientists in the hope of getting a miracle by the next day. However, a problem can only be solved when the causality is known, and correlation does not imply causality. There is no magic algorithm that data scientists can throw at the data to solve the engineering riddle.
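To make the point about correlation and causality concrete, here is a minimal sketch with simulated data. Everything in it is an assumption made up for illustration: the variable names (temperature, vibration, defects), the noise levels, and the idea that a hidden temperature drives both measurements. It shows two quantities that correlate strongly in the collected data, and how that correlation vanishes once one of them is set independently, because neither causes the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden common cause that nobody logged.
temperature = [rng.gauss(0, 1) for _ in range(n)]

# Two logged quantities, each driven by temperature plus its own noise.
# Vibration has no effect on defects in this simulated world.
vibration = [t + rng.gauss(0, 0.5) for t in temperature]
defects = [t + rng.gauss(0, 0.5) for t in temperature]
r_observed = pearson(vibration, defects)

# Intervention: set vibration by hand, independently of temperature,
# and measure defects again under the same temperatures.
vibration_forced = [rng.gauss(0, 1) for _ in range(n)]
defects_after = [t + rng.gauss(0, 0.5) for t in temperature]
r_intervened = pearson(vibration_forced, defects_after)

print(f"correlation in the collected data: {r_observed:.2f}")
print(f"correlation after intervening:     {r_intervened:.2f}")
```

In the collected data the two quantities look tightly linked, yet damping the vibration would not reduce the defects. Only knowledge of the mechanism, here the hidden temperature, tells you which knob to turn.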

In the end, data scientists have to be able to go back to first principles. However, their training and experience make them more wary of projecting preconceptions onto the data, and their toolbox allows them to formulate hypotheses and test them statistically more efficiently. There are no data shamans.

Not a data shaman