Friday, April 27, 2018

Data Analysis Careers

On 25 April 2018, the European Commission increased its investment in AI research to €1.5 billion for the period 2018-2020 under the Horizon 2020 research and innovation program. This investment is expected to trigger an additional €2.5 billion of funding from existing public-private partnerships, for example on big data and robotics. It will support the development of AI in key sectors, from transport to health; it will connect and strengthen AI research centers across Europe, and encourage testing and experimentation. The Commission will also support the development of an "AI-on-demand platform" that will provide access to relevant AI resources in the EU for all users.

Additionally, the European Fund for Strategic Investments will be mobilized to provide companies and start-ups with additional support to invest in AI. The aim is to mobilize more than €500 million in total investments by 2020 across a range of key sectors.

With the dawn of artificial intelligence, many jobs will be created, but others will disappear and most will be transformed. This is why the Commission is encouraging Member States to modernize their education and training systems and support labour market transitions, building on the European Pillar of Social Rights.

The annus mirabilis of deep learning was 2012, when Google was able to coax millions of users into crowdsourcing labeled images. It also had tens of thousands of servers that were not very busy at night. Most of all, however, Google had an incredible PR department that was able to create a meme.

More broadly, four developments made large-scale data analytics practical:

  1. Software defined storage (SDS) on commodity hardware made it very inexpensive to store large amounts of data. When the cloud is used for storage, there are no capital expenditures.
  2. Ordinary citizens became willing to contribute vast amounts of data in barter for free search, email, and SNS services. They were also willing to label their data for free, creating substantial ground truth corpora that can be used as training sets.
  3. High-frequency trading created a market for GPGPU hardware, resulting in much lower prices. Also, new workstation architectures made it possible to break the impasse caused by the end of Moore's law.
  4. ML packages on CRAN made it easy to experiment in R, while Torch and Weka made it easy to write applications capable of processing very large datasets.

Many companies are setting up analytics departments and are trying to hire specialists in this field. However, there is great confusion about what the new careers are and how they differ. Often, even the companies posting the job openings do not understand the differences.

Recently, at Sunnyvale City Hall, two representatives from LinkedIn and one representative each from UCSC Silicon Valley Extension and California Science and Technology University participated in a panel organized by NOVA that dispelled the confusion.

Essentially, there are three professions: data analyst, data engineer, and data scientist.

  • Data analysts tend to be more entry level and do not necessarily need programming or domain knowledge: they visualize data, organize information, and summarize data, often using SQL. Essentially, they deal with data "as is."
  • Data engineers do what is called data preparation, data wrangling, or data munging. They pull data from multiple, distributed (and often unstructured) data sources and get it ready for data scientists to interpret. They need a computer science background and should be skilled with programming, Hadoop, MapReduce, MySQL, and Spark.
  • Data scientists turn the munged data into actionable insights, after they have made sure the data is analytically rigorous and repeatable. They usually have a Ph.D. The ability to communicate is vital! They must have a core understanding of the business, be able to show why the data matters and how it can advance business goals, and communicate this to business partners. They need to convince decision makers, usually at the executive level.
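
To make the division of labor more concrete, here is a minimal sketch in Python of the first two roles working on the same data. Everything in it is an assumption made for illustration: the `raw_orders` records, the `prepare` and `summarize` helpers, and the `orders` table are hypothetical and did not come from the panel. The preparation step stands in for the data engineer's wrangling, and the SQL query stands in for the data analyst's summary of data "as is."

```python
import sqlite3

# Hypothetical raw records, as they might arrive from several sources.
raw_orders = [
    ("north", "120.50"),
    ("North ", "80"),
    ("SOUTH", "200.00"),
    ("south", None),    # missing amount
    ("south", "n/a"),   # unparseable amount
]


def prepare(records):
    """Data preparation: normalize region names, drop unusable amounts."""
    cleaned = []
    for region, amount in records:
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue  # skip missing or unparseable amounts
        cleaned.append((region.strip().lower(), value))
    return cleaned


def summarize(rows):
    """Analyst-style summary: order count and total per region via SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


summary = summarize(prepare(raw_orders))
for region, count, total in summary:
    print(f"{region}: {count} orders, total {total:.2f}")
```

The data scientist's part, deciding what the regional totals mean for the business and persuading decision makers to act on them, is deliberately absent: it is the part that does not reduce to a few lines of code.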