Friday, April 27, 2018

Data Analysis Careers

On 25 April 2018, the European Commission increased its investment in AI research to €1.5 billion for the period 2018-2020 under the Horizon 2020 research and innovation program. This investment is expected to trigger an additional €2.5 billion of funding from existing public-private partnerships, for example on big data and robotics. It will support the development of AI in key sectors, from transport to health; it will connect and strengthen AI research centers across Europe, and encourage testing and experimentation. The Commission will also support the development of an "AI-on-demand platform" that will provide access to relevant AI resources in the EU for all users.

Additionally, the European Fund for Strategic Investments will be mobilized to provide companies and start-ups with additional support to invest in AI. The aim is to mobilize more than €500 million in total investments by 2020 across a range of key sectors.

With the dawn of artificial intelligence, many jobs will be created, but others will disappear and most will be transformed. This is why the Commission is encouraging Member States to modernize their education and training systems and support labour market transitions, building on the European Pillar of Social Rights.

The annus mirabilis of deep learning was 2012, when Google was able to coax millions of users into crowdsourcing labeled images. It also had tens of thousands of servers that were not very busy at night. Most of all, however, Google had an incredible PR department that was able to create a meme. Several enabling factors converged around that time:

  1. Software defined storage (SDS) on commodity hardware made it very inexpensive to store large amounts of data. When the cloud is used for storage, there are no capital expenditures.
  2. Ordinary citizens became willing to contribute vast amounts of data in barter for free search, email, and SNS services. They were also willing to label their data for free, creating substantial ground truth corpora that can be used as training sets.
  3. High-frequency trading created a market for GPGPU hardware, resulting in much lower prices. Also, new workstation architectures made it possible to break the impasse caused by the end of Moore's law.
  4. ML packages on CRAN made it easy to experiment with R. Torch and Weka made it easy to write applications capable of processing very large datasets.

Many companies are setting up analytics departments and are trying to hire specialists in this field. However, there is great confusion on what the new careers are and how they are different. Often, even the companies posting the job openings do not understand the differences.

Recently, at Sunnyvale City Hall, two representatives from LinkedIn and one each from UCSC Silicon Valley Extension and California Science and Technology University participated in a panel organized by NOVA to dispel the confusion.

Essentially, there are three professions: data analyst, data engineer, and data scientist.

  • Data analysts tend to be more entry-level and do not necessarily need programming or domain knowledge: they visualize data, organize information, and summarize data, often using SQL. Essentially, they deal with data "as is."
  • Data engineers do what is called data preparation, data wrangling, or data munging. They pull data from multiple, distributed (and often unstructured) data sources and get it ready for data scientists to interpret. They need a computer science background and should be skilled with programming, Hadoop, MapReduce, MySQL, and Spark.
  • Data scientists turn the munged data into actionable insights, after they have made sure the data is analytically rigorous and repeatable. They usually have a Ph.D. The ability to communicate is vital! They must have a core understanding of the business, be able to show why the data matters and how it can advance business goals and communicate this to business partners. They need to convince decision makers, usually at the executive level.

Monday, March 26, 2018

Stanford Workshop on Medical VR and AR

On 5 April 2018, there will be a public workshop on medical head-mounted displays at Stanford. The workshop is designed to support collaborations between the engineers who are developing VR and AR technologies and the surgeons and clinicians who are using these technologies to treat their patients.

The workshop features talks by researchers who are developing VR and AR technologies to advance healthcare and panel discussions with Stanford physicians who are using VR and AR applications for surgical planning and navigation and for alleviating pain and anxiety in their patients.

There will be an interactive demo session featuring research projects, clinical applications, and startup ventures.

Seating is limited, so if you wish to attend, we recommend that you register now on the workshop website.


Monday, February 12, 2018

Claudio Oleari

On 23 January 2018, Claudio Oleari passed away at the age of 73 in Reggio Emilia. He was the last and ultimate authority on the OSA-UCS color space and perceptually uniform color.

He was an eminent physics scholar and an associate professor at the University of Parma, at the Department of Physics and Earth Sciences. He devoted his life to the activities of teaching with the same passion and interest that he dedicated to research in the context of color, applying physics to perception and establishing its role in colorimetry. In 1995 he started the Gruppo in Colorimetria e Reflectoscopia, which later became the Associazione Italiana Colore.

His availability to colleagues and students and his ability to listen and advise were proverbial: his kindness will always be remembered by everyone who knew him. These qualities are exemplified by the message on his profile at the University of Parma: “You are welcome any day and at any time, even without an appointment. It is useful to verify my presence in the office by telephone. To book a meeting and ask questions, send an email to”

He initiated, within the Associazione Italiana Colore, many valuable informational activities and forged many connections, which persist as a rich bibliography, always mindful of the need to invest in research and training both in Italy and abroad.

His death leaves a void difficult to fill, and the world of color loses an intellectual and an attentive and informed scholar.


Thursday, January 25, 2018

Perceptual Similarity Sorting Experiment

If you have an extra five minutes or so, please try out our online perceptual similarity sorting experiment.

This is a follow-up to the work that Michael Ludwig (one of our interns last summer) conducted and is continuing as part of his Ph.D. research.

For more details, please see this about page for the experiment. Thank you.

Friday, January 12, 2018

Annotating detected outliers

The so-called Twitter Anomaly Detection function for R is excellent but also very minimalistic. The input is a two-column data frame where the first column consists of the timestamps and the second column contains the observations. In addition to a plot, the output is a data frame comprising timestamps, values, and optionally, expected values.

In practice, we usually have some semantic information that we would also like to include in the output, so we do not have to refer back to the original data. Fortunately, there is a quick-and-dirty way to add a description to the outlier data frame.

We start with the annotated data frame containing at least columns with the timestamps, the observations, and factors providing contextual or semantic information on each observation. We then create a simple data frame with just the first two columns, which we pass to the outlier detection function.
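As a sketch of that setup, the following builds a small annotated frame and extracts the two-column frame the detector expects; the column names (count, description) and the values are assumptions for illustration:

```r
# Hypothetical annotated series: timestamps, observations, and a
# semantic 'description' column carrying the contextual information.
annotated <- data.frame(
  timestamp   = as.POSIXct("2017-01-17 06:00:00", tz = "UTC") + 60 * 0:9,
  count       = c(5, 4, 6, 5, 40, 5, 6, 4, 5, 5),
  description = c(rep("normal", 4), "gear display flashing", rep("normal", 5))
)

# The detector accepts exactly two columns: timestamps and observations.
simple <- annotated[, c("timestamp", "count")]

# With the AnomalyDetection package installed, the call would then be:
# outliers <- AnomalyDetectionTs(simple, max_anoms = 0.02, direction = "both")
```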

We can write a trivial function that for each outlier finds the row index in the simple data frame and looks up the semantic information in the annotated data frame:

AddDescription <- function(series1, series2, outliers) {
  quantity <- nrow(outliers$anoms)
  if (quantity < 1) return(NULL)
  result <- NULL
  for (i in 1:quantity) {
    # Locate the outlier's row in the simple data frame by its timestamp.
    rowIndex <- which(series1$timestamp == outliers$anoms$timestamp[i])
    # Look up the semantic label in the annotated data frame (the
    # 'description' column name is taken from the example output below).
    newRow <- data.frame(outliers$anoms$timestamp[i],
                         outliers$anoms$anoms[i],
                         series2$description[rowIndex])
    result <- rbind(result, newRow)
  }
  colnames(result) <- c("timestamp", "outlier_value", "description")
  return(result)
}

This function is just an elementary example. It is easy to add to each outlier more detailed information you can compile from the full data frame.
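The row-by-row lookup can also be expressed more idiomatically with merge(), joining the detector's anoms frame to the annotated frame on the timestamp. A sketch with mock data (the structures mimic the detector's output; the column names and values are illustrative, not from a real run):

```r
# Mock of the structure returned by the detector: a list with an
# 'anoms' data frame holding 'timestamp' and 'anoms' columns.
outliers <- list(anoms = data.frame(
  timestamp = as.POSIXct(c("2017-01-17 06:53:00", "2017-09-19 09:10:00"),
                         tz = "UTC"),
  anoms     = c(40, 37)
))

# Hypothetical annotated frame with the semantic information.
annotated <- data.frame(
  timestamp   = as.POSIXct(c("2017-01-17 06:53:00", "2017-09-19 09:10:00"),
                           tz = "UTC"),
  count       = c(40, 37),
  description = c("gear display flashing", "gear shift failure")
)

# merge() performs the same timestamp lookup as the loop above.
described <- merge(outliers$anoms, annotated[, c("timestamp", "description")],
                   by = "timestamp")
colnames(described) <- c("timestamp", "outlier_value", "description")
```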

Time series with outliers at green markers

outliers with descriptions:

  timestamp            outlier_value  description
  2017-01-17 06:53:00                 gear display flashing
  2017-09-19 09:10:00                 gear shift failure
  2017-11-17 07:26:00                 check engine lamp on

Dates are a sore point of analytics: they always get you. When no time zone is specified, i.e., tz = "", R assumes the local time zone. In the data frame returned by Twitter's AnomalyDetectionTs function, the timestamp column has UTC as the time zone. Therefore, the following statement is useful after the call to AnomalyDetectionTs:

anomalies$anoms$timestamp <- as.POSIXct(anomalies$anoms$timestamp, tz = "")
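To see why the conversion matters, here is a minimal sketch of the pitfall: the same instant rendered in two explicit zones gives different clock times (the Pacific zone is just an example):

```r
# One instant, stored with UTC as the time zone.
t_utc <- as.POSIXct("2017-01-17 06:53:00", tz = "UTC")

# Rendering it in an explicit Pacific zone shifts the clock time by
# eight hours (PST in January), even crossing a date boundary.
local_view <- format(t_utc, format = "%Y-%m-%d %H:%M:%S",
                     tz = "America/Los_Angeles")
# local_view is "2017-01-16 22:53:00"
```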