Friday, September 30, 2016

Navigating instead of searching

I believe it was on a day at the end of March or beginning of April 1996, when out of the blue I received the assignment to write a report on why the world wide web would be important for my employer at the time. I was given only two weeks to write a blurb and a slide deck. I had not thought about this matter and it was a struggle to write something meaningful in such a short time, without any time to read up. It ended up as an exercise in automatic writing: just write ahead and never look back and revise.

At the end, I delivered the blurb, but management decided the web was just a short-lived fad like citizen band (CB) radio that would go away shortly and put the blurb in the technical report queue for external publication. I did not think much about it because such is life in a research lab. However, with all the hype of the time on the "Internet Tsunami" by Bill Gates, the "Dot in the Dot Com" by Scott McNealy, and all the others—while my employer remained silent and kept everybody curious—the report W3 + Structure = Knowledge was requested hundreds of times (at that time a tech report was typically requested much less than ten times). Subsequently, I received quite a few requests to present the companion slide deck.

In my struggle to write something in two weeks, I typed about the need to structure information on the world wide web so it can be easily navigated, instead of searching for information.

little dancer leaping over the world wide web

It appears today we are at such a disruptive juncture again. This time, it is not about websites: it is about data (some prefer to call it big data, but size does not really matter). Solid state drives are now inexpensive, can hold 60 TB of data in each 3.5" drive, and have access times similar to RAM. In addition, we have all the shared data in the various clouds.

Today, we are not interested in finding data, we are maybe wiling to navigate to data, but preferably we would like the data to anticipate our need for it and come to us in digested, actionable form. Actually, we are not interested in the data, we are interested in data in a context: knowledge. We want the data to come to us and ask us if it is OK to take an anticipated action based on a compiled body of knowledge: wisdom.

This is an emergent property because all the pieces have fallen together. Our mobile devices and the internet of things constantly gather data. There are open source implementations of deep learning algorithms. CUDA 8 lets us run them on inexpensive Pascal GPGPUs like the Tesla P100. Algebraic topology analytics lets us build networks that compile knowledge about the data. Digital assistant technology brings this knowledge at our service.

Another key ingredient is the skilled workforce. Mostly Google, but also Facebook, Apple, Amazon, etc. have been aggressively educating their workforce in advanced analytics, and as brains move from company to company in the Silicon Valley, these skills are diffused in the industry.

In a recent New York Times article, G.E.'s Jeffrey R. Immelt explained how he is taking advantage of this talent pool in a new Silicon Valley R&D facility employing 1,400 people. Microsoft is creating a new group, the AI and Research Group, by combining the existing Microsoft Research group with the Bing and Cortana product groups, along with the teams working on ambient computing. Together, the new AI and Research Group will have some 5,000 engineers and computer scientists.

This is the end of search engines. This is the end of metadata: we want wisdom based on all the data.

Here is a revised version of the post I wrote in July.

The asumption is that to be useful, technology has to enable society to become more efficient so life quality increases. The increase has to be at least one order of magnitude.

Structured data

In the context of big data, we read a lot about structured versus unstructured data. So far, so good. Things get a little murky and confusing when advanced analytics—which refers to analytics for big data—joins the conversation. The confusion comes from the subtle difference between "structured data" and "structure of data," which contain almost the same words (their edit distance is 3). Both concepts are key to advanced analytics, so they often come up together. In this post, we will try to shed some light on this murkiness to clarify it.

The categorization in structured, semi-structured, and unstructured data comes from the storage industry. Computers are good at chewing on large amounts of data of the same kind, like for example the readings from a meter or sensor, or the transactions on cash registers. The data is structured in the sense that each record has the same fields at the same locations, for example on an 80 or 96 column punched card, if you want a visual image. This structure is described in a schema.

Databases are optimized for storing structured data. Since each record has the same structure, the location of the i-th record on the disk is i times the record length. Therefore, it is not necessary to have a file system: a simple block storage system is all that is needed. When instead of the i-th record we need the record containing a given value in a given field, we have to scan the entire database. When this is a frequent operation in a batch step, we can accelerate it by first sorting the records by the values in this field, which allows us to use binary search, which is logarithmic instead of linear.

Because an important performance metric is the number of transactions per second, database management systems use auxiliary structures like index files and optimizing query systems like SQL. In a server-based system, when we have a query, we do not want to transfer the database record by record to the client: this leads to server-based queries. When there are many clients, often the same query is issued from various clients, therefore, caching is an important mechanism to optimize the number of transactions per second.

Database management systems are very good at dealing with transactions on structured data. There are many optimization points that allow for huge performance gains, but it is a difficult art requiring highly specialized analysts.

Semi-structured data

With cloud computing, it has become very easy to quickly deploy a consumer application. The differentiation is no longer by the optimization of the database, but in being able to collect and aggregate user data so it can be sold. This process is known as monetization and an example is click-streams. The data is to a large extent in the form of logs, but their structure is often unknown. One reason is that the schemata often change without a notification because the monetizers infer them by reverse engineering. Since the data is structured with an unknown schema, it is called semi-structured. With the Internet of Things (IoT), also known as Web of Things, Industrial Internet, etc., a massive source of semi-structured data is coming towards us.

This semi-structured data is high-volume and high-velocity. This breaks traditional relational databases because data parsing and schema inference become a performance bottleneck. Also, the indexing facilities may not be able to cope with the data volume. Finally, the traditional database vendor's pricing models do not work for high volumes of less costly data. The paradigms for semi-structured data are column based storage and NoSQL (not only SQL).

In big data scenarios, structured data can have high-volume and high-velocity. Although it may be fully structured, e.g., rows of double precision floating point values from a set of sensors (a time series), a commercial database system might lose data when reindexing. Even a NoSQL database might be too slow. In this case, this structured data is treated as unstructured and each column is stored in a separate file for concurrent writes. Typically, the content of such a file is a time series.

Unstructured data

The ubiquity of smartphones with their photo and video capabilities and connectedness to the cloud has brought a flood of large data files. For example, when the consumer insurance industry thought it can streamline its operations by having insured customers upload images of damages instead of keeping a large number of claim adjusters in the field, they got flooded with images. While an adjuster knows how to document a damage with a few photographs, consumers take dozens of images because they do not know what is essential.

Photographs and videos have a variety of image dimensions, resolutions, compression factors, and duration. The file sizes vary from a few dozen kilobytes to gigabytes. They cannot be stored in a database other than as a blob, for binary large object: the multimedia item is stored as a file or an object and the database just contains a file pathname or the address of an object. In general, not just images and video are stored in blobs, therefore we use the more generic term of digital items. Examples of digital items in engineering applications are drawings, simulations, and documentation. In their 2006 paper, Jim Gray et al. found that databases can store efficiently digital items of up to 256 KB [1].

For digital items, in juxtaposition to conventional structured data, the storage industry talks about unstructured data.

Unstructured data can be stored and retrieved, but there is nothing else that can be done with it when we just look at it as a blob. When we look at analytics jobs, we see that analysts spend most of their time munging and wrangling data. This task is nothing else than structuring data because analytics is applied to structured data.

In the case of time series data, the wrangling is easy, as long as the columns have the same length. If not, a timestamp is needed to align the time series elements and introduce NA values where data is missing. An example of misaligned data is when data from various sources is blended.

In the case of semi-structured data, wrangling entails reverse engineering the schema, convert dates between formats, distinguish numbers and strings from factors, and dealing correctly with missing data. In the case of unstructured data, it is about extracting the metadata by parsing the file. This can be a number of tags like the color space, or it can be a more complex data structure like the EXIF, IPTC, or XMP metadata.

Structure in time series

A time series is just a bunch of data points, so at first one might think there is no structure. In a way, statistics, with its aim to summarize data, can describe the structure in raw data. It can infer its distribution and its parameters, model it through regression, etc. These summary statistics are the emergent metadata of the time series.

Structure in images

A pictorial image is usually compressed with the JPEG method and stored in a JFIF file. The metadata in a JPEG image consists of segments beginning with a marker, the kind of the marker, and if there is a payload, the length and the payload itself. An example of a marker kind is the type (baseline or progressive) followed by width, height, number of components, and their subsampling. Other markers are the Huffman table (HT), the quantization tables (DQT), a comment, and application-specific markers like the color space, color gamut, etc. This illustrates that unstructured data contains a lot of structure. Once the data wrangler has extracted and munged this data, it is usually stored in R frames, or in a dedicated HIVE or MySQL database. These allow processing with analytics software.

Deeper structure in images

Analytics is about finding even deeper structure in the data. For example, a JPEG image is first partitioned in 8×8 pixel blocks, which are each subjected to a DCT. Pictorially, the cosine basis (the kernels) looks like in this figure:

the kernels of the discrete cosinus transform (DCT)

The DCT transforms the data into the frequency domain, similar to the discrete Fourier transform, but in the real domain. We do this to decorrelate the data. In each of the 64 dimensions, we determine the number of bits necessary to express the values without perceptual loss, i.e., in dependence of the MTF of the combined HVS and capture & viewing devices. These numbers of bits are what is stored in the discrete quantization table DQT, and we zero out the lower order bits, i.e., we quantize the values. At this point, we have not reduced the storage size of the image, but we have introduced many zeros. Now we can analyze statistically the bit patterns in the image representation and determine the optimal Huffman table, which is stored with the HT marker, and we compress the bits, reducing the storage size of the image through entropy coding.

Like we determine the optimal HT, we can also study the variation in the DCT-transformed image and optimize the DQT. Once we have implemented this code, we can use it for analytics. We can compute the energy in each of the 64 dimensions of the transformed image. As a proxy for energy, we can compute the variance and obtain a histogram with 64 abscissa points. The shape of ordinates gives us an indication of the content of the image. For example, the histogram will tell us, if an image is more likely scanned text or a landscape.

We have built a rudimentary classifier, which gives us a more detailed structural view of the image.

Let us recapitulate: we transform the image to a space that gives a representation with a better decorrelation (like transforming from RGB to CIELAB, then from Euclidean space to cosine space). Then we perform a quantization of the values and study the histogram of the energy in the 64 dimensions. We start with a number of known images and obtain a number of histogram shapes: this is the training phase. Then we can take a new image and estimate its class by looking at its DCT histogram: we have built a rudimentary classifier.

An intuition for deep learning

We have used the DCT. To generalize, we can build a pipeline of different transformations followed by quantizations. In the training phase, we determine how well the final histogram classifies the input and propagate back the result by adjusting the quantization table, i.e., to fine-tune the weights making up the table elements. In essence, this is an intuition for what happens in deep learning, or more formally, in CNN.

In the case of image classification and object recognition, the CNN equivalent of a JPEG block is called receptive field and the generalization of the DCT DQT is called CNN filter or CNN kernel. The CNN kernel elements are the weights, just as they are the coordinates in cosine space in the case of JPEG (this is where the analogy ends: while in JPEG the kernels are the basis elements of the cosine space, in CNN the kernels are the weights and the basis in unknown). The operation of applying each filter in parallel to each receptive field is a convolution. The result of the convolution is called the feature map. While in JPEG the kernels are the discrete cosine basis elements, in CNN the filters can be arbitrary feature identifiers. While in JPEG the kernels are applied to each block, in CNN they “slide” by a number of pixels called the stride (in JPEG this would be the same as the block size, in CNN it is always smaller than the receptive field).

The feature map is the input for a new filter set and the process is iterated. However, to obtain a faster convergence, after the convolution it is necessary to introduce a nonlinearity, in a so-called ReLU layer. Further, the feature map is down-sampled, using what in CNN is called a pooling layer: this helps to avoid to overfit the training set. At the end, we have the fully connected layer, where for each class being considered, there is the probability for the initial image to belong to that class:

CC Aphex34 (Wikimedia Commons)

The filters are determined by training. We start with a relatively large set of images for whom we have determined manually the probabilities to belong to the various classes: this is the ground truth and the process is called groundtruthing. The filters can be seeded with random patterns (in reality we use the filters from a similar classification problem: transfer learning or pre-training, see the figure below) and we apply the CNN algorithm to the ground truth images. At the end, we look at the computed connected layer and compare it with the ground truth obtaining the loss function. Now we propagate the errors back to reduce the weights in the filters that caused the largest losses. The process of forward pass, loss function back-propagation, and weight update is called an epoch. The training phase is applied over a large ground truth over many epochs.

Evolution of pre-trained conv-1 filters with time; after 5k, 15k, and 305k iterations, from [2]:

Evolution of pre-trained conv-1 filters with time; after 5k, 15k, and 305k iterations, from Agrawal et al.

CNN and all the following methods are only possible because of the recent progress in GPGPUs.

A disadvantage of machine learning is that the training phase takes a long time and if we change the kind of input we have to retrain the algorithm. For some applications, you can use your customers as free workers, like for OCR you can use the training set as captcha, which your customers will classify for free. For scientific and engineering applications, you typically do not have the required millions of free workers. This is the motivation for unsupervised machine learning.

Assimilation and accommodation

So far, we took a random walk from unstructured data to multimedia files, JPEG compression, a DCT-inspired classifier, and deep learning. We saw that the crux of supervised machine learning is the training.

There are two reasons for needing classifiers. We can design more precise algorithms if we can specialize them for a certain data class. For humans, the reason is that our immediate memory can hold only 7 ± 2 chunks of information. This means that we aim to break down information into categories each holding 7 ± 2 chunks. There is no way humans can interpret the graphical representation of graphs with billions of nodes.

As already Immanuel Kant noted, categories are not natural or genetic entities, they are purely the product of acquired knowledge. One of the functions of the school system is to create a common cultural background, so people learn to categorize according to similar rules and understand each other's classifications. For example, in the biology class, we learn to organize botany according to the 1735 Systema Naturæ compiled by Carl Linnæus.

As we know from Jean Piaget's epistemological studies with children, there is assimilation when a child responds to a new event in a way that is consistent with an existing classification schema. There is accommodation when a child either modifies an existing schema or forms an entirely new schema to deal with a new object or event. Piaget conceived intellectual development as an upward expanding spiral in which children must constantly reconstruct the ideas formed at earlier levels with new, higher order concepts acquired at the next level.

The data scientist's social role is to further expand this spiral. Concomitantly, data scientists have to be well aware of the cultural dependencies of acquired knowledge.

For data, this means that we want to cluster it (recoding by categorization). Further, we want to connect the clusters in a graph so we can understand its structure (finding patterns). At first, clustering looks easy: we take the training set and do a Delaunay triangulation, the dual graph of the Voronoi diagram. After building the graph with the training set, for a new data point, we just look in which triangle it falls and know its category. Color scientists are familiar with Delaunay triangulations because they are used for device modeling by table lookup. Engineers use them to build meshes for finite element methods.

The problem is that the data is statistical. There is no clear-cut triangulation and points from one category can lie in a nearby category with a certain probability. Roughly, we build clusters by taking neighborhoods around the points and the intersect them to form the clusters. The crux is to know what radius to pick for the neighborhoods because the outcome will be very different.

Algebraic topology analytics

This is where the relatively new field of algebraic topology analytics comes into play [4]. It has only been about 15 years that topology has started looking at point clouds. Topology, an idea of the Swiss mathematician Leonhard Euler, studies the properties of shape independent of coordinate systems, dependent only on a metric. The topological properties are deformation invariant (a donut is topologically equivalent to a mug). Finally, topology constructs compressed representations of shape.

The interesting element of shape in point clouds are the k-th Betti numbers βk, the number of k-dimensional "holes" in a simplicial complex. For example, informally β0 is the number of connected components, β1 the number of roundish holes, and β2 the number of cavities.

Algebraic topology analytics relieves the data scientist from having to guess the correct radius of the point neighborhoods by considering all radii and retaining only those that change the topology. If you want to visualize this idea, you can think of a dendrogram. You start with all the points and represent them as leaves; as the radii increase, you walk up the hierarchy in the dendrogram.

This solves the issue of having to guess a good radius to form the clusters, but you still have the crux of having to find the most suitable distance metric for your data set. This framework is not a dumb black-box: you still need the skills and experience of a data scientist.

The dendrogram is not sufficiently powerful to describe the shape of point clouds. The better tool is the set of k-dimensional persistence barcodes that show the Betti numbers in function of the neighborhood radii for building the simplicial complexes. Here is an example from page 347 in Carlsson's article [4]:

(a) Zero-dimensional, (b) one-dimensional, and (c) two-dimensional persistence barcodes


With large data sets, when we have a graph, we do not necessarily have something we can look at because there is too much information. Often we have small patterns or motifs and we want to study how a higher order graph is captured by a motif [5]. This is also a clustering framework.

For example, we can look at the Stanford web graph at some time in 2002 when there were 281,903 nodes (pages) and 2,312,497 edges (links).

Clusters in the Stanford web graph

We want to find the core group of nodes with many incoming links and the tied together periphery groups that are tied together and also up-link to the core.

A motif that works well for social network kind of data is that of three interlinked nodes. Here are the motifs with three nodes and three edges:

Motifs for social networks

In motif M7 we marked the top node in red to match the figure of the Stanford web.

Conceptually, given a higher order graph and a motif Mi, the framework searches for a cluster of nodes S with two goals:

  1. the nodes in S should participate in many instances of Mi
  2. the set S should avoid cutting instances of Mi, which occurs when only a subset of the nodes from a motif are in the set S

The mathematical basis for this framework are motif adjacency matrices and the motif Laplacian. With these tools, a conductance metric in spectral graph theory can be defined, which is minimized to find S. The paper [5] in the references below contains several worked through examples for those who want to understand the framework.


  1. R. Sears, C. van Ingen, and J. Gray. To BLOB or not to BLOB: Large object storage in a database or a filesystem? Technical Report MSR-TR-2006-45, Microsoft Research, One Microsoft Way, Redmond, WA 98052, April 2006.
  2. P. Agrawal, R. Girshick, and J. Malik. Analyzing the Performance of Multilayer Neural Networks for Object Recognition, pages 329–344. Springer International Publishing, Cham, September 2014.
  3. G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2):81–97, 1956
  4. G. Carlsson. Topological pattern recognition for point cloud data. Acta Numerica, 23:289–368, 5 2014
  5. A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016

Wednesday, September 28, 2016

Extreme color naming experiment finds locus for luma–chroma transformation

From time to time, a new physiological experimentation technique or a significant new instrument is developed, leading to breakthrough discoveries. In the case of color vision, this usually entails a doctoral student and their significant other—and maybe some additional dedicated colleagues or the professor—to undergo the cruel ordeal of having their pupils dilated (mydriasis) and their ciliary muscle paralyzed to avoid accommodation (cycloplegia), then get strapped to a headrest while biting a dental impression mount, to make observations in repeated interminable sessions for months on end, all in the name of science.

In their recent paper The elementary representation of spatial and color vision in the human retina, Ramkumar Sabesan et al. report on a seminal study to locate where in the human visual system (HVS), the luma-chroma encoding occurs in the parvocellular pathway (midget ganglion cells).

This study by Ramkumar Sabesan et al. represents the first time cone photoreceptors of known spectral type have been individually targeted and activated with light in the living human retina.

I cannot believe, it has been thirty years since I drew this diagram:

cognitive model

Although the above diagram looks like a model for the HVS, it was more a plan for my implementation of a color workbench. To keep the head cool and prevent it from overheating, our brain evolved to minimize the usage of energy. This is accomplished by having pipelines where at each stage the information gets recoded to make it more complex but more compact. This is at the cost of speed: while a two-photon catch the shift in electron density takes less than a femtosecond, the entire photo-cycle lasts a picosecond and at the end of the pipeline, adaptation can take seconds and color naming minutes.

An important feature were the bidirectional arrows: we have a feedback loop with control moving down and information moving up. Because of the sequence of recoding and the feedback, the receptors in the retina are not like pixels in a CCD sensor

  • Receptive field: area of visual field that activates a retinal ganglion (H.K. Hartline, 1938)
  • Center-surround fields allow for adaptive coding (transmit contrast instead of absolute values)
  • Horizontal cells presumed to inhibit either its bipolar cell or the receptors: opponent response in red–green and yellow–blue potentials (G. Svaetichin, 1956)
  • Retinal ganglion can be tonic or phasic: pathway may also be organized by information density or bandwidth

The last item comes from a table of the parvocellular and magnocellular pathways Lucia Rositani-Ronchi compiled for me at the 1993 AIC meeting in Budapest:

Originating retinal ganglion cells
Temporal resolution
Slow (sustained responses, low conduction velocity)
Fast (mostly transient responses, some sustained, high conduction velocity)
Modulation dominance
Adaptation occurs at high frequencies
Adaptation occurs at all frequencies
Receives mostly opponent type input from cones sensitive to short and long wavelengths
Receives mostly combined (broadband) input from M and L cones, both from the center and from the surround of receptive fields
Contrast sensitivity
Low (threshold > 10%)
High (threshold < 2%)
LGN cell saturation
Linear up to about 64% contrast
At 10%
Spatial resolution
High (small cells)
Low (large cells)
Spatio-temporal resolution
When fixation is strictly foveal, extraction of high spatial frequency information (test gratings), reflecting small color receptive fields
Responds to flicker
Long integration time
Short integration time
Relation to channels
Could be a site for both a lightness channel as for opponent-color channels. The role depends on the spatio-temporal content of the target used in the experiment
Might be a site for achromatic channels because the spectral sensitivity is similar to Vλ, it is more sensitive to flicker, and has only a weak opponent color component
Possible main role in the visual system
Sustain the perception of color, texture, shape, and fine stereopsis
Sustain the detection of movement, depth, and flicker; reading of text

We have four retinal pigments (erythrolabe, chlorolabe, cyanolabe, rhodopsin) attached by a lysine to a protein backbone. These four pigments are sensitized to photons at 4 energy levels (wavelengths): L, M, S, and rods. The energy levels are not numbers but distributions, namely the probabilities for a photon catch with that chromatophore.

A 3-dimensional signal with L, M, S is not efficient because we need a high spatial resolution but the chromatic information can be at a lower resolution. This is reflected in the modulations transfer functions for the HVS and is exploited for example in image encoding, where we transform an RGB signal into a color opponent signal and then down-sample the chroma images:

CIELAB separations

In 1993, it was not known where this transformation occurs in the HVS. In fact, there is quite a bit of processing in the retina, and many details are still unknown.


In their recent paper The elementary representation of spatial and color vision in the human retina, Ramkumar Sabesan et al. report on a seminal study to locate where in the HVS, the luma-chroma encoding occurs in the parvocellular pathway.

Using an adaptive optics scanning laser ophthalmoscope (AO-SLO), the authors studied 174 L-cone, 99 M-cone, and 12 S-cone samples by stimulating them individually with a 543 nm, 500 ms pulse and asking two subjects to report the perceived color name. The names were restricted to red, green, blue, yellow, white, and not seen.

The subjects reported achromatic sensations 61.8% of the time. When red was reported (22.5% of seen trials), it was more likely to be driven by L- than M-cones, whereas green (15.7%) was more likely to come from the excitation of M-cones. Thus, L-cones tended to signal both white and red, whereas M-cones tended to signal both white and green. The observation that these color percepts roughly align with the predictions of large-field cone-isolating stimuli suggests that the same opponent neuronal circuits may be implicated in both paradigms. This finding also supports the idea that the visual system can learn the spectral identity of individual cones.

The apparent segregation of color categories into distinct populations of cells is suggestive of a parallel representation of color and achromatic sensations. Moreover, these results imply that, for a large number of cones, their individual activation is not sufficient to produce a color. (Remember that in this experiment single cones are excited; in free vision, most cones are activated and the eye saccades, presenting a point in the visual field to several cones.)

The authors found that the cones most likely to generate strong spectral opponency in a parvocellular neuron, that is, those surrounded by cones of opposing type, were not more likely to generate red or green percepts. Rather, all these examples, when stimulated in isolation, drove achromatic percepts on a majority of the trials.

There is little doubt that the long-duration supra-threshold stimulation of individual cones here influences the firing of a number of different ganglion cell types. In particular, a multi-electrode study demonstrated that the activation of a single cone simultaneously evoked responses in both midget (parvocellular) and parasol (magnocellular) ganglion cells. The results may be particularly informative in differentiating proposals about the role of parvocellular neurons in achromatic spatial and color vision.

The study confirms the old result that the red-green system samples the visual world at a lower resolution than the achromatic system. The new results from the studies reported in the present paper are consistent with the idea that the HVS represents these two pieces of information with separate pathways that emerge as early as the photoreceptor synapse: one chiefly concerned with high-resolution achromatic vision and a second, lower-resolution color system.

The luma-chroma transformation with chroma subsampling is very important in image processing. In your opinion, does this new result allow the design of better imaging pipelines? Does this allow us to design better retinex algorithms? Join the conversation in the Trellis group.

Citation: R. Sabesan, B. P. Schmidt, W. S. Tuten, A. Roorda, The elementary representation of spatial and color vision in the human retina. Sci. Adv. 2, e1600797 (2016).

Thursday, September 1, 2016

Industrie 4.0: Entwicklungsfelder für den Mittelstand

In den 80er Jahren verschob sich die US-amerikanische industrielle Herstellung nach Fernost, vorwiegend China. Es war ein unaufhaltbares Ereignis, denn der Stand der industriellen Fertigung hatte sich seit Anfang Jahrhundert nicht mehr weiterentwickelt.

Nachdem ich in den 60er und 70er Jahre mehrere Werke in der Schweiz und Deutschland besucht hatte, alle sauber wie ein Spital und Qualitätskontrollen an jedem Schritt, und die Fachkräfte im angewandten Physik Institut kennenlernte, die jedes noch so verrückt schwierige Apparat-Teilchen mit unglaublich engen Toleranzen herstellen konnten, war mir der erste Besuch einer US-amerikanischen Fabrik ein grosser Schock, den ich wohl nie vergessen werde.

Es war die Burroughs B1700 Fabrik in Goleta bei Santa Barbara, ende 1977. Die Fabrikationshalle war laut, da die Arbeiterinnen jede ein Transistor Radio an hatte, jeder auf einem anderen Sender. Der Fussboden war schmutzig, die Arbeitsflächen voller Chips- und Kekskrümeln. Die Lötstrasse in der Mitte der Halle sah aus, als ob sie aus einem Charlie Chaplin Film herausgefallen war. Keine Arbeits-Station hatte eine Qualität-Kontrolle: am Ende der verknoteten Herstellungsstrasse war ein grosser Tisch an dem sich die fertigen Schaltkreis-Platinen stapelten. Eine Arbeiterin nahm eine nach der anderen die Platinen und klopfte sie mit einem Bleistift nach kalten Lötstellen ab, denn auf dem Ende des Bleistifts war ein Schreibmaschinen-Radiergummi aufgeschoben.

Nun wusste ich, weshalb wir die Computer in die Schweiz erst im Geschäft aufbauen mussten und dann die Defekte suchen mussten, bevor wir sie den Kunden liefern konnten. Praktisch jede Platine musste nachgelötet werden. Ein Jahrzehnt später, war die Fabrik bei Xerox in Rochester immer noch auf dem gleichen Niveau, nur wurde dort mehr mit dem Hammer gearbeitet um bei der manuellen Montage die fehlenden Toleranzen zu überwältigen.

Mit diesem Hintergrund, stand der Verlagerung nach Fernost mit den niedrigen Löhnen nichts im Wege. Ich glaubte, dass sich die industrielle Fertigung in den USA nie erholen würde. Dies ist ganz anders als in Europa oder Japan, wo das gesellschaftliche Schwergewicht auf Schulung der Arbeiter und Verbesserung der Herstellungs-Technologien liegt. Das Resultat ist, dass zum Beispiel der Zughersteller Stadler Rail einen Grossauftrag aus den USA erhalten konnte: Das Bahnunternehmen Caltrain hat bei Stadler 16 Doppelstocktriebzüge für 551 Millionen Dollar bestellt. Die Züge werden zwischen San Francisco und dem Silicon Valley fahren, wie der Tages Anzeiger vom 16. August berichtete.

Aber, es bewegt sich nun doch etwas im Silicon Valley. Praktisch jede Auto Firma hat jetzt eine Niederlassung im Silicon Valley, aber nicht nur die klassischen Firmen: die Chinesischen Firmen die die amerikanischen Produkte anfertigen, investieren hier in Forschungs- und Entwicklungs-Labors um die neuesten Technologien zu konzepieren. Während deutsche Hersteller immer noch 7 Jahre brauchen um ein neues elektronisches Auto zu entwickeln, kann man im Silicon Valley frischgebackene Doktoren von Stanford und UC Berkeley anstellen, die die gleiche Arbeit in ein paar Jahre machen, denn mit Industrie 4.0 haben wir eine Revolution.

Gerade letztes Wochenende hatte es im New York Times einen Artikel, wie die General Electric ein Labor mit 1400 Software-Entwicklern in San Ramon aufgestellt hat, um das System Predix zu schmieden, welches nich nur eine Grundlage für Industrie 4.0 sein soll, sondern auch es ermöglichen soll, die Wartung der Produkte im Einsatz vorherzusagen.

Das Fraunhofer-Institut für Produktionstechnik und Automatisierung IPA baut nun das Applikationszentrum Industrie 4.0 auf. Jedes Unternehmen sollte die Studie Industrie 4.0: Entwicklungsfelder für den Mittelstand anfordern und sie sorgfältig lesen. Sie ist von Herrn Markus Bressner, markus.bressner(at) erhältlich, Telefon +49 711 970-1808.

Industrie 4.0: Entwicklungsfelder für den Mittelstand