8 Requirements of Intelligence

What is intelligence, in the context of machine learning and AI? Hofstadter’s 1979 classic Gödel, Escher, Bach (GEB) lists eight essential abilities for intelligence:

  1. to respond to situations very flexibly
  2. to take advantage of fortuitous circumstances
  3. to make sense out of ambiguous or contradictory messages
  4. to recognize the relative importance of different elements of a situation
  5. to find similarities between situations despite differences which may separate them
  6. to draw distinctions between situations despite similarities which may link them
  7. to synthesize new concepts by taking old concepts and putting them together in new ways
  8. to come up with ideas which are novel

It seems to me that the keyword is “flexibility”. Our world is complex, and a creature must be able to act in an infinite variety of circumstances. Sometimes a simple rule is enough, sometimes you need a combination of rules, and sometimes a totally new rule is required.

Hofstadter’s usage of the term stereotyped response got me thinking about Kahneman’s System 1 and 2. In those terms, it seems that System 1 covers all eight abilities. System 1 is fast thinking, applying a stereotypic solution to a situation. No actual reasoning or logical “thinking” is required to fulfil the requirements. However, the stereotypic solutions or rules must be flexible.

Two Views on Zoning Laws

By coincidence, I read two views on zoning. They resonated with me because I hadn’t thought of zoning as optional, that is, that there could be Western cities without proper zoning.

First, about San Francisco:

Imagine you’re searching for an apartment in San Francisco – arguably the most harrowing American city in which to do so. The booming tech sector and tight zoning laws limiting new construction have conspired to make the city just as expensive as New York, and by many accounts more competitive.

Second, about Houston:

Houston is the largest city in the United States without any appreciable zoning. While there is some small measure of zoning in the form of ordinances, deed restrictions, and land use regulations, real estate development in Houston is only constrained by the will and the pocketbook of real estate developers. […] This arrangement has made Houston a very sprawled-out and very automobile-dependent city.

Having lived in the capital area of Finland, which I surmise suffers from a similar effect as San Francisco, I wonder whether any cities strike the right balance. The housing costs are (too?) high, but I enjoy the beautiful, walkable city.

Visualizing Neural Net Using Occlusion

I got good results using an RNN for a text categorization task. Then I tried using a 1D CNN for the same task. To my surprise, I got even better results, and the model was two orders of magnitude smaller. How can such a lightweight model perform so well?

Out of curiosity, and also to verify the results, I wanted to visualize what the neural network was learning. Because this is one-dimensional text data, the convolution filter visualizations were not interesting. For image data, check out Understanding Neural Networks Through Deep Visualization.

I turned to occlusion, as described in Matthew Zeiler’s paper. To find out what a neural network is recognizing we occlude parts of the input to the network. When the model’s output deviates, the occluded part is significant to the prediction. I haven’t seen occlusion used for text data but decided to give it a go.

Test case: categorize sports team names by age. For example:

 "Westend Old Boys" -> "adult";
 "Indians G Juniors" -> "children";
 "SV Alexanderplatz U7" -> "children";
 "Karlsruhe Erste Männer" -> "adult";

Occlusion Visualization

The model outputs the probability of a junior team. In Finland, D-boys are typically 13-14 years old. When the occluder slides over the characters that signify the age group, i.e. “_D_”, the probability drops drastically. On the other hand, occluding parts of the town name “Rosenburg” makes very little difference. It seems that the model is truly identifying the relevant region in the input data.
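The sliding occluder described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code behind the plot: `toy_predict` is a hypothetical stand-in for the trained model, the team name "Rosenburg D Boys" is made up for the example, and the occluder width and the "#" filler character are assumptions.

```python
def occlusion_scores(text, predict, width=3, filler="#"):
    """Slide an occluder over `text` and record how much the prediction drops.

    Returns a list of (start_index, occluded_text, drop) tuples, where
    drop = predict(text) - predict(occluded_text).
    """
    baseline = predict(text)
    scores = []
    for start in range(len(text) - width + 1):
        occluded = text[:start] + filler * width + text[start + width:]
        scores.append((start, occluded, baseline - predict(occluded)))
    return scores


def toy_predict(name):
    # Hypothetical stand-in for the real model: "junior" probability is high
    # only when the age-group marker " D " is visible in the input.
    return 0.9 if " D " in name else 0.2


scores = occlusion_scores("Rosenburg D Boys", toy_predict)
for start, occluded, drop in scores:
    print("{:2d} {} {:+.2f}".format(start, occluded, drop))
```

With a real model, `predict` would wrap the preprocessing and the model's prediction call. Positions with a large drop mark the characters the prediction depends on.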

Why Do Loss and Accuracy Metrics Conflict?

A loss function is used to optimize a machine learning algorithm. An accuracy metric is used to measure the algorithm’s performance (accuracy) in an interpretable way. It goes against my intuition that these two sometimes conflict: loss is getting better while accuracy is getting worse, or vice versa.

I’m working on a classification problem and once again got these conflicting results on the validation set.

Loss vs Accuracy

Accuracy (orange) finally rises to a bit over 90%, while the loss (blue) drops nicely until epoch 537 and then starts deteriorating. Around epoch 50 there’s a strange drop in accuracy even though the loss is smoothly and quickly getting better.

My loss function here is categorical cross-entropy, which scores the predicted class probabilities. The target values are one-hot encoded, so the loss is at its best when the model’s output is very close to 1 for the right category and very close to 0 for the other categories. The loss is a continuous variable.

Accuracy, or more precisely categorical accuracy, yields a discrete true or false value for each sample: the prediction counts as correct when the highest-probability class matches the target.

It’s easy to construct a concrete test case showing conflicting values. Case 1 has less error but worse accuracy than case 2:


Loss vs Accuracy example spreadsheet
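One way to construct such a case is sketched below in plain Python. The probabilities are made-up illustration values, not the numbers from the spreadsheet. Case 1 is confidently right twice and narrowly wrong once; case 2 is barely right every time.

```python
import math


def cross_entropy(truth, prediction):
    """Categorical cross-entropy for one sample with a one-hot truth vector."""
    return -sum(t * math.log(p) for t, p in zip(truth, prediction) if t > 0)


def evaluate(truths, predictions):
    """Return (mean loss, accuracy) over a list of samples."""
    losses = [cross_entropy(t, p) for t, p in zip(truths, predictions)]
    hits = [t.index(max(t)) == p.index(max(p)) for t, p in zip(truths, predictions)]
    return sum(losses) / len(losses), sum(hits) / len(hits)


truths = [[1, 0, 0]] * 3

# Case 1: two confident correct predictions, one narrow miss.
case1 = [[0.90, 0.05, 0.05], [0.90, 0.05, 0.05], [0.45, 0.50, 0.05]]
# Case 2: all three correct, but with low confidence.
case2 = [[0.40, 0.30, 0.30]] * 3

loss1, acc1 = evaluate(truths, case1)
loss2, acc2 = evaluate(truths, case2)
print("case 1: loss {:.3f}, accuracy {:.2f}".format(loss1, acc1))
print("case 2: loss {:.3f}, accuracy {:.2f}".format(loss2, acc2))
```

Case 1 ends up with the lower loss and the lower accuracy, because cross-entropy rewards confidence on the correct class while accuracy only looks at which class has the highest probability.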

For reference, calculating categorical cross-entropy in Keras for one sample:

from keras import backend as K

truth = K.variable([[1., 0., 0.]])
prediction = K.variable([[.50, .25, .25]])
loss = K.eval(K.categorical_crossentropy(truth, prediction))  # -ln(0.5), about 0.693

In what kind of situations does loss-vs-accuracy discrepancy occur?

  • When the predictions get more confident, loss gets better even though accuracy stays the same. The model is thus more robust, as there’s a wider margin between classes.
  • If the model becomes over-confident in its predictions, a single false prediction will increase the loss disproportionately compared to the (minor) drop in accuracy. An over-confident model can have good accuracy but bad loss. I’d assume over-confidence equals over-fitting.
  • Imbalanced distributions: if 90% of the samples are “apples”, then the model would have good accuracy score if it simply predicts “apple” every time.

The accuracy metric is easier to interpret, at least for categorical training data. Accuracy, however, isn’t differentiable, so it can’t be used for back-propagation by the learning algorithm. We need a differentiable loss function to act as a good proxy for accuracy.

Spell Out Convolution 1D (in CNNs)

I’m working on a text analysis problem and got slightly better results using a CNN than an RNN. The CNN is also (much) faster than a recurrent neural net. I wanted to tune it further but had difficulties understanding Conv1D at the nuts-and-bolts level. There are multiple great resources explaining 2D convolutions, see for example CS231n Convolutional Neural Networks for Visual Recognition, but I couldn’t find a really simple 1D example. So here it is.

1D Convolution in Numpy

import numpy as np

conv1d_filter = np.array([1, 2])
data = np.array([0, 3, 4, 5])
result = []
# Slide the size-2 window over the data: 4 - 2 + 1 = 3 positions
for i in range(len(data) - len(conv1d_filter) + 1):
    window = data[i:i+2]
    print(window, "*", conv1d_filter, "=", window * conv1d_filter)
    result.append(np.sum(window * conv1d_filter))
print("Conv1d output", result)

[0 3] * [1 2] = [0  6]
[3 4] * [1 2] = [3  8]
[4 5] * [1 2] = [4 10]

Conv1d output [6, 11, 14]

The input data is four items. The 1D convolution slides a window of size two across the data without padding, so the result is an array of three values (4 − 2 + 1 = 3). In Keras/TensorFlow terminology the input shape is (1, 4, 1), i.e. one sample of four items, each item having one channel (feature). The Convolution1D kernel has shape (2, 1, 1), i.e. window size 2, one input channel, and one filter.

1D Convolution Kernel Size 2

The Same 1D Convolution Using Keras

Set up a super simple model with some toy data. The convolution weights are initialized to random values. After fitting, the convolution weights should be the same as above, i.e. [1, 2].

from keras import backend as K
from keras.models import Sequential
from keras.optimizers import Adam
from keras.layers import Convolution1D
toyX = np.array([0, 3, 4, 5]).reshape(1,4,1)
toyY = np.array([6, 11, 14]).reshape(1,3,1)

toy = Sequential([
 Convolution1D(filters=1, kernel_size=2, strides=1, padding='valid', use_bias=False, input_shape=(4,1), name='c1d')
])
toy.compile(optimizer=Adam(lr=5e-2), loss='mae')
print("Initial random guess conv weights", toy.layers[0].get_weights()[0].reshape(2,))

Initial random guess conv weights [-0.99698746 -0.00943983]

Fit the model and print out the convolution layer weights on every 20th epoch.

for i in range(200):
  h = toy.fit(toyX, toyY, verbose=0)
  if i%20 == 0:
    print("{:3d} {} \t {}".format(i, toy.layers[0].get_weights()[0][:,0,0], h.history))

  0 [-0.15535446  0.85394686] 	 {'loss': [7.5967063903808594]}
 20 [ 0.84127212  1.85057342] 	 {'loss': [1.288176417350769]}
 40 [ 0.96166265  1.94913495] 	 {'loss': [0.14810483157634735]}
 60 [ 0.9652133   1.96624792] 	 {'loss': [0.21764929592609406]}
 80 [ 0.98313904  1.99099088] 	 {'loss': [0.0096222562715411186]}
100 [ 1.00850654  1.99999714] 	 {'loss': [0.015038172714412212]}
120 [ 1.00420749  1.99828601] 	 {'loss': [0.02622222900390625]}
140 [ 0.99179339  1.9930582 ] 	 {'loss': [0.040729362517595291]}
160 [ 1.00074089  2.00894833] 	 {'loss': [0.019978681579232216]}
180 [ 0.99800795  2.01140881] 	 {'loss': [0.056981723755598068]}

Looks good. The convolution weights gravitate towards the expected values.

1D Convolution and Channels

Let’s add another dimension: ‘channels’. In the beginning this confused me. Why is it a 1D conv if the input data is 2D? In 2D convolutions (e.g. image classification CNNs) the channels are often the R, G, and B values for each pixel. In the 1D text case the channels could be e.g. the word embedding dimensions: a 300-dimensional word embedding would introduce 300 channels in the data, and the input shape for a single ten-word sentence would be (1, 10, 300).

toyX = np.array([[0, 0], [3, 6], [4, 7], [5, 8]]).reshape(1,4,2)
toyy = np.array([30, 57, 67]).reshape(1,3,1)
toy = Sequential([
 Convolution1D(filters=1, kernel_size=2, strides=1, padding='valid', use_bias=False, input_shape=(4,2), name='c1d')
])
toy.compile(optimizer=Adam(lr=5e-2), loss='mae')
print("Initial random guess conv weights", toy.layers[0].get_weights()[0].reshape(4,))

Initial random guess conv weights [-0.08896065 -0.1614058   0.04483104  0.11286306]

And fit the model. We are expecting the weights to be [[1, 3], [2, 4]].

# Expecting [1, 3], [2, 4]
for i in range(200):
  h = toy.fit(toyX, toyy, verbose=0)
  if i%20 == 0:
    print("{:3d} {} \t {}".format(i, toy.layers[0].get_weights()[0].reshape(4,), h.history))

  0 [-0.05175393 -0.12419909  0.08203775  0.15006977] 	 {'loss': [51.270969390869141]}
 20 [ 0.93240339  0.85995835  1.06619513  1.13422716] 	 {'loss': [34.110202789306641]}
 40 [ 1.94146633  1.8690213   2.07525849  2.14329076] 	 {'loss': [16.292699813842773]}
 60 [ 2.87350631  2.8022306   3.02816415  3.09959674] 	 {'loss': [2.602280855178833]}
 80 [ 2.46597505  2.39863443  2.96766996  3.09558153] 	 {'loss': [1.5677350759506226]}
100 [ 2.30635262  2.25579095  3.12806559  3.31454086] 	 {'loss': [0.59721755981445312]}
120 [ 2.15584421  2.15907145  3.18155575  3.42609954] 	 {'loss': [0.39315733313560486]}
140 [ 2.12784624  2.19897866  3.14164758  3.41657996] 	 {'loss': [0.31465086340904236]}
160 [ 2.08049321  2.22739816  3.12482786  3.44010139] 	 {'loss': [0.2942861020565033]}
180 [ 2.0404942   2.26718307  3.09787416  3.45212555] 	 {'loss': [0.28936195373535156]}
  n [ 0.61243659  3.15884042  2.47074366  3.76123118] 	 {'loss': [0.0091807050630450249]}

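It converges slowly, and it looks like it found another solution that fits the problem. That is possible because the three output values constrain four weights, so there is more than one exact solution.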
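To check that both weight sets really fit, the two-channel convolution can be spelled out in plain Python. This is a minimal sketch: the helper `conv1d` is written for this check only, and the weights are assumed to be laid out as [window position][channel], which matches the flattened order printed above.

```python
def conv1d(data, kernel):
    """Valid-padding 1D convolution with one filter over multi-channel data.

    data:   list of items, each a list of channel values
    kernel: list of window positions, each a list of per-channel weights
    """
    size = len(kernel)
    out = []
    for i in range(len(data) - size + 1):
        total = 0.0
        for pos in range(size):
            for ch, weight in enumerate(kernel[pos]):
                total += data[i + pos][ch] * weight
        out.append(total)
    return out


data = [[0, 0], [3, 6], [4, 7], [5, 8]]
target = [30, 57, 67]

expected = [[1, 3], [2, 4]]
# Last printed weights from the training log above.
learned = [[0.61243659, 3.15884042], [2.47074366, 3.76123118]]

print("expected weights ->", conv1d(data, expected))
print("learned weights  ->", conv1d(data, learned))
```

Both kernels reproduce the targets (the learned one to within a few hundredths), which is what you would expect when three outputs have to pin down four unknown weights.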

1D Convolution and Multiple Filters

Another dimension to consider is the number of filters that the conv1d layer will use. Each filter will create a separate output. The neural net should learn to use one filter to recognize edges, another filter to recognize curves, etc. Or that’s what they’ll do in the case of images. This exercise resulted from me thinking that it would be nice to figure out what the filters recognize in the 1D text data.

toyX = np.array([0, 3, 4, 5]).reshape(1,4,1)
toyy = np.array([[6, 12], [11, 25], [14, 32]]).reshape(1,3,2)
toy = Sequential([
 Convolution1D(filters=2, kernel_size=2, strides=1, padding='valid', use_bias=False, input_shape=(4,1), name='c1d')
])
toy.compile(optimizer=Adam(lr=5e-2), loss='mae')
print("Initial random guess conv weights", toy.layers[0].get_weights()[0].reshape(4,))

Initial random guess conv weights [-0.67918062  0.06785989 -0.33681798  0.25181985]

After fitting, the convolution weights should be [[1, 2], [3, 4]].
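The target values follow from running each filter over the data separately and stacking the results per position. A small plain-Python sketch of that arithmetic, using a helper written only for this illustration:

```python
def conv1d_single_channel(data, kernel):
    """Valid-padding 1D convolution of single-channel data with one filter."""
    size = len(kernel)
    return [
        sum(d * w for d, w in zip(data[i:i + size], kernel))
        for i in range(len(data) - size + 1)
    ]


data = [0, 3, 4, 5]
filters = [[1, 2], [3, 4]]

# One output sequence per filter ...
per_filter = [conv1d_single_channel(data, f) for f in filters]
# ... rearranged to (positions, filters), the layout of the layer's output.
output = [list(step) for step in zip(*per_filter)]
print(output)
```

The result has one row per window position and one column per filter, the same layout as the (1, 3, 2) target array.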

for i in range(200):
  h = toy.fit(toyX, toyy, verbose=0)
  if i%20 == 0:
    print("{:3d} {} \t {}".format(i, toy.layers[0].get_weights()[0][:,0,0], h.history))

  0 [-0.62918061 -0.286818  ] 	 {'loss': [17.549871444702148]}
 20 [ 0.36710593  0.70946872] 	 {'loss': [11.24349308013916]}
 40 [ 1.37513924  1.71750224] 	 {'loss': [4.8558430671691895]}
 60 [ 1.19629359  1.83141077] 	 {'loss': [1.5090690851211548]}
 80 [ 1.00554276  1.95577395] 	 {'loss': [0.55822056531906128]}
100 [ 0.97921425  2.001688  ] 	 {'loss': [0.18904542922973633]}
120 [ 1.01318741  2.00818276] 	 {'loss': [0.064717374742031097]}
140 [ 1.01650512  2.01256871] 	 {'loss': [0.085219539701938629]}
160 [ 0.986902    1.98773074] 	 {'loss': [0.022377887740731239]}
180 [ 0.98553228  1.99929678] 	 {'loss': [0.043018341064453125]}

Okay, looks like first filter weights got pretty close to [1, 2]. How about the 2nd filter?

# Feature 2 weights should be 3 and 4
toy.layers[0].get_weights()[0][:,0,1]

array([ 3.00007081,  3.98896456], dtype=float32)

Okay, looks like the simple exercise worked. Now back to the real work.

Less Sexism in Finnish

Machine learning models are only as good as the data you use to train them. AI sexism and racist machines have made the news in 2017.

Yesterday, Slava Akhmechet tweeted about his word2vec language model test. According to the tweet, the test was trained on the Google News dataset that contains one billion words from news articles in English.

Word2vec can be used to find similar meanings between words. For example: (Man | King) would be analogous to (Woman | ________) … Can you guess? Yes, “Queen”.

Image from Tensorflow Tutorial
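The analogy lookup is commonly done with vector arithmetic and cosine similarity. The sketch below uses tiny hand-made vectors purely for illustration. They are hypothetical, not real word2vec embeddings, and real models have hundreds of dimensions learned from text.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings": [royalty, male, female]
vectors = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "prince": [0.8, 1.0, 0.0],
    "princess": [0.8, 0.0, 1.0],
}


def analogy(a, b, c):
    """a is to b as c is to ?  Returns the word closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "king", "woman"))
```

Because the vectors come from the training text, any bias in how words co-occur in that text ends up in the same arithmetic.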

However, Slava’s tweet shows that according to news text, while “he” is “persuasive”, “she” is “seductive”, and so on:

I’ve run some word2vec tests in Finnish. I thought it would be interesting to see if a similar bias exists in Finnish as well. You know, Finland being one of the most gender-equal countries, where we speak a language that has only gender-neutral pronouns. In Finnish, both “he” and “she” are the same word: “hän”.


Finnish is more equal than English.

For example, in Finnish, both “man” and “woman” have similarity to “gynecologist” and “general practitioner”. (Top-10 similarity for woman included “nurse”, which was missing for man.)

(Man+persuasive) is mostly equal to (Woman+persuasive): both are “credible”, “dashing”, “impassioned”.

Mostly equal in traffic as well. (Man+biker) has similarity score 0.54; (Woman+biker) = 0.51.

Personal qualities:
 (Man+credible)    = 0.24 vs (Woman+credible)  = 0.25
 (Man+dependable)  = 0.21 vs (Woman+dependable)= 0.22

It’s not all that rosy, though.

 (Man+manager)     = 0.24 vs (Woman+manager)   = 0.17
 (Man+sensitive)   = 0.23 vs (Woman+sensitive) = 0.30

The training data for this word2vec model is the Finnish Internet, i.e. articles, news, discussions, and online forums in Finnish (by Turku BioNLP Group).






Computer Backup Using Restic on Mac

Restic is an open source backup program. I really like Restic’s design goals, so I chose to try Restic for backing up my computer at home. Restic is still a young project and it was (for me) somewhat difficult to figure out the right way to utilize it. I wrote down my approach – maybe it’ll be useful for someone else in the future. So here’s a how-to describing a Restic setup on a Mac. This is neither a recommended setup nor a guideline, but may give useful ideas for your own setup.

My objective is to have a reliable, automatic backup process for my Mac at home. There’s around 500GB of data to back up to a network drive on the same LAN. I also want to make an off-site backup over WAN but, to keep things simple, the plan is to use rsync to copy the Restic repository, instead of utilizing Restic in that part of the process.

What To Back Up?

I want to back up data only, such as photos that I’ve taken and that can’t be replaced if lost. I chose not to do a full backup, so I’ll exclude e.g. the operating system, installed applications, and purchased media that can be replaced or re-downloaded from the vendor. I’ll back up the data that resides in my home folder:

  • /Users/jussi
    • /Applications (for the few app-specific settings)
    • /Documents
    • /Movies (my GoPro stuff lives here)
    • /Pictures
    • /Music/GarageBand (most of Music is .mp3’s but I’m only backing up my own projects)

At first I thought I’d make an include-list containing the folders I want in the backup. Restic supports includes and excludes, but the way the support is currently done, it’s easier to include the whole home folder and then exclude some folders. This is probably better anyway, because new subfolders will be included in the backup by default, and you don’t need to remember to add the new folder to the includes list for backups.

I found it the easiest to list the home dir contents and use the result as the basis for the excludes file, which turned out along these lines:

$ ls -a1 ~/
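The same starting point can be generated with a short Python script. This is a hypothetical sketch, not the excludes file used here: it lists the top-level entries of the home folder and writes every entry that is not on a keep-list as one exclude pattern per line. The keep-list mirrors the folders named above, and the file name exclude.txt matches the one passed to --exclude-file later. Finer-grained rules (such as keeping only Music/GarageBand) would still need to be added by hand.

```python
import os

# Top-level folders to keep in the backup (from the list above).
KEEP = {"Applications", "Documents", "Movies", "Pictures", "Music"}


def build_excludes(home, keep):
    """Return absolute paths of top-level entries in `home` that are not in `keep`."""
    return sorted(
        os.path.join(home, name) for name in os.listdir(home) if name not in keep
    )


def write_exclude_file(paths, target):
    """Write one exclude pattern per line."""
    with open(target, "w") as f:
        for path in paths:
            f.write(path + "\n")


if __name__ == "__main__":
    # Print the candidate excludes for review instead of writing them directly.
    for path in build_excludes(os.path.expanduser("~"), KEEP):
        print(path)
```

After reviewing the printed list, `write_exclude_file(paths, "exclude.txt")` would store it.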



Step 1: Initialize

Restic’s user manual is a great source, and describes clearly how to install the program.

My network drive is mounted simply as an SMB share. I decided to create a folder called restic for the backups. Then I figured that because Restic supports SFTP repositories, I might share that space with friends for their off-site backup needs. So I made a subfolder for my own backups: restic/jussi. My first idea was to name the backup folder by device, so I would have something like /laptop, /imac, and /raspberry but then I realised it’s better to back up all devices to the same folder. This is because Restic’s repository structure already distinguishes different hosts, and because there may be even greater de-duplication if all data goes to the same repository. So here we go:

$ ./restic init --repo /Volumes/restic/jussi

Step 2: Environment

Restic encrypts backup data so repository access requires a key. To avoid entering the password or configuring it in multiple scripts, I simply stored my password in a local plaintext file. The password secures the backed up data in the (remote) repository. In the local system, if one can access the plaintext password file, they can access the filesystem anyway. So even if it feels wrong to store passwords as plaintext, I assume it’s ok here.

$ echo 'mypassword' > repo_pwd.txt

Step 3: Testing

Let’s try it out. Back up a bit of data:

$ ./restic -r /Volumes/restic/jussi -p repo_pwd.txt backup --exclude-file exclude.txt ~/Music/GarageBand

See what went to the repository:

$ ./restic -r /Volumes/restic/jussi -p repo_pwd.txt snapshots

Ok, looks good. I would have liked to browse the repository but didn’t have FUSE installed and Restic’s WebDAV support is still pending (restic#485) so I just did a restore:

$ ./restic -r /Volumes/restic/jussi -p repo_pwd.txt restore bcd46723 --target ~/temp/test

Step 4: Full Initial Backup

$ nice -n 10 ./restic -r /Volumes/restic/jussi -p repo_pwd.txt backup --exclude-file exclude.txt --one-file-system --tag initial /Users/jussi

I probably won’t be deleting much of the data once it’s backed up. I tend to save all pics and videos for some obscure future use. Therefore I used the initial tag for the snapshot to help later, when devising a policy for deleting old backups.

This step took many hours. I think the bottleneck was my LAN capacity. At this point, Restic reports that there are 411GiB to back up. After the process finished, I did

$ du -hc -d 1 /Volumes/restic/jussi

to find out the repository size but got weird numbers well over a terabyte (possibly because it’s an SMB share?). So, instead, I opened a Finder window with total size field turned on, and now the repository size was reported as 301GB. Interesting.

Step 5: Housekeeping i.e. Forget and Prune

To save disk space, I set up a process for deleting old backups periodically. My mindset here is that if a file is changed on my computer, it’s also changed in the backup. There won’t be multiple versions of a file or any kind of undo feature for deleted files. However, I think it saves so much time and bandwidth that it’s worthwhile to keep the initial backup and build on that.

The Restic flow is to first forget snapshots from the index and then actually delete the data from disk using prune.

$ ./restic -r /Volumes/restic/jussi -p repo_pwd.txt forget --keep-tag initial --keep-weekly 2
$ ./restic -r /Volumes/restic/jussi -p repo_pwd.txt prune
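If you prefer to script the two steps in Python rather than bash, a minimal sketch could look like the following. It uses exactly the flags from the commands above. The function names and the injectable `run` parameter are assumptions made for this example, and `check=True` makes the script stop before prune if forget fails.

```python
import subprocess

REPO = "/Volumes/restic/jussi"
PASSWORD_FILE = "repo_pwd.txt"


def housekeeping_commands(restic="./restic", repo=REPO, password_file=PASSWORD_FILE):
    """Build the forget and prune command lines, in the order they must run."""
    base = [restic, "-r", repo, "-p", password_file]
    return [
        base + ["forget", "--keep-tag", "initial", "--keep-weekly", "2"],
        base + ["prune"],
    ]


def run_housekeeping(run=subprocess.run):
    """Run forget, then prune. A failing step raises and skips the rest."""
    for cmd in housekeeping_commands():
        run(cmd, check=True)


for cmd in housekeeping_commands():
    print(" ".join(cmd))
```

Calling `run_housekeeping()` executes the commands; the loop at the end only prints them so they can be reviewed first.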

Step 6: Automate

Restic is (currently) optimized for fast incremental backups. But prune may be slow. So I’ll do this in two parts: a daily backup and a weekly housekeeping, using launchd.

The weekly housekeeping, or forget and prune, requires running two parts so I made a separate bash script for it. Launchd is then configured to run this script.
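A launchd job is defined by a property list file. As a rough sketch of what a daily backup job could contain (not the actual files used here), Python’s plistlib can generate one. The label and the log path come from this post; the restic binary location, the absolute paths of the password and exclude files, and the 03:00 schedule are assumptions for illustration.

```python
import plistlib

job = {
    "Label": "local.restic_backup",
    "ProgramArguments": [
        "/Users/jussi/restic",  # assumed location of the restic binary
        "-r", "/Volumes/restic/jussi",
        "-p", "/Users/jussi/repo_pwd.txt",  # assumed absolute path
        "backup",
        "--exclude-file", "/Users/jussi/exclude.txt",  # assumed absolute path
        "--one-file-system",
        "/Users/jussi",
    ],
    # Assumed schedule: every day at 03:00.
    "StartCalendarInterval": {"Hour": 3, "Minute": 0},
    "StandardOutPath": "/tmp/restic.log",
    "StandardErrorPath": "/tmp/restic.log",
}

plist_bytes = plistlib.dumps(job)
print(plist_bytes.decode("utf-8"))
```

The printed XML would be saved as ~/Library/LaunchAgents/local.restic_backup.plist. Absolute paths are used because a launchd job does not start in your shell’s working directory.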

$ launchctl load ~/Library/LaunchAgents/local.restic_backup.plist
$ launchctl load ~/Library/LaunchAgents/local.restic_housekeeping.plist
$ launchctl list | grep -v com.apple. # To see if they loaded properly
$ launchctl start local.restic_backup # Do a test run

Manually starting the process is a good way to catch launchd errors. If all is well, you should see logging in /tmp/restic.log.

TODO: these processes only log to /tmp. It would be much better to alert if things go wrong, and maybe also report weekly/monthly that things are going ok. Maybe restic#667 will help here.

TODO 2: the backup process is very slow even over a good wifi connection. For the 400GB it took 2 hours 23 minutes. The duration itself wouldn’t matter, but Restic communicates with the repository in a way that slows down my whole LAN. One cure could be to force Restic to read the local source data in total and avoid querying the remote repository, using --force.

[EDIT] Samba on Mac is  s l o w . The --force switch did not help and I almost decided to ditch Restic because the daily backup was just too much for my wifi. But then I found quite a few reports about Apple’s slow SMB implementation. I switched from using Samba to sftp, and the daily backup is now fast and does not clog up my network.

For Reference

$ cat ~/Library/LaunchAgents/local.restic_backup.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
$ cat ~/Library/LaunchAgents/local.restic_housekeeping.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
$ cat housekeeping.sh 
./restic -r /Volumes/restic/jussi -p repo_pwd.txt forget --keep-tag initial --keep-weekly 2
./restic -r /Volumes/restic/jussi -p repo_pwd.txt prune

Management by Counting on 110% Effort

There’s a massively delayed subway construction project in Espoo, Finland. Based on the public info, the project seems to be a mess of mismanagement. In a discussion regarding the project’s management style, a friend of mine quipped:

You know these managers who say their team will get this done by pulling a few all-nighters, or that they’ll be able to push the team to give a 110% effort? Anybody listening to these kinds of arguments must immediately file the message under the La-la Land category.

Well said. It’s easy to agree that wishful thinking is not the right way forward. A manager can’t generate a positive outcome by force of will. You can’t force facts out of existence by ignoring them.

But how about the flipside? Taking a team’s velocity for granted is not right either. It is possible for a team to perform much better than they are used to, if led properly. There are numerous examples of sports teams achieving victories that everyone deemed impossible. Think about how Iceland made it to the quarter finals in Euro 2016. It wasn’t supposed to be possible, but inspiration, confidence, and attitude will work wonders.

I have a fond memory of a time I took a friend for a 15km run. For someone not used to running or biking, the distance sounded insurmountable. It took a while to get her even thinking about being able to run the distance. Even when we set out, she was making contingency plans. My job was to show that it is possible, assure that she can do it, and try to maintain speed and direction. Pretty trivial, eh? The thing is, when we finished, she had done something she didn’t think was possible. That was a powerful and inspiring moment.

If, as a leader, you settle for the usual, “the team’s 100% effort”, you won’t reach the full potential. There are famous anecdotes about Steve Jobs:

Those who did not know Jobs interpreted the Reality Distortion Field as a euphemism for bullying and lying. But those who worked with him admitted that the trait, infuriating as it might be, led them to perform extraordinary feats. Because Jobs felt that life’s ordinary rules didn’t apply to him, he could inspire his team to change the course of computer history with a small fraction of the resources that Xerox or IBM had. [HBR 2012-4]

I tend to think those anecdotes represent wishful thinking and belong in La-la Land.

However, I believe it’s worthwhile to invest time and effort in pushing the limits. The famous scene from Invictus resonates with me. “But how to get them to be better than they think they can be?”

The million dollar question is “How to inspire ourselves to greatness when nothing less will do?”

What Does a CDO Do?

Your board of directors is incomplete if you don’t employ a Chief Digital Officer. Right? Coming from tech startups, this sounds odd to me. For us, digitalization is the air we breathe.

So what do the companies need a CDO for?


IIC Partners published a global survey last September. According to the Rise of Digital Leadership:

  • 1/4 of companies employ a CDO currently.
  • 1/3 of companies plan to hire a CDO in the next two years.
  • From that we can calculate that about 50% of businesses will soon have a CDO.

From the survey, the CDO’s responsibilities are:

  • Ensuring digital assets work across business units
  • Articulating the strategic vision to other leadership teams
  • Championing increased value through use of digital transformation

The premise seems to be that digitalization is complex and significant enough to require a specific function, on par with CFO, CMO, and CIO. Digitalization enables a business to build new capabilities and move quickly. McKinsey’s Markovitch and Willmott write that customers want a quick and seamless digital experience, and they want it now.

Most definitions of the CDO’s role seem to focus on business transformation, and on digitalization affecting all business functions. To me that sounds just like the good ol’ business development manager. And maybe it is: the CDO is a non-permanent role. When digitalization is integrated into the business, there won’t be any particular digital strategy, just strategy in the digital world.