The future of Data Science – thoughts from the Strata London conference

By Neal O’Riain, Pivigo Community Manager

According to the posters dotted around the London Underground, Strata + Hadoop is the ‘Lollapalooza of big data conferences’. To be perfectly honest, when I first saw that I had very little idea what it might mean, but having attended I feel better qualified to take a guess.

Strata + Hadoop London is one of the largest data science conferences in the world. It ran over four days from the 31st of May to the 3rd of June, hosted dozens of talks, thousands of attendees, and some of the largest companies in tech. Where better to go to find out what’s going on in the field of data science, and what might be next?

Coming from a background in academia, I was completely unprepared for Strata. In the more parochial field of stellar astrophysics, conferences are a small affair; often dry technical talks, maybe coffee and a poster session. Strata is a different beast, with talks spanning state-of-the-art technology, business use cases, and predictions of the future from some of the largest, most influential companies in the field. Over the course of the conference I listened to talks and spoke to people from various backgrounds about some interesting data applications, discussed the role that a data scientist plays in a company, and got an idea of where the field might be heading.

At companies big and small, data is being put to work in some really fascinating ways, and at Strata I went to talks outlining some eye-catching and downright cool tech. Among the biggest recent advances in the field of data science has been the widespread application of machine learning techniques, and their success in solving a wide variety of problems. One of the most exciting uses of these methods presented was Microsoft’s Seeing AI, an app designed to help the blind community interact with the world in a new way. Based on deep-learning RNNs, this application can provide people who are visually impaired with an accurate description of an image, recognise faces and emotions, and read text aloud, resulting in a product that could have a huge positive impact. It’s always great to see data applications with the potential to really improve lives, and another such project was presented by Dirk Gorissen, Head of R&D at Skycap. He is developing a machine learning pipeline to detect buried landmines in data from drone-mounted ground-penetrating radar. There are estimated to be 110 million active landmines in 70 countries worldwide, and finding them can help reduce the number of people killed by these weapons each year.

Behind a lot of these advances lies TensorFlow, Google’s open-source machine learning library, which was enthusiastically presented by Sherry Moore. Despite having been open-sourced only six months ago, TensorFlow has grown quickly, developing an enthusiastic community and becoming the go-to tool for a wide variety of machine learning tasks. Since release, its functionality and compatibility have been greatly expanded in terms of both hardware and software; it now supports Python 3.3+, cuDNN R4, and iOS, and provides new high-level APIs. Suffice it to say TensorFlow is very cool, and I’d advise pulling it down and having a look at what it can do!

Over the course of the conference I had a lot of conversations about the role a data scientist plays in a company, and some of the challenges that they may face. One topic that really came to the fore was the idea that companies now feel that the “hype” stage of data science has passed, and that the industry has begun to mature and provide some really valuable insights. At our Business, Analytics and Data Science Meetup, co-hosted with Strata, we heard a very interesting talk from Thomas in’t Veld, the Head of Data Science at Peak. During his talk Thomas made the point that, due to the proliferation of data services, a new start-up may only need two members of staff, a designer and an engineer – a bold statement from a data scientist! Thomas was quick to add that this is only true at small start-ups, and that the value of a data scientist’s skills, their statistical knowledge and their ability to see the narrative in a data set, is of great importance to companies and is being recognised more and more.

As an aside, Stefanie Posavec, who will be presenting at our next meetup on the 23rd of August, gave a marvellous keynote at Strata, describing her year-long Dear Data project, where each week she and a friend manually gathered data, drew it on a postcard, and sent their findings to one another. Her talk at S2DS in August is something to really look forward to!

Finally, a frequent topic of discussion at Strata was the future of data and developments in technology. A fascinating keynote by Stuart Russell, Professor of Computer Science at Berkeley, on the future of artificial intelligence outlined the possibilities stemming from general AI, both exciting and terrifying (Dr. Russell sits on the board of the Centre for the Study of Existential Risk!). In his keynote Professor Russell discussed his research into Inverse Reinforcement Learning, and how these methods could be applied to teach an AI to act according to human value systems. In this month’s Scientific American, Professor Russell asks “Should we fear super smart robots?”, and in his talk he hinted that the answer is “probably, a little bit…” Despite this slightly worrying look at AI, Professor Russell also pointed out that the power of more generally intelligent computer systems would revolutionise our society, potentially solving problems that have been beyond the reach of the human mind.

After two days of talks, discussion, and technology at Strata it would be very difficult to leave without feeling inspired and excited about the field, the advances that have been made, and the challenges still to be tackled!