Data Science Community

Urban Traffic Data Hack: Tons of sensor and traffic data + complex urban problems

Roland Major, Enterprise Architect within TfL’s Information Management team, reports on the Urban Traffic Data Hackathon.

In the last post on this blog, Tim introduced the roads data available in our unified API, describing its importance as we encourage road users to check before travelling while we carry out our Road Modernisation Plan.

We continue to engage with developers to help us make driving in London better, through apps built on our open data that offer innovative approaches to traffic, road disruption and planned works information. As part of this engagement with the developer community, we held an Urban Traffic Data Hackathon on 14-15 November.

Supported by our Roads Space Management team, the event was planned to give us the opportunity to work directly with developers on creative and innovative solutions to the challenges on London’s roads. In putting the Hackathon together, we worked with Data Science London (DSL), the largest data science community in Europe, and arranged for data scientists and innovators who are members of DSL to take part in the event.

Queen Mary University hosted our Urban Traffic Data Hackathon on 14-15 November

In holding this event we faced a challenge in making available the massive amount of vehicle movement data needed. Working with DSL, we shaped a solution based on cloud hosting, file and HDFS storage, and Docker-based deployment of open source tools including Hadoop and Spark. A wide range of tools was also provided, including Python, R, SQL and Scala in hosted notebooks. Because the data is spatially referenced, a number of open mapping packages were made available as well.

The run-up to the event included a pre-briefing session with 380 attendees, where we shared our aims and information about the data sets we were making available. The data sets were received with great interest, and so many people wished to attend the weekend that numbers had to be limited to the venue capacity.

The event was held on the weekend of 14-15 November at Queen Mary University Campus. This gave us a venue with an auditorium, multiple breakout rooms and great WiFi connectivity to support a large group of users accessing the data.

Dataset briefings were held and prizes for the following three challenges were announced:

  1. Best understanding of urban movement
  2. Best approach to visualisation of urban movement
  3. The most innovative use of data

Teams at work

Those taking part formed their own teams and selected the challenges they wanted to compete for. We provided mentors who were on hand to explain the data in detail and respond to questions. To help everyone focus on business outcomes, the mentors were able to provide insight into the workings of our operations, while DSL provided a technical help desk. Having had 24 hours to work on the data, the teams presented their work on the Sunday afternoon. Ten presentations were delivered and there was a fantastic range of ideas and concepts on show.

The winning team for the best movement model processed the 2 terabytes of traffic sensor data and demonstrated that this data could identify delays faster than they are currently reported. The group built a model that recognises road incidents, showing great creativity and making them worthy winners.
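The post does not describe how the winning model worked, so the following is only a minimal sketch of the general idea of spotting delays in sensor data, not the team's method. It assumes a hypothetical series of vehicle counts per time interval for a single sensor, and flags a reading when it falls far below the mean of the readings just before it. The function name, window size, threshold and sample numbers are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_possible_delays(readings, baseline_size=6, threshold=3.0):
    """Return indexes of readings that drop sharply below the preceding baseline.

    readings: vehicle counts per time interval for one (hypothetical) sensor.
    baseline_size: how many preceding readings form the baseline window.
    threshold: how many standard deviations below the baseline mean counts as a drop.
    """
    flagged = []
    for i in range(baseline_size, len(readings)):
        window = readings[i - baseline_size:i]
        mu, sigma = mean(window), stdev(window)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to judge a drop against.
            continue
        if (mu - readings[i]) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up counts: traffic flows steadily, then collapses at index 7.
counts = [52, 50, 51, 49, 50, 52, 51, 20, 18, 50]
print(flag_possible_delays(counts))
```

Because the baseline window slides forward, the low reading at index 7 inflates the spread used to judge index 8, so this sketch only flags the onset of a drop rather than its full duration. A production approach would need a baseline that is robust to the incident itself.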

The award for best visualisation went to a team that moved the data onto open mapping. This offered a great opportunity for business information to be more open, and again was highly relevant to the challenges we face in helping drivers to plan and complete their journeys as painlessly as possible.

The award for best innovation was for analysis identifying when and from which station you could catch a tube train with the best chance of getting a seat, based on location, date and time.
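The post gives no detail on how this analysis was done. As a rough illustration only, the sketch below assumes hypothetical observations of the form (station, hour, free seats on a departing train) and ranks each station and hour by the share of observed trains that had at least one free seat. The record layout, station names and numbers are invented for the example.

```python
from collections import defaultdict


def best_boarding_options(observations, top=3):
    """Rank (station, hour) pairs by the share of observed trains with a free seat.

    observations: iterable of (station, hour, seats_free) tuples (hypothetical layout).
    """
    # key -> [trains with at least one free seat, trains observed]
    totals = defaultdict(lambda: [0, 0])
    for station, hour, seats_free in observations:
        stats = totals[(station, hour)]
        if seats_free > 0:
            stats[0] += 1
        stats[1] += 1
    ranked = sorted(
        totals.items(),
        key=lambda item: item[1][0] / item[1][1],
        reverse=True,
    )
    return [(key, hits / seen) for key, (hits, seen) in ranked[:top]]


# Made-up observations for two stations.
sample = [
    ("Station A", 8, 0),
    ("Station A", 8, 2),
    ("Station B", 8, 5),
    ("Station B", 8, 1),
    ("Station A", 9, 0),
]
for (station, hour), share in best_boarding_options(sample):
    print(station, hour, share)
```

A real analysis would also need to account for date, direction of travel and how many passengers are competing for each seat, none of which this sketch models.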

Leon Daniels with Innovation Winning Team

Following the success of this event and the level of interest in our data we’ve already received many requests to run other such events. This continues to open up new ways to explore solutions to London’s challenges. If you attended the Hackathon and have any comments or feedback on the event itself, or if you have any other questions or comments on TfL’s open data and unified API in general, we’d love to hear from you – let us know your thoughts in the comments section below.

Find out more about DSL by checking out their website, and follow them on Twitter.

Original post

10 Link-o-Troned in Data Science Week 14

1. Bixo: Open source web mining toolkit w/ Cascading pipes

2. Finding Group Structures in Data with Unsupervised ML

3. Markov Chain Monte Carlo without all the Bullshit

4. An advanced log file viewer. No server. No setup

5. Exploring Spark MLlib: Transformation & Model Creation

6. Getting Started w/ Docker for the Node.js & CouchDB

7. Hands on: Graph Analytics with Spark GraphX

8. A mega collection of data science ebooks links

9. Cookbook: Open source analytics using Impala

10. AWS Machine Learning: Data-Driven Decisions at Scale

[bonus] Foodbot: AI for lunch with Slack


Interested in data science, machine intelligence, and big data?


Check out Data Machina, a free weekly digest

10 Link-o-Troned in Data Science Week 13

1. Faker: A library for generating fake data

2. Awesome Machine Learning: A Curated List [updated]

3. How to factorize a 700 GB matrix w/ Apache Flink

4. Synaptic: neural network library for node.js & the browser

5. Predicting Bike Sharing Demand w/ Gradient Boosted Trees

6. How to share data with a statistician

7. GraphAware: A Neo4j Recommendation Engine

8. Getting started w/ the Kafka Elasticsearch River plugin

9. Breaking Linear Classifiers on ImageNet

10. Getting started with Spark & Cassandra

[bonus] An Automatic Computer Science Paper Generator

Data Science London © 2015