From a video analytics proof of concept to a partnership

Sanoma
  • Video analytics
  • Data Platform
  • Strategic partnership
Case

In 2017, Sanoma requested a two-week proof of concept for their next video analytics solution.

Present day

Our initial collaboration grew into a long-term partnership, and over time we have carried out multiple projects to make Sanoma’s data pipelines better than ever.

Results

  • Scalability: Developing a stable and scalable analytics platform
  • Data Refinery: The platform uses hand-tuned and machine-learning-based methods for cleaning, classifying, and aggregating the data
  • Data Hub: The analytics platform takes datasets from multiple sources and enriches them for use in analytics, recommendations, marketing, and planning

The first spark, still blazing

The collaboration between Sanoma and Emblica started when Nelonen was considering the next tool to fulfil the analytics needs of their online video platform. The first proof of concept for analytics was built in two weeks on top of AWS, using an existing clickstream data source. The successful trial was followed by an almost year-long pilot to build the foundations for an entirely new way to process, visualize, and understand analytics and customers at Sanoma.

Sanoma's online video platform offers live streams, podcasts, news, video clips, series, movies, radio, and audiobooks to a large audience on multiple sites. The same system powers media delivery to the Ruutu and Supla websites and to many different applications.

Long journey

Delivering a custom analytics platform has its challenges related to scale, data quality and accuracy, old devices, and security. Because of these challenges, enterprises often choose the easy way and pick an off-the-shelf solution for their analytics needs. In Sanoma’s case, the decision to develop an in-house solution was backed by their future development roadmap. A tailored tool would better accommodate Sanoma’s data, niche use cases, and existing telemetry implementations in legacy apps. After the initial deployment of the solution, the whole team began to realize the potential and possibilities of the in-house platform. Today Sanoma’s custom architecture has proven able to scale for new ideas and changes, while also meeting the requirements of information security and privacy.

Architecture

After the data, the most precious piece of Sanoma’s analytics system is its architecture. Instead of a gigantic monolith, the implemented microservice architecture splits the system into separate, well-defined pieces of the pipeline. There are different modules for taking in raw events and aggregating them as sessions. Another component processes a chunk of events and produces a single session with metrics. This structure makes development and deployment easier, as we can focus on a certain component without fear of breaking the whole pipeline. On top of that, multiple pipelines can be run simultaneously using the same source data, and their results can be compared.
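As a rough illustration of the "chunk of events in, one session with metrics out" step described above, here is a minimal Python sketch. It is not Sanoma's actual implementation: the Event and Session types, their field names, and the chosen metrics (duration and event count) are assumptions made up for this example.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Event:
    # Hypothetical raw event shape; the real telemetry schema is not public.
    session_id: str
    timestamp: float  # seconds since epoch
    kind: str         # e.g. "play", "pause", "heartbeat"


@dataclass(frozen=True)
class Session:
    # Hypothetical session record with two illustrative metrics.
    session_id: str
    started_at: float
    duration_s: float
    event_count: int


def build_session(events):
    """Turn one non-empty chunk of events from a single session into a Session."""
    if not events:
        raise ValueError("cannot build a session from an empty chunk")
    ordered = sorted(events, key=attrgetter("timestamp"))
    first, last = ordered[0], ordered[-1]
    return Session(
        session_id=first.session_id,
        started_at=first.timestamp,
        duration_s=last.timestamp - first.timestamp,
        event_count=len(ordered),
    )


def sessionize(events):
    """Group raw events by session id and build one Session per group."""
    by_session = sorted(events, key=attrgetter("session_id"))
    return [
        build_session(list(group))
        for _, group in groupby(by_session, key=attrgetter("session_id"))
    ]


# Made-up sample chunk, with events arriving out of order.
chunk = [
    Event("a", 10.0, "play"),
    Event("b", 5.0, "play"),
    Event("a", 70.0, "heartbeat"),
    Event("a", 40.0, "pause"),
]
sessions = sessionize(chunk)
```

Because each step is a small function with a clear input and output, one step can be swapped for an alternative version and run over the same source events, which is the property that makes side-by-side pipeline comparison practical.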

Stack

The system is deployed on AWS, and each data pipeline component runs in containers on Kubernetes. Together with monitoring, telemetry, and automation, this means our system requires a minimal amount of attention, even at high scale and during traffic spikes.
All data is stored in an AWS S3-based data lake. Some of that data is indexed into a data warehouse, AWS Redshift, which is the main data storage for analytical queries. Additionally, scheduled aggregates are built and served to dashboards through a lakeshore mart running on PostgreSQL.
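To make the idea of a scheduled aggregate concrete, the sketch below rebuilds a small per-day summary table that a dashboard could query. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the PostgreSQL mart so the example is self-contained, and the sessions and daily_views tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_daily_aggregates(conn):
    """Rebuild the (hypothetical) daily_views table from session-level rows."""
    conn.execute("DROP TABLE IF EXISTS daily_views")
    conn.execute(
        """
        CREATE TABLE daily_views AS
        SELECT day,
               site,
               COUNT(*)           AS sessions,
               SUM(watch_seconds) AS watch_seconds
        FROM sessions
        GROUP BY day, site
        """
    )
    conn.commit()


# SQLite stands in for the PostgreSQL mart; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (day TEXT, site TEXT, watch_seconds INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        ("2020-01-01", "ruutu", 120),
        ("2020-01-01", "ruutu", 300),
        ("2020-01-01", "supla", 60),
        ("2020-01-02", "ruutu", 30),
    ],
)

# In production a scheduler would trigger this; here it is called once.
build_daily_aggregates(conn)
rows = conn.execute(
    "SELECT day, site, sessions, watch_seconds FROM daily_views ORDER BY day, site"
).fetchall()
```

Precomputing the summary on a schedule keeps dashboard queries small and cheap, because they read a few aggregated rows instead of scanning session-level data in the warehouse.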

Summary

What we did
  • Software development
  • Machine learning
  • Data processing
  • Data storage
  • Dashboards
  • Security
  • Infrastructure
  • DevOps
  • Service design
  • UX/UI
Cloud platform
  • Amazon AWS
  • Kubernetes
  • AWS Lambda
Database
  • Redshift
  • Athena
  • PostgreSQL
  • AWS Kinesis
  • AWS DynamoDB
  • Redis
  • RocksDB
Data science
  • Apache Spark
  • TensorFlow
  • Keras
Language & Tooling
  • Python
  • Rust
  • Haskell
  • Scala
  • JavaScript

Ask for more

Teemu Heikkilä

Data Engineer
teemu@emblica.fi

Juhana Laurinharju

Data Engineer
juhana@emblica.fi
Do you have a problem you would like us to create a solution for? Contact us and let's talk more!
