Consultancy
Discover & Define a Big Data Strategy
So you have heard the buzz about big data but are left with unanswered questions:
- Why all the buzz about big data?
- What can big data technologies do for my organization?
- Does my organization need big data?
- How do I implement a big data project?
- How do I identify possible business use cases for big data?
- What appliances exist and how do they compare?
- What should my strategy be all about?
- …
We can help you answer these questions through our discovery-to-strategy workshops, which consist of:
- Explaining Big Data through business use cases.
- Discovering how your organization stores and processes data, what insights the business needs and how big data is used in your industry.
- Defining your Big Data strategy through workshops, along with a plan for implementing it.
Data Mining & Machine Learning
We hear a lot these days about unlocking the value in data. But value doesn’t come from data or computers. Value comes from people - the analysts and decision-makers who extract insight and improve processes.
To extract valuable insights from your data we apply the following techniques:
- Data Warehousing using Apache Hive (SQL-like)
- Dataflow Programming using Apache Pig
- Machine Learning using Apache Mahout
  - Recommendations = user info + community info (behaviors)
  - Clustering = form groups of similar data (common characteristics)
  - Classification = known data + new data (into existing categories)
- Predictive Modelling
- Visualization
- Semantic Analysis
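To make the clustering idea from the list above concrete, here is a minimal sketch of k-means in plain Python. It is not Mahout code; it only illustrates the principle of forming groups of similar data. The customer records, the choice of k=2 and the use of the first k points as starting centroids are assumptions made for the example.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Group points into k clusters with plain k-means (Lloyd's algorithm)."""
    # Assumption: the first k points are used as initial centroids,
    # which keeps the example deterministic.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical customers described by (age, monthly spend).
customers = [(25, 40), (62, 310), (27, 55), (58, 290), (23, 35), (65, 330)]
centroids, clusters = kmeans(customers, k=2)
```

With this sample data the younger, low-spend customers end up in one group and the older, high-spend customers in the other. On real volumes the same idea would run distributed on a cluster rather than in a single process.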
Cluster Management
It takes different skills to set up, configure and maintain Hadoop and Storm clusters (Linux expertise, bash scripting, Java, …).
We have built up a lot of experience with Hadoop and Storm clusters and can help you set up your own cluster more productively. We support the following open source and commercial distributions: Cloudera, Hortonworks Data Platform, Lily, IBM InfoSphere, Oracle Big Data Appliance.
Next to setting up clusters locally, we have experience in setting up clusters in the following cloud environments: Amazon EC2, Amazon EMR, Hetzner, …
We empower system operators to use Hadoop and Storm clusters through on-site training and coaching.
Building Solutions
Big Data solutions allow you to implement business use cases that were previously not implementable with the available tools:
- Download content from the internet, parse it and match it with existing customer data
- Replace non-scalable, overloaded relational database systems with enterprise databases of the future
- End data silos in your organization by letting all information flow into an enterprise content repository
- Process huge amounts of data in real-time
- Combine batch processing on Hadoop with real-time processing (the so-called lambda architecture)
- …
For each of these business cases you need to define an architecture that will stand the test of time. We can assist you in defining your requirements and designing an architecture that meets them.
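As an illustration of the lambda architecture mentioned in the list above, the following toy sketch keeps a batch view and a real-time view of the same events and merges them at query time. The class name, the page-view events and the in-memory counters are all assumptions for the example; in a real deployment the batch layer could run on Hadoop and the speed layer on Storm.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture: batch view + real-time view, merged on query."""

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # precomputed by the batch layer
        self.speed_view = Counter()  # incremental updates from the speed layer

    def ingest(self, event):
        # New events go to both the master dataset and the speed layer.
        self.master.append(event)
        self.speed_view[event] += 1

    def run_batch(self):
        # Batch layer: recompute the view from the full master dataset,
        # then drop the real-time increments that it now covers.
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, key):
        # Serving layer: combine both views into one answer.
        return self.batch_view[key] + self.speed_view[key]


# Hypothetical page-view events.
counter = LambdaCounter()
for page in ["home", "pricing", "home"]:
    counter.ingest(page)
counter.run_batch()        # the batch view now holds home=2, pricing=1
counter.ingest("home")     # arrives after the batch run, lives in the speed view
home_views = counter.query("home")  # 3
```

The point of the design is that queries stay up to date between batch runs, while the batch layer can always rebuild its view from the raw data.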
Application & Systems Monitoring
Your applications generate log files that contain valuable information about availability, performance and capacity. Collecting and centralizing high volumes of log data from multiple sources for search and analysis can be challenging.
We can help you analyze these log files in real time to:
- Monitor for specific events and errors
- Trigger alerts when SLAs are no longer respected
- See what your application is doing during development
- Catch exceptions and track execution flow
- Graph and report on the number of errors generated
- Improve user experience
- Understand user behavior and experience
- Track site traffic and capacity
- Measure application performance
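A few of the points above (monitoring for errors, counting them and alerting on SLA violations) can be sketched in a handful of lines. The log format, the 500 ms threshold and the sample lines below are assumptions made for the example, not a description of any particular application.

```python
import re
from collections import Counter

# Assumed log format: "<timestamp> <LEVEL> <message> [duration_ms=<n>]"
LINE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*?)(?: duration_ms=(?P<ms>\d+))?$"
)

SLA_MS = 500  # hypothetical SLA threshold in milliseconds


def analyze(lines, sla_ms=SLA_MS):
    """Count log levels and collect alerts for errors and SLA violations."""
    levels = Counter()
    alerts = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        levels[match["level"]] += 1
        if match["level"] == "ERROR":
            alerts.append(f"error: {match['msg']}")
        if match["ms"] and int(match["ms"]) > sla_ms:
            alerts.append(f"SLA exceeded: {match['ms']} ms at {match['ts']}")
    return levels, alerts


# Hypothetical log lines.
sample = [
    "2014-01-01T10:00:00 INFO GET /home duration_ms=120",
    "2014-01-01T10:00:01 INFO GET /search duration_ms=840",
    "2014-01-01T10:00:02 ERROR NullPointerException in CartService",
]
levels, alerts = analyze(sample)
```

Here `levels` holds the per-level counts that could feed a graph, and `alerts` holds one SLA alert and one error alert. A real-time setup would apply the same logic to a continuous stream of log lines instead of a list.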
Data Repository
In most organizations data is stored in multiple data silos. For example, a banking client has:
- its core business running on a mainframe system (Unisys, z/OS, …)
- web applications using an Oracle database to store information
- business intelligence running on an Oracle data warehouse
- accountancy applications running on an IBM AS/400
Sounds familiar, right? How can you mine data that is spread out over these silos? The answer is an Enterprise Content Repository, which we define as:
- a scalable content repository
- data from the silos flows into the content repository
- flexible data metamodelling to store any kind of entity relationship diagram
- data is stored and indexed to be made searchable
- data is analyzed on the fly (e.g. machine learning: clustering, classification, recommendation engine, …)
- data and insights are published through an API
We can help you in designing and implementing a content repository that meets your requirements.
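As a rough illustration of two of the properties listed above (a flexible data model and indexed, searchable storage), here is an in-memory sketch. The class, the entity ids, the silo names and the attributes are hypothetical; a real repository would sit on scalable storage and a proper search engine.

```python
from collections import defaultdict


class ContentRepository:
    """In-memory sketch: schemaless entities plus an inverted index for search."""

    def __init__(self):
        self.entities = {}             # entity id -> dict of attributes
        self.index = defaultdict(set)  # lowercase token -> entity ids

    def store(self, entity_id, source, **attributes):
        # Any set of attributes is accepted, so each silo keeps its own shape.
        # Simplification: re-storing an id does not clean up old index entries.
        self.entities[entity_id] = {"source": source, **attributes}
        for value in attributes.values():
            for token in str(value).lower().split():
                self.index[token].add(entity_id)

    def search(self, term):
        # Return the stored entities that have the term in any attribute value.
        return [self.entities[i] for i in sorted(self.index.get(term.lower(), ()))]


# Hypothetical records arriving from two different silos.
repo = ContentRepository()
repo.store("m-1", "mainframe", name="Ann Peeters", account="BE01")
repo.store("w-7", "web", name="Ann Peeters", last_login="2014-01-01")
hits = repo.search("peeters")
```

A single search now returns the customer's records from both silos, even though the two records carry different attributes.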