Discover & Define a Big Data Strategy

So you have heard the buzz about big data but are left with unanswered questions:

  • Why all the buzz about big data?
  • What can big data technologies do for my organization?
  • Does my organization need big data?
  • How do I implement a big data project?
  • How do I identify possible business use cases for big data?
  • What appliances exist, and how do they compare?
  • What should my strategy be all about?

We can help you answer these questions through our discovery-to-strategy workshops, which consist of:

  • Explaining Big Data through business use cases.
  • Discovering how your organization stores and processes data, what insight the business needs, and how Big Data is used in your industry.
  • Defining your Big Data strategy through workshops, and mapping out how to implement that strategy.

Data Mining & Machine Learning

We hear a lot these days about unlocking the value in data. But value doesn’t come from data or computers. Value comes from people - the analysts and decision-makers who extract insight and improve processes.

To extract valuable insights from your data we apply the following techniques:

  • Data Warehousing
    • Using Apache Hive (SQL-like)
  • Dataflow Programming
    • Using Apache Pig
  • Machine Learning Using Apache Mahout
    • Recommendations = user info + community info (behaviors)
    • Clustering = form groups of similar data (common characteristics)
    • Classification = Known data + New data (into existing categories)
  • Predictive Modelling
  • Visualization
  • Semantic Analysis
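
To illustrate the clustering technique above — forming groups of similar data points around common characteristics — here is a minimal, self-contained k-means sketch in Python. Mahout provides distributed implementations for production use; this toy version (all names and data are illustrative) only shows the idea:

```python
import random

def kmeans(points, k, iterations=20, seed=42):
    """Toy k-means: group similar 2-D points around k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                      + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster))
    return centroids, clusters

# Two obvious groups: points near (0, 0) and points near (10, 10).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
```

The same assignment/update loop underlies the distributed version: the assignment step parallelizes naturally over the data, which is why clustering maps well onto Hadoop.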

Cluster Management

It takes a specific skill set to set up, configure and maintain Hadoop and Storm clusters (Linux expertise, bash scripting, Java, …).

We have built up extensive experience with Hadoop and Storm clusters and can help you set up your own cluster more quickly. We support the following open source and commercial distributions: Cloudera, Hortonworks Data Platform, Lily, IBM InfoSphere, Oracle Big Data Appliance.

Besides setting up clusters on premises, we have experience setting up clusters in the following cloud environments: Amazon EC2, Amazon EMR, Hetzner, …

We empower system operators to use Hadoop and Storm clusters through on-site training and coaching.

Building Solutions

Big Data solutions allow you to implement business use cases that were previously not feasible with the available tools:

  • Download content from the internet, parse it and match it with existing customer data
  • Replace overloaded, non-scalable relational database systems with the enterprise databases of the future
  • End data silos in your organization by letting all information flow into an enterprise content repository
  • Process huge amounts of data in real-time
  • Combine batch Hadoop with real-time processing (the so-called lambda architecture)

For each of these business cases you need to define an architecture that will stand the test of time. We can assist you in defining your requirements and designing an architecture that meets them.
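
The lambda architecture mentioned above pairs a batch layer, which periodically recomputes views over the full master dataset, with a speed layer that covers only the events that arrived since the last batch run; a serving layer merges the two on query. A minimal Python sketch of that merge step, with illustrative function and key names (not from any specific framework):

```python
def merge_views(batch_view, realtime_view):
    """Serving layer: combine precomputed batch counts with real-time increments."""
    merged = dict(batch_view)  # copy so the batch view itself is never mutated
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged

# Batch layer: page-view counts recomputed nightly from the master dataset.
batch_view = {"/home": 120_000, "/pricing": 8_000}
# Speed layer: increments from events that arrived after the batch run.
realtime_view = {"/pricing": 42, "/signup": 7}

merged = merge_views(batch_view, realtime_view)
```

The design choice this illustrates: the batch view stays immutable and can always be rebuilt from raw data, while the small real-time view absorbs recent events, so queries see fresh results without sacrificing recomputability.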

Search

Application & Systems Monitoring

Your applications generate log files that contain valuable information about availability, performance and capacity. Collecting and centralizing high volumes of log data from multiple sources for search and analysis can be challenging.

We can help you analyze these log files in real time to:

  • Monitor for specific events and errors
  • Trigger alerts when SLAs are no longer met
  • See what your application is doing during development
  • Catch exceptions and track execution flow
  • Graph and report on the number of errors generated
  • Understand user behavior and improve user experience
  • Track site traffic and capacity
  • Measure application performance

Data Repository

In most organizations data is stored in multiple data silos; a banking client, for example, might have:

  • its core business running on a mainframe system (Unisys, z/OS, …)
  • web applications using an Oracle database to store information
  • business intelligence running on an Oracle data warehouse
  • accountancy applications running on an IBM AS/400

Sounds familiar, right? How can you mine data that is spread across these silos? The answer is an Enterprise Content Repository, which we define as:

  • a scalable content repository
  • data from the silos flows into the content repository
  • flexible data metamodelling to store any kind of entity relationship diagram
  • data is stored and indexed to be made searchable
  • data is analyzed on the fly (e.g. machine learning: clustering, classification, recommendation engine, …)
  • data and insights are published through an API

We can help you in designing and implementing a content repository that meets your requirements.
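
To make the repository definition above concrete — entities flowing in from silos, indexed on ingest, searchable afterwards — here is a toy in-memory Python sketch. The class, record shapes, and silo-style IDs are illustrative; a real repository would sit on a scalable store and search index:

```python
class ContentRepository:
    """Toy sketch of an enterprise content repository: ingest, index, search."""

    def __init__(self):
        self.entities = {}  # entity id -> entity (a dict of fields)
        self.index = {}     # (field, value) -> set of entity ids

    def ingest(self, entity_id, entity):
        """Store an entity from a silo and index every field/value pair."""
        self.entities[entity_id] = entity
        for field, value in entity.items():
            self.index.setdefault((field, value), set()).add(entity_id)

    def search(self, field, value):
        """Return all entities whose field matches the given value."""
        ids = sorted(self.index.get((field, value), set()))
        return [self.entities[i] for i in ids]

repo = ContentRepository()
# Entities flowing in from different silos (illustrative records).
repo.ingest("crm-1", {"type": "customer", "city": "Ghent"})
repo.ingest("web-7", {"type": "customer", "city": "Brussels"})
repo.ingest("bi-3", {"type": "report", "city": "Ghent"})

ghent_entities = repo.search("city", "Ghent")
```

The point of the sketch is the flexible metamodel: because entities are just field/value records indexed uniformly, any kind of entity from any silo becomes searchable without a fixed schema.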