Big Data Training & Coaching - datacrunchers.eu

Training

To empower your developers and data analysts, we provide a set of courses ranging from beginner through intermediate to advanced topics.

The following courses are available:

Using AI and Data Science for Business (1/2 day)
Apache Spark 2 Hands-on Workshop (2 days)
The Hadoop Ecosystem (2 days)

Together we define a training curriculum for your developers and data analysts. On request, we also provide courses on specific topics not covered by our standard courses.

Coaching helps smooth the adoption of Big Data technologies by your developers and data analysts. DataCrunchers can assist your employees either on site or remotely, on your first project or on more advanced tasks later on.

Using AI and Data Science for Business

This seminar brings you up to speed on the state of the art in artificial intelligence, and offers you a guided tour through the fascinating world of automation, (chat)bots, data mining, neural networks, (un)supervised learning, machine learning, deep learning and data science.

AI and its subdomain Data Science are in the news almost every day: from how “sexy” the job of a data scientist is to the “infinite” possibilities of Artificial Intelligence.

Beneath all the hype that surrounds artificial intelligence (AI), automation and data science, real breakthroughs in AI are happening at this moment, and they are transforming the way we do business. AI developers are creating software that doesn’t just do what it is programmed for, but is able to anticipate the needs of customers and users through a combination of pattern recognition, knowledge mining, planning and reasoning.

This seminar will explain how Artificial Intelligence evolved through its “winters” into the narrow AI we all use in our daily lives. Some studies predict that AI will significantly change our jobs and our economy. Even today, companies can already plug into AI from the cloud to start augmenting their employees.

Data Science is a subdomain of Artificial Intelligence that is already widely adopted by large organisations. It helps those organisations to define the next best offer for their customers, predict which customers have a high probability of churning, divide customers into segments, etc. We will give an overview of what it means to be a data scientist, how data science can be used for business and how to start adopting data science within your organisation.

This seminar is ideally organised as an afternoon seminar (13h - 17h30).

Audience

This afternoon seminar is aimed at business leaders and architects who want to understand what AI and data science are all about, what the benefits of using both are, and how to implement them within their organisation.

This seminar answers these and many other questions:

What is happening today?
What are AI, data science, machine learning and deep learning?
How to adopt AI in my organisation?
What is a data scientist, and how do you select and hire one?
How to organise a data team?
How do you start a project and prototype a solution?
How to improve sales revenue using data science?
How to integrate Big Data, data science and AI?
What are big companies like Amazon, Facebook, Google, IBM, Microsoft, … offering?
What is the open source AI offering?
What does the future look like?
What is the impact of AI on jobs and the economy?

Agenda

AI for Business
- AI in the news
- What is AI?
- A brief history of AI
- Current state of AI
- You are already using AI
- What are the unicorns doing with AI (Amazon, Facebook, Google, Tesla, Uber, …)?
- AI use cases per industry
- AI impact
- Using AI from the Cloud
- Do It Yourself AI
Data Science for Business
- What is Data Science?
- Data Science versus AI
- The Data Mining process
- Supervised and unsupervised learning
- Algorithms classified
- The Toolbox of a data scientist
Using Big Data, Data Science and AI for Business
- The Big Data Lifecycle
- Lessons learned
- The Data Team
- Setting up a Data Science and AI competence center
- The AHEAD model when implementing AI
- Case Studies
- Trendwatching

Contact me regarding prices for groups, schedules, etc. at geert@datacrunchers.eu

I give this course throughout Europe.

Apache Spark

Big Data is the hype of the moment in ICT and marketing. Since its inception in 2006, Apache Hadoop has been regarded as the de facto standard for storing and processing big data volumes in batch.

But every technology has its limitations, and this is no different for Hadoop: it is batch-oriented and the MapReduce framework is too limited for handling all types of data analysis within the same technology stack.
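To make the MapReduce model mentioned above concrete, here is a minimal single-process sketch in plain Python. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example. It shows the fixed map, shuffle, reduce shape that every job has to fit into, which is one reason iterative or interactive analyses are awkward to express in it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases; a real cluster would run them distributed."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data is big", "spark is fast"]  # hypothetical input
    print(word_count(sample))
```

In a real Hadoop job each phase runs in parallel across many machines and intermediate results are written to disk between phases; the sketch keeps only the logical structure.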

Apache Spark makes big data easy to implement. It was developed in 2009 at the AMPLab (Algorithms, Machines, and People Lab) of the University of California, Berkeley, and donated to the open source community in 2010. It is faster than Hadoop MapReduce, in some cases up to 100 times faster, and it offers a framework that supports different types of data analysis within the same technology stack: fast interactive queries, streaming analysis, graph analysis and machine learning. During this two-day hands-on workshop, we discuss the theory and practice of several data analysis applications.

Notebook technology (Zeppelin, Jupyter, Spark Notebook, Databricks Cloud, …) allows you to go from prototypes to production workflows in one go. Notebooks enable “repeatable research” by mixing executable code with comments, images, tables, links, …

We have chosen Databricks Cloud as the notebook technology because it is the most mature enterprise-ready notebook technology on the market at this moment. It is implemented on top of AWS, and apparently Azure support is on the roadmap as well.

This course covers Spark 2.x.

Agenda

Day 1: Spark Basics & RDDs

What is Apache Spark?
Notebooks are coming
Just enough Scala
Spark Basics
- Spark 2.x
- A tale of 3 APIs
- Spark Shell
- SparkContext
- Spark Master
- RDD
- Transformations & Actions
- Caching
- Shared Variables: Accumulators & Broadcast variables
- Spark Applications
- Spark Execution Model
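
The distinction between transformations and actions listed above can be illustrated with a small analogy in plain Python. This is not Spark code: the class `LazyDataset` and the helper `square` are hypothetical, written only for this sketch, although the method names mirror the RDD operations `map`, `filter`, `collect` and `count`. The point it shows is that transformations only describe a pipeline, and nothing is computed until an action asks for a result.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Hypothetical stand-in for an RDD: records a pipeline, runs it on demand."""

    def __init__(self, source: Callable[[], Iterable]):
        self._source = source  # function that produces a fresh iterator

    # Transformations: return a new dataset, compute nothing yet.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    # Actions: trigger the computation and return a result.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls: List[int] = []  # records every time the mapping function really runs


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = LazyDataset(lambda: range(1, 6))
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)
evaluated_before_action = len(calls)  # still 0: only the pipeline was defined

result = even_squares.collect()       # action: the pipeline runs now
evaluated_after_action = len(calls)   # one call per input element
```

Calling a second action, such as `even_squares.count()`, runs the whole pipeline again in this sketch. Avoiding that kind of recomputation is what the Caching topic in the agenda is about.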

Day 2: Spark SQL, DataFrames & Datasets

Spark SQL, DataFrames & Datasets
- Introduction: RDDs vs DataFrames vs Datasets
- Basic DataFrame Operations
- Different Types of Data
- Aggregations
- Joins
- Data Sources
- SQL
- Datasets
Q&A

Slide examples are made available in Databricks Cloud notebooks, so students can run the samples while following the course.

All exercises are performed using Databricks Cloud. The solution notebooks are handed out at the end of the course.

Contact me regarding prices for groups, schedules, etc. at geert@datacrunchers.eu

I give this course throughout Europe.

The Hadoop Ecosystem

The rise of the internet, social media and mobile technologies, and in the very near future the Internet of Things, means that our data footprint is growing fast. Companies like Google and Facebook were quickly confronted with massive data sets, which led to a new way of thinking about data. Hadoop provides an open source solution based on designs that Google published for its own systems (the Google File System and MapReduce). It allows you to store and analyze huge amounts of data in a scalable way to create new insights.

With this workshop we want to give everyone the opportunity to get acquainted with the Hadoop Ecosystem.

This course can be booked with exercises (2 days) or without (1 day).

Agenda

Introduction
What is Big Data?
- Volume, Variety, Velocity
- Business Drivers
- Technical Drivers
- Big Data Evolution
The Hadoop Ecosystem
- What is Hadoop?
- Hadoop Services
- Hadoop Distributions
Storage
- HDFS
- HBase
- Kudu
- Data Modelling
Processing
- MapReduce
- Hive
- Pig
- Spark
- YARN
Integration
- Sqoop
- Flume
- Kafka
Indexing
- Elasticsearch
Big Data Architectures
- Architectures
  - Lambda
  - Kappa
  - Zeta
- Trends