This course provides an in-depth overview of essential and advanced data science and analysis techniques that are relevant and unique to Big Data. It emphasizes how analysis and analytics need to be carried out, individually and collectively, to support the distinct characteristics, requirements and challenges of Big Data datasets.
The following primary topics are covered:
– Exploratory Data Analysis, Essential Statistics, including Variable Categories and Relevant Mathematics
– Statistical Analysis, including Descriptive, Inferential, Covariance, Hypothesis Testing, etc.
– Measures of Variation or Dispersion, Interquartile Range & Outliers, Z-Score, etc.
– Probability, Frequency, Statistical Estimators, Confidence Interval, etc.
– Variables and Basic Mathematical Notations, Statistical Measures and Statistical Inference
– Confirmatory Data Analysis (CDA)
– Data Discretization, Binning and Clustering
– Visualization Techniques, including Bar Graph, Line Graph, Histogram, Frequency Polygons, etc.
– Prediction, Linear Regression, Mean Squared Error and Coefficient of Determination (R²), etc.
– Numerical Summaries, Modeling, Model Evaluation, Model Fitting and Model Overfitting
– Statistical Models, Model Evaluation Measures
– Cross-Validation, Bias-Variance, Confusion Matrix and F-Score
– Association Rules and Apriori Algorithm
– Data Reduction, Dimensionality, Feature Selection
– Feature Extraction, Data Discretization (Binning and Clustering)
– Parametric vs. Non-Parametric, Clustering vs. Non-Clustering
– Distance-Based, Supervised vs. Semi-Supervised
– Linear Regression and Logistic Regression for Big Data
– Logistic Regression, Naïve Bayes, Laplace Smoothing, etc.
– Decision Trees for Big Data
– Pattern Identification, Association Rules, Apriori Algorithm
– Time Series Analysis, Trend, Seasonality, K Nearest Neighbor (kNN), K-means
– Text Analytics for Big Data and Outlier Detection for Big Data
– Statistical, Distance-Based, Supervised and Semi-Supervised Techniques
Duration: 1 Day
Taking the Course at a Workshop
This course can be taken as part of instructor-led workshops taught by Arcitura Certified Trainers. These workshops can be open for public registration or delivered privately for a specific organization. Certified Trainers can teach workshops in person at a specific location or virtually using a video-enabled remote system, such as WebEx.
Visit the Workshop Calendar page to view the current calendar of public workshops.
Visit the Private Training page to learn more about Arcitura’s worldwide private workshop delivery options.
Taking the Course via an eLearning Study Kit
This course can be completed via self-study by purchasing an eLearning study kit subscription, which includes online video lessons, as well as online and offline access to the electronic course materials and additional supplements and resources designed for self-paced study and exam preparation.
Visit the eLearning Study Kits page for more information about eLearning study kits.
Visit the Digital Transformation online store for purchasing information.
Taking the Course via a Printed Study Kit
This course can be completed via self-study by purchasing a printed study kit, which includes the full-color course materials as well as additional supplements and resources designed specifically for self-paced study and exam preparation.
Visit the Printed Study Kits page for more information about full-color printed study kits.
Visit the Digital Transformation online store for purchasing information.
Certifications
This course is part of the following certification track(s):
– Digital Transformation Data Scientist