
Data Preparation and Analysis
Dr. Pooja Sharma
This audiobook is narrated by a digital voice.
DESCRIPTION
Data science is an evolving field, and the ability to effectively prepare and analyze data is a critical skill for any aspiring professional. This book serves as a comprehensive introduction to the foundational concepts and tools of data science, making it ideal for beginners and aspiring data professionals.
It provides a structured learning path, beginning with a broad introduction to data science, its applications, and fundamental analysis methods. You will then explore the core Python libraries for data manipulation: NumPy for efficient numerical operations and Pandas for powerful data structuring and transformation. The book dedicates significant focus to real-world data challenges, walking you through the crucial steps of data gathering, preparation, and cleaning, and addressing issues like scalability, missing data, and inconsistencies.
The book concludes with three real-world projects that apply the concepts in practical settings, making you proficient in the end-to-end data preparation and analysis pipeline. By the end, you will have a solid command of essential tools and techniques, empowering you to confidently tackle diverse datasets and derive meaningful insights from them in any professional setting.
WHAT YOU WILL LEARN
● Implement ML models using NumPy, Pandas, Matplotlib, and scikit-learn.
● Gain a solid foundation in data science principles, algorithms, and methodologies.
● Learn to frame real-world problems as ML tasks.
● Implement data cleaning for consistency and missing data.
● Conduct exploratory data analysis with descriptive statistics.
● Uncover data patterns using clustering and association techniques.
● Design and create effective time series visualizations.
● Build interactive visualizations to explore data.
● Apply an end-to-end data workflow in practical projects.
Duration - 8h 59m.
Author - Dr. Pooja Sharma.
Narrator - Digital Voice Madison G.
Published Date - Wednesday, 08 January 2025.
Copyright - © 2026 BPB.
Location:
United States
Networks:
Dr. Pooja Sharma
Digital Voice Madison G
BPB Publications
English Audiobooks
Findaway Audiobooks
Language:
English
Title Page
Duration:00:00:13
Copyright Page
Duration:00:01:21
Dedication Page
Duration:00:00:06
About the Author
Duration:00:02:09
About the Reviewer
Duration:00:01:07
Acknowledgement
Duration:00:01:06
Preface
Duration:00:07:56
Table of Contents
Duration:00:09:51
1. Introduction to Data Science
Duration:00:00:04
Introduction
Duration:00:01:53
Structure
Duration:00:00:33
Objectives
Duration:00:00:42
Data science objectives
Duration:00:02:06
Evolution of data science
Duration:00:04:56
Role of data science in various domains
Duration:00:05:42
Stages of a data science project
Duration:00:05:25
Data security issues
Duration:00:05:38
Data science vs. data analytics vs. machine learning vs. artificial intelligence
Duration:00:03:10
Career in data science
Duration:00:05:40
Steps to install Anaconda and Python
Duration:00:03:08
Conclusion
Duration:00:01:02
Multiple choice questions
Duration:00:06:26
Answers
Duration:00:00:53
Questions
Duration:00:02:01
2. NumPy
Duration:00:00:03
Introduction to NumPy
Duration:00:01:55
Importance of NumPy in data science
Duration:00:00:39
NumPy basics
Duration:00:00:32
Creating NumPy arrays
Duration:00:00:19
From Python lists and tuples
Duration:00:06:27
Creating arrays with specific values
Duration:00:01:58
Creating arrays with a range of values
Duration:00:00:41
Creating random arrays
Duration:00:00:43
Creating empty and uninitialized arrays
Duration:00:00:49
Creating arrays with patterns
Duration:00:07:59
NumPy array attributes
Duration:00:03:25
NumPy array operations
Duration:00:00:31
Arithmetic operations
Duration:00:02:39
Mathematical functions
Duration:00:03:06
Aggregation functions
Duration:00:02:00
Array manipulation functions
Duration:00:02:56
NaN values
Duration:00:01:09
Logical operations
Duration:00:02:03
Sorting and searching
Duration:00:01:49
Linear algebra operations
Duration:00:01:57
Universal functions
Duration:00:00:30
Key features of ufuncs
Duration:00:02:55
Advantages of using ufuncs
Duration:00:00:35
Vectorized operations
Duration:00:01:59
Broadcasting
Duration:00:00:18
Rules of broadcasting
Duration:00:01:47
Indexing, slicing, and iterating NumPy arrays
Duration:00:00:23
Indexing
Duration:00:00:56
Slicing
Duration:00:01:12
Iterating
Duration:00:01:46
Boolean indexing and conditional filtering in NumPy
Duration:00:00:19
Boolean indexing
Duration:00:01:20
Conditional filtering
Duration:00:01:54
Example with 2D arrays
Duration:00:00:53
Fancy indexing in NumPy
Duration:00:00:20
Using integer arrays for indexing
Duration:00:02:19
Advanced indexing techniques
Duration:00:03:19
Reshaping arrays in NumPy
Duration:00:00:12
Changing dimensions
Duration:00:01:56
Adding and removing dimensions
Duration:00:01:38
Combining and splitting NumPy arrays
Duration:00:03:22
Random numbers and simulations in NumPy
Duration:00:00:17
Generating random numbers
Duration:00:04:45
Input and output in NumPy
Duration:00:04:49
Binary data handling
Duration:00:02:30
Performance and optimization in NumPy
Duration:00:01:28
Programming exercises
Duration:00:01:57
3. Pandas
Duration:00:00:03
Key features of Pandas
Duration:00:03:04
Benefits of Pandas
Duration:00:01:33
Pandas basics
Duration:00:02:59
Series
Duration:00:00:17
Key features of a series
Duration:00:06:21
DataFrame
Duration:00:00:36
Creating a DataFrame
Duration:00:01:37
Common operations on DataFrames
Duration:00:02:54
Combining multiple DataFrames
Duration:00:03:30
Reshaping DataFrames
Duration:00:01:07
4. Data Collection and Data Preprocessing
Duration:00:00:05
Types of data
Duration:00:04:34
Structured vs. unstructured data
Duration:00:00:16
Structured data
Duration:00:01:37
Unstructured data
Duration:00:01:40
Key differences between structured and unstructured data
Duration:00:00:58
Data collection
Duration:00:07:30
Datasets
Duration:00:00:24
Based on source and availability
Duration:00:01:16
Based on data type
Duration:00:01:08
Based on domain
Duration:00:02:02
Based on machine learning task
Duration:00:01:10
Real-world examples by domain
Duration:00:00:17
Data formats
Duration:00:03:33
Benefits of data format types
Duration:00:01:18
Data parsing
Duration:00:01:48
Types of data parsers
Duration:00:05:36
Types of data parsing
Duration:00:01:48
Data parser use cases
Duration:00:01:55
Data transformation
Duration:00:00:21
Steps in data transformation
Duration:00:02:19
Example of data transformation
Duration:00:04:16
Real-time issues in data transformation
Duration:00:02:36
5. Data Cleaning
Duration:00:00:03
Data consistency
Duration:00:01:18
Causes of data inconsistency
Duration:00:00:41
Importance of data consistency
Duration:00:00:37
Methods to ensure data consistency
Duration:00:01:46
Data consistency issues
Duration:00:03:10
Heterogeneous data
Duration:00:01:19
Challenges of heterogeneous data
Duration:00:00:37
How to handle heterogeneous data
Duration:00:05:32
Missing data
Duration:00:00:12
Types of missing data
Duration:00:00:47
Causes of missing data
Duration:00:00:28
Handling missing data
Duration:00:05:07
Types of data transformation
Duration:00:02:55
Data segmentation
Duration:00:00:39
Types of data segmentation
Duration:00:05:10
Data transformation vs. data segmentation
Duration:00:00:18
6. Exploratory Data Analysis
Duration:00:00:05
Descriptive statistics
Duration:00:02:00
Measures of central tendency
Duration:00:01:24
Measures of dispersion
Duration:00:01:27
Statistical tools
Duration:00:06:25
Use of descriptive statistics
Duration:00:02:25
Comparative statistics
Duration:00:02:31
Role of t-test
Duration:00:21:53
Use of comparative statistics
Duration:00:02:51
Descriptive statistics vs. comparative statistics
Duration:00:00:19
Clustering
Duration:00:01:29
K-means clustering
Duration:00:00:26
Example
Duration:00:03:05
Hierarchical clustering
Duration:00:05:51
Density-based clustering
Duration:00:05:51
Uses of clustering
Duration:00:03:21
Association
Duration:00:01:26
Apriori algorithm
Duration:00:07:17
Frequent Pattern Growth
Duration:00:00:30
FP Growth algorithm steps
Duration:00:04:51
Uses of association rule mining
Duration:00:03:17
Clustering vs. association
Duration:00:00:15
Hypothesis generation
Duration:00:00:37
Steps in hypothesis generation
Duration:00:02:32
Examples of hypotheses
Duration:00:03:23
7. Data Visualization
Duration:00:00:04
Principles of data visualization
Duration:00:02:55
Types of data visualization
Duration:00:00:22
Basic charts and graphs
Duration:00:01:30
Advanced visualizations
Duration:00:01:26
Interactive dashboards
Duration:00:01:21
Tools for data visualization
Duration:00:00:32
Python libraries for visualizations
Duration:00:01:42
Business intelligence tools
Duration:00:00:58
Geospatial tools
Duration:00:01:11
Web-based visualization frameworks
Duration:00:01:01
Feature selection
Duration:00:04:09
Time series analysis
Duration:00:00:39
Key characteristics of time series data
Duration:00:01:01
Components of time series
Duration:00:00:56
Steps in time series analysis
Duration:00:08:16
Applications of time series analysis
Duration:00:00:39
Geolocated data analysis
Duration:00:00:41
Key concepts in geolocated analysis
Duration:00:00:59
Tools for geolocated analysis
Duration:00:00:46
Steps in performing geolocated analysis
Duration:00:03:00
Applications of geolocated analysis
Duration:00:01:02
Correlations and connections
Duration:00:00:53
Importance of correlation in data analysis
Duration:00:00:35
Visualization of correlations
Duration:00:03:49
Connections
Duration:00:01:21
Networks and hierarchies
Duration:00:00:36
Key features of networks
Duration:00:00:32
Tools for network analysis
Duration:00:01:12
Hierarchies
Duration:00:00:44
Key features of hierarchies
Duration:00:01:43
Interactivity
Duration:00:00:28
Key aspects of interactivity
Duration:00:01:06
Tools for interactive data science
Duration:00:00:34
Benefits of interactivity
Duration:00:00:56
8. Projects
Duration:00:00:03
Movie recommender system
Duration:00:01:59
MovieLens dataset
Duration:00:12:35
Customer support chatbot
Duration:00:09:44
Customer segmentation system
Duration:00:08:37