What You'll Learn

What you will take away from this comprehensive project on advanced Python concepts:

  • Build a fully functioning library, similar to pandas, that you can use for data analysis

  • Complete a large, comprehensive project

  • Apply test-driven development with pytest

  • Learn advanced Python topics such as special methods and property decorators

Course Description

Build a Data Analysis Library from Scratch in Python is aimed at those who want to immerse themselves in a single, long, comprehensive project that covers several advanced Python concepts. By the end of the project, you will have built a fully functioning Python library that can complete many common data analysis tasks. The library is titled Pandas Cub and has functionality similar to the popular pandas library.

This course focuses on developing software within the massive ecosystem of tools available in Python. There are 40 detailed steps that you must complete to finish the project. In each step, you write code that adds functionality to the library. To complete a step, your code must pass the unit tests that have already been written for it. Once you pass all the unit tests, the project is complete. The nearly 100 unit tests give you immediate feedback on whether your code completes each step correctly.
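
As a rough illustration of that workflow, here is a minimal sketch of a test written before (or alongside) the code it checks. The function `column_mean` and both test names are hypothetical and are not taken from the course's test suite; the only assumption about pytest is its default behavior of collecting functions whose names start with `test_` from files named `test_*.py`.

```python
# Hypothetical example: these names are illustrative, not from Pandas Cub.

def column_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# With default settings, pytest collects functions named test_* and
# reports a failure whenever a bare assert evaluates to False.
def test_column_mean_returns_average():
    assert column_mean([1, 2, 3, 4]) == 2.5


def test_column_mean_rejects_empty_input():
    # Written without pytest helpers so the file has no third-party imports.
    try:
        column_mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

Saved as, say, `test_example.py`, a file like this could be run with the `pytest` command; a failing assert points at the step that still needs work.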

You will learn many important concepts while building Pandas Cub:

  • Creating a development environment with conda

  • Using test-driven development to ensure code quality

  • Using the Python data model to allow your objects to work seamlessly with built-in Python functions and operators

  • Building a DataFrame class with the following functionality:

    • Select subsets of data with the brackets operator

    • Aggregation methods such as sum, min, max, mean, and median

    • Non-aggregation methods such as isna, unique, rename, drop

    • Group by one or two columns to create pivot tables

    • Specific methods for handling string columns

    • Read in data from a comma-separated value file

    • A nicely formatted display of the DataFrame in the notebook
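
To give a feel for how special methods and property decorators fit together in a class like this, here is a toy sketch. It is not the Pandas Cub implementation: the class name `MiniFrame` is hypothetical, and it stores columns as plain Python lists to stay dependency-free.

```python
class MiniFrame:
    """Toy column store; a hypothetical stand-in, not the Pandas Cub DataFrame."""

    def __init__(self, data):
        if not isinstance(data, dict):
            raise TypeError("data must be a dict mapping column names to lists")
        lengths = {len(col) for col in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._data = {name: list(col) for name, col in data.items()}

    def __len__(self):
        # Special method: lets the built-in len() report the number of rows.
        return len(next(iter(self._data.values()), []))

    @property
    def columns(self):
        # Property: read with `frame.columns`, no call parentheses needed.
        return list(self._data)

    @columns.setter
    def columns(self, new_names):
        # Setter: validates on assignment, e.g. `frame.columns = ["x", "y"]`.
        if len(new_names) != len(self._data) or len(set(new_names)) != len(new_names):
            raise ValueError("need one unique name per existing column")
        self._data = dict(zip(new_names, self._data.values()))

    @property
    def shape(self):
        return len(self), len(self._data)

    def __getitem__(self, key):
        # Special method behind the brackets operator: a string selects one
        # column, a list of strings selects several.
        if isinstance(key, str):
            return MiniFrame({key: self._data[key]})
        if isinstance(key, list):
            return MiniFrame({name: self._data[name] for name in key})
        raise TypeError("key must be a column name or a list of column names")

    def sum(self):
        # Aggregation: collapses each column to a single-row result.
        return MiniFrame({name: [sum(col)] for name, col in self._data.items()})


frame = MiniFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(len(frame))          # 3
print(frame.shape)         # (3, 2)
print(frame["a"].columns)  # ['a']
```

Because `__len__` and `__getitem__` are part of the Python data model, `len(frame)` and `frame["a"]` work without any extra plumbing, which is the idea the list above refers to.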

In my experience, many people learn just enough of a programming language like Python to complete basic tasks, but do not develop the skills to complete larger projects or build entire libraries. This course is intended for students looking for a challenging and exciting project that will take serious effort and a long time to complete.

This course is taught by expert instructor Ted Petrou, author of Pandas Cookbook, Master Data Analysis with Python, and Exercise Python.

Course Curriculum

  • 1
    Project Genesis
  • 2
    Environment Setup
    • 04 Opening the Project in VS Code
    • 05 Setting up the Development Environment
    • 06 Test-Driven Development
    • A note before the next video
    • 07 Installing an IPython Kernel for Jupyter
  • 3
    Getting Ready to Code
    • 08 Inspecting the __init__ File
    • 09 Importing Pandas Cub
    • 10 Manually Test in a Jupyter Notebook
    • 11 Getting Ready to Start
  • 4
    DataFrame Construction
    • 12 Check DataFrame Input Types
    • 13 Check Array Lengths
    • 14 Convert Unicode Arrays to Object
  • 5
    Basic Properties and Visual Representation
    • 15 Implementing the __len__ Special Method
    • 16 Return Columns as a List
    • 17 Set New Column Names
    • 18 The Shape Property
    • 19 Visual Notebook Representation
    • 20 The values Property
    • 21 The dtypes Property
  • 6
    Subset Selection
    • 22 Select a Single Column
    • 23 Select Multiple Columns
    • 24 Boolean Selection
    • 25 Check for Simultaneous Selection
    • 26 Select a Single Cell
    • 27 Select Rows as Booleans, Lists, or Slices
    • 28 Multiple Column Simultaneous Selection
    • 29 Column Slices
    • 30 Tab Completion for Columns
    • 31 Create a New Column
  • 7
    Basic Methods
    • 32 The head and tail Methods
    • 33 Generic Aggregation Methods
    • 34 The isna Method
    • 35 The count Method
    • 36 The unique Method
    • 37 The nunique Method
  • 8
    Value Counts
    • 38 The value_counts Method
    • 39 Normalize value_counts
  • 9
    Other Methods and Operators
    • 40 The rename Method
    • 41 The drop Method
    • 42 Non-Aggregation Methods
    • 43 The diff Method
    • 44 The pct_change Method
    • 45 Arithmetic and Comparison Operators
    • 46 The sort_values Method
    • 47 The sample Method
  • 10
    Pivot Tables
    • 48 Pivot Tables Part 1
    • 49 Pivot Tables Part 2
    • 50 Pivot Tables Part 3
    • 51 Pivot Tables Part 4
    • 52 Pivot Tables Part 5
  • 11
    Documentation, Strings, and Reading CSVs
    • 53 Automatically Add Documentation
    • 54 String-only Methods
    • 55 The read_csv Function Part 1
    • 56 The read_csv Function Part 2
  • 12
    • 57 Conclusion
