Pandas Functions Cheat Sheet



Translating Common Excel Functions to Python Pandas Methods.

This is my cheat sheet for commonly used Excel functions roughly translated to their Python Pandas equivalents.

Over the last 3 years I have increasingly found myself booting up a Jupyter Notebook to perform data analysis using Python Pandas rather than relaying on Excel. A few factors play into that:

  • speed when performing data analysis tasks: what one loses in GUI ability one can often make up for in speed of executing extractions, transformations, filters, calculations, etc.
  • repeatability: one can easily repeat past extraction, transformation, and analysis tasks
  • ability to handle large data sets: the reality of big data can sometimes mean Excel is not able to handle efficiently without slowing one’s system a halt
  1. Pandas Series and DataFrames are designed for fast data analysis and manipulation, as well as being flexible and easy to use. Below are just a few features that makes Pandas an excellent package for data analysis: Allows the use of labels for rows and columns.
  2. Pandas Cheat Sheet Pandas can be used as the most important Python package for Data Science. It helps to provide a lot of functions that deal with the data in easier way. It's fast, flexible, and expressive data structures are designed to make real-world data analysis.

This cheat sheet will be updated as I continue to advance my Pandas tool belt.

It will assume some baseline familiarity with Pandas.

This is not a comprehensive list but contains the functions I use most, an example, and my insights as to when it’s most useful. Contents: 1) S e tup 2) Importing 3) Exporting 4) Viewing and Inspecting 5) Selecting 6) Adding / Dropping 7) Combining 8) Filtering 9) Sorting 10) Aggregating 11) Cleaning 12) Other 13) Conclusion. For a quick overview of pandas functionality, see 10 Minutes to pandas. You can also reference the pandas cheat sheet for a succinct guide for manipulating data with pandas. The community produces a wide variety of tutorials available online. Some of the material is enlisted in the community contributed Community tutorials. Pandas, Numpy, Python Cheatsheet Python notebook using data from Kernel Files 21,941 views 1y ago.

The breadcrumbs here will allow the reader to Google the method required to replace an Excel functionality more efficiently.

Sorting Rows by Column Values [Excel: Control + Shift + L]

df = df.sort_values(['Col1_Name', 'Col2_Name', 'Col3_Name'], ascending=[False, False, True])

This function will sort the rows of a DataFrame on the Col1_Name, then Col2_Name, and finally on Col3_Name. The option ascending argument as False will tell Pandas to sort the column values respective to their list index positions largest (top rows) to smallest (bottom rows). For example, Col1_name will be sorted to have the values in Ascending=False (descending) order, Col2_name will be sorted to have the values in Ascending=False (descending) order, and Col3_name will be sorted to have the values in Ascending=True (ascending) order.

Delete Column [Excel: Control + -]

df.drop(columns=['column_to_delete_name','second_col_to_del_name], inplace=True)

VLOOKUP

df.merge(df_2, on='key_column_name', how='left)

–> where key_column_name is the column that contains keys or lookup values and df_2 is table array equivalent

–> Typically, my df2 is a three column df consisting of index, key_column, and a column with values to return as if it were a vlookup. You can also merge all and drop unneeded columns or filter with making the merged call

new_df = df.merge(df_2, on='key_column_name', how='left)['key_column_name, 'column_X', 'column_y']

SUMIFS

COUNTIFS

Count

df['col_name'].value_counts()

or

Pandas Functions Cheat Sheet

len(df)

Pivot Tables

Sheet

CONCAT, LEFT, and RIGHT (for string values)

Rename a Column

IF statements and conditional formulas

Pandas Functions Cheat Sheet Printable

Applying logic to a column (i.e. if column 0 or 1, return True/False), first create the function –

Now apply to the specified column–

Applying logic to a row (i.e. add three columns together, return result), first create the function–

Now apply to the df for all rows – # if we want to save the returned values, df[‘new_var] will store those returned values as a new column in our df # axis is “columns” because we are moving across the columns left to right, this is kinda strange at first

Append

Graphs

Vlookup with two variables

Check if date is the last day of a month

Pandas Data Science Cheat Sheet

Thank you Stack Overflow

Check if last business day of month

Python Pandas Cheat Sheet Pdf

Python

Pandas Dataframe Cheat Sheet

Thank you Stack Overflow





Comments are closed.