CSV Processing with Python and Pandas - Some “Operations”

A quick dump of Python & Pandas “operations” that come in handy for CSV processing




“Statement” Operations

These are used in stringing 0 or more “expressions” together into a “statement” (code that does something).

I’m listing these first because there aren’t a lot of them that you’ll use, compared to “expression operations” (which are more like Excel functions): just “print,” “=,” and “import.”

Key:


print(°°°)

(official documentation link)

This shows the value of the input you typed on your screen, presuming you have your Python program running in a context that gives you a text-based “output console.”

Data type of input “thing:”
Some sort of expression whose output value it is meaningful to “print to screen”
(Don’t be afraid to try with all kinds of expressions, even if they don’t seem inherently “displayable” – some “data types” get pretty creative figuring out how to render themselves from inside a “print()” statement!)


°°°1°°° = °°°2°°°

This stores the value of the “expression” you typed as “input thing #2” as a “variable” going by the nickname you typed as “input thing #1.”
From then on in your code, instead of re-typing all the code that computed the value of °°°2°°°, you can just type the text you put as °°°1°°°.
This “nickname” becomes an expression in and of itself, a more concise way of re-typing °°°2°°°.
(That is, unless you do this again into the same nickname with a different expression. Then, of course, after that second use of “=” to the same nickname in your code, the other expression’s value will be used when you refer to the nickname!)

This “=” is known as an “assignment operator” in programming jargon and is really special. Because we already used “=” here, note that when we want to check whether it’s true or false that two values “equal” each other, we’ll have to use other snippets of code, because “=” has already been claimed by the grammar of writing Python code for this special use!

Data type of “input thing #1:”
A word you want to reuse, later in your code, to reference the output value of the expression to the right of the “=”

Data type of input “thing #2:”
Some sort of expression
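
Here is a small sketch of the “nickname” idea. The names “price” and “total” are made up purely for illustration.

```python
# Store the value of the expression "3 + 4" under the nickname "price".
price = 3 + 4

# The nickname now works as an expression of its own.
total = price * 2
print(total)  # prints 14

# Using "=" on the same nickname again replaces the stored value.
price = 10
print(price * 2)  # prints 20
```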


import °°°

(official documentation link)

This says that you want to use a Python “package,” like Pandas, when writing your code, and that you want to have access to “data types” and “operations” that normal-Python doesn’t know about.

Data type of input “thing:”
The name of a Python “package”
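
A minimal sketch of what this looks like for Pandas, assuming the package is already installed on your computer. The “as pd” part is optional: it is just a widely used shorthand nickname for the package.

```python
# Make the Pandas package available under its full name.
import pandas

# Make it available again under the shorter nickname "pd".
import pandas as pd

# Both names refer to the very same package.
same_package = pandas is pd
print(same_package)  # prints True
```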



“Expression” Operations

These are used in stringing 0 or more “expressions” together into a new expression with a single output value.

Remember: because expression operations combine things into one new single expression, you can “nest” expressions inside each other, just like in Excel.
Only don’t forget that in Python, you can “checkpoint” your nesting work when things start to get too wordy by using the “assignment operator” “=” and saving expressions into variable names, resuming your mega-expression-building on another line with that variable name.

Just as most of the richness of Excel is in its Formula Editor, most of the “operations” you’ll use to process CSV files by writing Python & Pandas code are operations used to form “expressions.”
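
To illustrate “nesting” versus “checkpointing,” here is a rough sketch. The variable name “greeting” is just an example.

```python
# One nested expression: the "+" expression sits inside len().
nested_result = len('Hi' + 'ya')
print(nested_result)  # prints 4

# The same work, "checkpointed" into a variable along the way.
greeting = 'Hi' + 'ya'
checkpointed_result = len(greeting)
print(checkpointed_result)  # prints 4
```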

Key:


type(°°°)

(official documentation link)

Number of input values:
1

Data type of output value:
Python “type”

Note about the output value:
A note telling you the “data type” of the expression within its parentheses

Example code:
type(5)

Output of example code (if it were inside a print() statement):
<class 'int'>


len(°°°)

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
plaintext, list, Pandas “DataFrame”, Pandas “Series”, etc. (But not numbers, etc.)

Data type of output value:
integer

Note about the output value:
The “length” of whatever is in the expression within its parentheses, in a way appropriate to the “data type” of that expression

Example code:
len('Hiya')

Output of example code (if it were inside a print() statement):
4


°°°.abs()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series”

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series,” only with negative numbers in the “Series” turned into positive ones. Will abort your program with an error if values in your Pandas “Series” are, say, text, instead of numbers.
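
A quick sketch, using a made-up Series nicknamed “ser”:

```python
import pandas as pd

ser = pd.Series([-3, 0, 5])

# Negative numbers become positive; everything else stays the same.
absolute = ser.abs()
print(absolute.tolist())  # prints [3, 0, 5]
```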


°°°1°°° + °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: number. #2: number.

Data type of output value:
number

Note about the output value:
The value of expression #2 added to the value of expression #1. (Normal math!)

Example code:
3+4

Output of example code (if it were inside a print() statement):
7


°°°1°°° + °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: number. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in expression #1, only with the value of expression #2 added to every number in the “Series.” Will abort your program with an error if values in your Pandas “Series” are, say, text, instead of numbers.


°°°1°°° + °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: Pandas “Series” of the SAME LENGTH as expression #1.

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in expression #1, only with the “corresponding value” from the appropriate equivalent position within expression #2 added to every number in the first “Series.” Will abort your program with an error if values in either Pandas “Series” are, say, text, instead of numbers.


°°°1°°°.add(°°°2°°°)

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: number.

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in expression #1, only with the value of expression #2 added to every number in the “Series.” Will abort your program with an error if values in your Pandas “Series” are, say, text, instead of numbers.


°°°1°°° + °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series” full of numbers. #2: Pandas “Series” full of numbers; SAME LENGTH as the Series in expression #1.

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in expression #1, only with the “corresponding value” from the appropriate equivalent position within expression #2 added to every number in the first “Series.” Will abort your program with an error if values in either Pandas “Series” are, say, text, instead of numbers.


°°°1°°°.add(°°°2°°°)

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series” full of numbers. #2: Pandas “Series” full of numbers; SAME LENGTH as the Series in expression #1.

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in expression #1, only with the “corresponding value” from the appropriate equivalent position within expression #2 added to every number in the first “Series.” Will abort your program with an error if values in either Pandas “Series” are, say, text, instead of numbers.
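
The different flavors of addition above can be sketched together like this. The Series names “prices” and “fees” are invented for the example.

```python
import pandas as pd

prices = pd.Series([1, 2, 3])
fees = pd.Series([10, 20, 30])

# Series + number: the number is added to every value.
plus_number = prices + 5
print(plus_number.tolist())  # prints [6, 7, 8]

# Series + Series: values are added position by position.
plus_series = prices + fees
print(plus_series.tolist())  # prints [11, 22, 33]

# .add() gives the same answers as "+".
via_method = prices.add(fees)
print(via_method.tolist())  # prints [11, 22, 33]
```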


°°°1°°° - °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
(see above in various “+” operations – same idea)

Data type of output value:
(see above in various “+” operations)

Note about the output value:
(Same idea as for “+,” only subtraction.)


°°°1°°°.subtract(°°°2°°°)

Number of input values:
2

Allowable input value “data types:”
(see above in various “.add()” operations – same idea)

Data type of output value:
(see above in various “.add()” operations)

Note about the output value:
(Same idea as for “.add(),” only subtraction.)


°°°1°°° * °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
(see above in various “+” operations – same idea)

Data type of output value:
(see above in various “+” operations)

Note about the output value:
(Same idea as for “+,” only multiplication.)


°°°1°°°.multiply(°°°2°°°)

Number of input values:
2

Allowable input value “data types:”
(see above in various “.add()” operations – same idea)

Data type of output value:
(see above in various “.add()” operations)

Note about the output value:
(Same idea as for “.add(),” only multiplication.)


°°°1°°° / °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
(see above in various “+” operations – same idea)

Data type of output value:
(see above in various “+” operations)

Note about the output value:
(Same idea as for “+,” only division.)


°°°1°°°.divide(°°°2°°°)

Number of input values:
2

Allowable input value “data types:”
(see above in various “.add()” operations – same idea)

Data type of output value:
(see above in various “.add()” operations)

Note about the output value:
(Same idea as for “.add(),” only division.)


°°°1°°° % °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
(see above in various “+” operations – same idea)

Data type of output value:
(see above in various “+” operations)

Note about the output value:
(Same idea as for “+,” only modulo division.)


°°°1°°°.mod(°°°2°°°)

Number of input values:
2

Allowable input value “data types:”
(see above in various “.add()” operations – same idea)

Data type of output value:
(see above in various “.add()” operations)

Note about the output value:
(Same idea as for “.add(),” only modulo division.)
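
Since the subtraction, multiplication, division, and modulo entries all lean on “same idea as addition,” here is a small sketch of each against one made-up Series nicknamed “ser.”

```python
import pandas as pd

ser = pd.Series([10, 20, 35])

minus = ser - 1
times = ser * 2
divided = ser / 5
remainder = ser % 3

print(minus.tolist())      # prints [9, 19, 34]
print(times.tolist())      # prints [20, 40, 70]
print(divided.tolist())    # prints [2.0, 4.0, 7.0]
print(remainder.tolist())  # prints [1, 2, 2]

# The "dot" versions give the same answers as the symbols.
print(ser.subtract(1).tolist())  # prints [9, 19, 34]
print(ser.mod(3).tolist())       # prints [1, 2, 2]
```

Note that “/” hands back decimal numbers even when the division comes out even.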


°°°1°°° + °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
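#1: plaintext. #2: plaintext. (A number on either side must first be turned into plaintext with “str(°°°),” as in the example code below; otherwise your program will abort with an error.)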

Data type of output value:
plaintext

Note about the output value:
The value of expression #2 concatenated to the end of the value of expression #1. (Normal text concatenation!)

Example code:
'a'+str(4)

Output of example code (if it were inside a print() statement):
a4


°°°1°°° + °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series” full of data that can be concatenated-to. #2: plaintext. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in the “Series”-typed expression, only with the value of the “plaintext”-typed expression concatenated to it (at the end of the plaintext if the plaintext is after the “+”, at the beginning if the plaintext is before the “+”). Will abort your program with an error if values in your Pandas “Series” aren’t things that plaintext can be concatenated to.


°°°1°°° + °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series” full of data that can be concatenated-to. #2: Pandas “Series” full of plaintext; SAME LENGTH as the Series in expression #1. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in expression #1, only with the “corresponding value” from the appropriate equivalent position within expression #2 concatenated to the end of every value in the first “Series.” Will not act as concatenation if values in both “Series” can have “+” interpreted in some other way (such as if both are numbers) – at least one Series must have plaintext in it for the “+” to work as concatenation. Will abort your program with an error if values in your Pandas “Series” aren’t things that plaintext can be concatenated to.
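
A sketch of “+” as concatenation on Pandas “Series.” The Series names “first” and “last” are made up for the example.

```python
import pandas as pd

first = pd.Series(['Ann', 'Bob'])
last = pd.Series(['Lee', 'Ray'])

# Plaintext before the "+" lands at the beginning of every value.
with_title = 'Dr. ' + first
print(with_title.tolist())  # prints ['Dr. Ann', 'Dr. Bob']

# Plaintext after the "+" lands at the end of every value.
excited = first + '!'
print(excited.tolist())  # prints ['Ann!', 'Bob!']

# Series + Series concatenates the corresponding values.
full_name = first + ' ' + last
print(full_name.tolist())  # prints ['Ann Lee', 'Bob Ray']
```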


°°°1°°°.join(°°°2°°°)

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext. #2: list or Pandas “Series” full of data that can be concatenated with plaintext.

Data type of output value:
plaintext

Note about the output value:
A single piece of plaintext, composed of all the values from the native Python “list” or the Pandas “Series” from expression #2, concatenated together, using the plaintext from expression #1 as a separator. (Just use an “empty string,” which is two single-quotes next to each other with nothing in between them, as the value of input expression #1 if you don’t want a visible separator between the pieces.)


°°°1°°°.str.cat(sep=°°°2°°°)

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series” full of data that can be concatenated with plaintext. #2: plaintext.

Data type of output value:
plaintext

Note about the output value:
Does not work with native Python lists as input – only Pandas “Series.” For those, completely equivalent, functionally, to °°°2°°°.join(°°°1°°°), although allegedly runs slower, so apparently no real reason to use it. Note that in this one, the concatenation separator is towards the end of the expression, whereas in °°°2°°°.join(°°°1°°°), it’s towards the beginning.
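
Here is a sketch comparing the two approaches, with a made-up list nicknamed “words” and a Series built from it:

```python
import pandas as pd

words = ['a', 'b', 'c']
ser = pd.Series(words)

# .join() works on a native Python list...
from_list = ', '.join(words)
print(from_list)  # prints a, b, c

# ...and on a Pandas Series. An empty string means no visible separator.
from_series = ''.join(ser)
print(from_series)  # prints abc

# .str.cat() gives the same kind of answer, with the separator at the end.
from_cat = ser.str.cat(sep=', ')
print(from_cat)  # prints a, b, c
```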


str(°°°)

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
number, etc.

Data type of output value:
plaintext

Note about the output value:
A plain-text representation of the expression within its parentheses. Useful when you want to concatenate plaintext to a number.


°°°1°°°.zfill(°°°2°°°)

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext. #2: number (specifically, integer).

Data type of output value:
plaintext

Note about the output value:
Expression #1, typically a “number-ey” piece of plaintext, padded with zeroes on the left until it is at least as many characters long as the number given in expression #2. A “number-ey” piece of plaintext could come from, for example, doing a “str(°°°)” operation. Combining these two operations is great for turning a number into a U.S. 5-digit zip code.

Example code:
str(234).zfill(5)

Output of example code (if it were inside a print() statement):
00234


°°°.round()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in the input expression, only with each of its values rounded off. Will abort your program with an error if values in your Pandas “Series” are, say, text, instead of numbers.


°°°.isnull()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in the input expression, only with each value in the “Series” replaced by True/False explaining whether that value was null.


°°°.notnull()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in the input expression, only with each value in the “Series” replaced by True/False explaining whether that value was NOT null.
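
A sketch of both null checks, using a made-up Series with one missing value in the middle:

```python
import pandas as pd

ser = pd.Series(['x', None, 'z'])

is_missing = ser.isnull()
is_present = ser.notnull()

print(is_missing.tolist())  # prints [False, True, False]
print(is_present.tolist())  # prints [True, False, True]
```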


°°°.tolist()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series” or Numpy “NDArray”

Data type of output value:
list

Note about the output value:
The same values that were inside the Pandas “Series” or “Numpy NDArray” from the input expression, only now captured inside a native Python “list” data type. Useful for certain fancy operations that take a native Python “list” as an input value – for example, various types of “is in?” operations.


°°°.duplicated()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
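The same-sized Pandas “Series” as in the input expression, only with each value in the “Series” replaced by True/False – “False” if the value is unique to the Series or is the first appearance of a repeated value; “True” if the same value already appeared earlier in the Series. (To flag every appearance of a repeated value as “True,” including the first one, put “keep=False” inside the parentheses.)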
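
A sketch of how “.duplicated()” treats repeated values, using a made-up Series. With nothing inside the parentheses, the first appearance of a repeated value comes back False and only the later repeats come back True; “keep=False” is an optional setting that flags all of them.

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'a', 'c', 'a'])

# Default: the first 'a' is not flagged, the later ones are.
later_repeats = ser.duplicated()
print(later_repeats.tolist())  # prints [False, False, True, False, True]

# keep=False: every 'a' is flagged.
all_repeats = ser.duplicated(keep=False)
print(all_repeats.tolist())  # prints [True, False, True, False, True]
```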


°°°.unique()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series”

Data type of output value:
Numpy “NDArray”

Note about the output value:
The same values that were inside the Pandas “Series” from the input expression, DEDUPLICATED to just the unique values, only now captured inside a value that is of the “NDArray” data type (from the Python module “Numpy”). For many practical purposes (such as printing to your output console), a Numpy “NDArray” will work a lot like a native Python “list” for you, so don’t worry about it – think of it as basically a list.
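
A sketch of deduplicating a made-up Series of numbers. Appending “.tolist()” is one way to turn the result into a native Python list if you need one.

```python
import pandas as pd

ser = pd.Series([3, 1, 3, 2, 1])

# The unique values come back in the order they first appeared.
uniques = ser.unique()
print(type(uniques).__name__)  # prints ndarray

unique_list = uniques.tolist()
print(unique_list)  # prints [3, 1, 2]
```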


°°°.sort_values(ascending=True)

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in the input expression, only with the values rearranged ascending, however that makes sense for the “data type” of the values within the “Series.” IMPORTANT NOTE: unless you run a “re-indexing” command after sorting, the values will sort-of kind-of remember their “original position number” within the Series, which can mean surprising behavior in operations that deal with “corresponding values” between two expressions that are both Pandas “Series.” For example, with a series full of numbers nicknamed “ser,” “ser.sort_values() + ser” will produce the same output as “ser + ser” (a new Series with each number doubled), and that output series will still print to screen or export to CSV in the order of the original position numbers. However, the expression “ser.equals(ser.sort_values(ascending=False))” will generally be False even though “ser.equals(ser)” is True. ANOTHER NOTE: Only works in newer versions of the “Pandas” extension to Python.
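
The “original position number” behavior is easier to see in a sketch. The Series here is made up, and “.reset_index(drop=True)” is used as one possible “re-indexing” command.

```python
import pandas as pd

ser = pd.Series([30, 10, 20])
sorted_ser = ser.sort_values()

print(sorted_ser.tolist())        # prints [10, 20, 30]
print(sorted_ser.index.tolist())  # prints [1, 2, 0]

# "+" matches values up by position number, not by the new order,
# so every number simply gets doubled.
doubled = sorted_ser + ser
print(doubled.sort_index().tolist())  # prints [60, 20, 40]

# After re-indexing, the sorted values get fresh position numbers 0, 1, 2.
fresh = sorted_ser.reset_index(drop=True)
print((fresh + ser).tolist())  # prints [40, 30, 50]

# The sorted Series no longer counts as "equal" to the original.
still_equal = ser.equals(sorted_ser)
print(still_equal)  # prints False
```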


°°°.sort(inplace=False, ascending=True)

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
Equivalent to the Pandas “Series” operation “.sort_values(ascending=True)”, but only works in older versions of the “Pandas” extension to Python.


°°°.max()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series”

Data type of output value:
Numpy plaintext-equivalent, Numpy number-equivalent, etc.

Note about the output value:
The “maximum value” found within the Pandas “Series” from the input expression. Will be “0-dimensional” data (just a single number, piece of plaintext, etc). Note that while technically, the output “data type” is not of the native Python number, plaintext, etc. types, in most pieces of code, you can use such data as if it were. If you run into problems, try converting one of these Numpy number-equivalent or Numpy plaintext-equivalent “output values” into a native Python equivalent by using its “.item()” operation. But only if you’re getting unexpected results without doing so. Your program will error out if it doesn’t make sense to apply some form of the concept of “maximum value” to the values in the input “Series.”


°°°.min()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series”

Data type of output value:
Numpy plaintext-equivalent, Numpy number-equivalent, etc.

Note about the output value:
The “minimum value” found within the Pandas “Series” from the input expression. Will be “0-dimensional” data (just a single number, piece of plaintext, etc). Your program will error out if it doesn’t make sense to apply some form of the concept of “minimum value” to the values in the input “Series.”


°°°.mean()

Number of input values:
1

Allowable input value “data types:”
Pandas “Series”

Data type of output value:
Numpy number-equivalent

Note about the output value:
The “mean value” found within the Pandas “Series” from the input expression. Will be “0-dimensional” data (just a single number, piece of plaintext, etc). Your program will error out if it doesn’t make sense to apply some form of the concept of “mean value” to the values in the input “Series.”


°°°.median()

Number of input values:
1

Allowable input value “data types:”
Pandas “Series”

Data type of output value:
Numpy number-equivalent

Note about the output value:
The “median value” found within the Pandas “Series” from the input expression. Will be “0-dimensional” data (just a single number, piece of plaintext, etc). Your program will error out if it doesn’t make sense to apply some form of the concept of “median value” to the values in the input “Series.”


°°°.sum()

Number of input values:
1

Allowable input value “data types:”
Pandas “Series”

Data type of output value:
Numpy number-equivalent

Note about the output value:
The sum of all values found within the Pandas “Series” from the input expression. Will be “0-dimensional” data (just a single number, piece of plaintext, etc). Your program will error out if it doesn’t make sense to apply some form of the concept of “sum” to the values in the input “Series.”


°°°.item()

Number of input values:
1

Allowable input value “data types:”
Numpy plaintext-equivalent, Numpy number-equivalent, etc.

Data type of output value:
plaintext, number, etc.

Note about the output value:
The same value as in the input parameter, only transformed under the covers into a “native Python data type.” You shouldn’t need this (“equivalents” for 0-dimensional / single-point data in the “Numpy” extension to Python are pretty good about working wherever in the code you could put the type of data that they’re equivalent to), but now you have it in case you do.


°°°1°°° == °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext, number, Boolean (True/False), etc. #2: plaintext, number, Boolean (True/False), etc.

Data type of output value:
Boolean (True/False)

Note about the output value:
Whether input expression #2 is the same value as input expression #1. You may get unexpected answers for True/False, or your program may halt with an error, if the value-comparisons don’t intuitively make sense. One particular “gotcha” can be plaintext-to-number comparisons (is a text representation of 500 really the same as the number 500?). Also watch case-sensitivity – for case-insensitive “equals” checks between pieces of plaintext, you’ll want to uppercase/lowercase both sides of the “equals” first! (This is true in all “comparison” operations between pieces of text mentioned here.)


°°°1°°° == °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext, number, Boolean (True/False), etc. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in input expression #1, only with each value in the “Series” replaced by True/False explaining whether that value was the same as the value from input expression #2. (Or equivalent, reading explanation w/ #1 & #2 swapped if swapped order in code.) You may get unexpected answers for True/False, or your program may halt with an error, if the value-comparisons don’t intuitively make sense. One particular “gotcha” can be plaintext-to-number comparisons (is a text representation of 500 really the same as the number 500?).


°°°1°°°.equals(°°°2°°°)

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: Pandas “Series;” SAME LENGTH as the Series in expression #1.

Data type of output value:
Boolean (True/False)

Note about the output value:
Whether each and every value in Pandas “Series” #2 is the same as the value it “corresponds to” by position number in Pandas “Series” #1. (Good for, say, a quick dummy-check of whether anyone made any typos between 2 columns that should be the same.) You may get unexpected answers for True/False, or your program may halt with an error, if the value-comparisons don’t intuitively make sense. One particular “gotcha” can be plaintext-to-number comparisons (is a text representation of 500 really the same as the number 500?).
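
A sketch contrasting “==” (one True/False per value) with “.equals()” (one True/False overall). The two Series stand in for two columns that should match but where someone made a typo.

```python
import pandas as pd

typed = pd.Series(['cat', 'dog', 'emu'])
retyped = pd.Series(['cat', 'dgo', 'emu'])

# "==" against a single value: one True/False per value in the Series.
is_dog = typed == 'dog'
print(is_dog.tolist())  # prints [False, True, False]

# .equals(): a single True/False for the whole comparison.
columns_match = typed.equals(retyped)
print(columns_match)  # prints False

# The plaintext-versus-number "gotcha".
text_vs_number = '500' == 500
print(text_vs_number)  # prints False
```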


°°°1°°° != °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext, number, Boolean (True/False), etc. #2: plaintext, number, Boolean (True/False), etc.

Data type of output value:
Boolean (True/False)

Note about the output value:
Whether input expression #2 is a different value from input expression #1. You may get unexpected answers for True/False, or your program may halt with an error, if the value-comparisons don’t intuitively make sense. One particular “gotcha” can be plaintext-to-number comparisons (is a text representation of 500 really the same as the number 500?).


°°°1°°° != °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext, number, Boolean (True/False), etc. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in input expression #1, only with each value in the “Series” replaced by True/False explaining whether that value was a different value from input expression #2. (Or equivalent, reading explanation w/ #1 & #2 swapped if swapped order in code.) You may get unexpected answers for True/False, or your program may halt with an error, if the value-comparisons don’t intuitively make sense. One particular “gotcha” can be plaintext-to-number comparisons (is a text representation of 500 really the same as the number 500?).


°°°1°°° < °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext, number, Boolean (True/False), etc. #2: plaintext, number, Boolean (True/False), etc.

Data type of output value:
Boolean (True/False)

Note about the output value:
Whether input expression #1 is “less than” input expression #2. You may get unexpected answers for True/False, or your program may halt with an error, if the value-comparisons don’t intuitively make sense. For example, “True” is “greater than” “False” and “equal to” 1, and you’ll want to be careful when doing things like “greater than” on text.


°°°1°°° < °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext, number, Boolean (True/False), etc. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in input expression #1, only with each value in the “Series” replaced by True/False explaining whether that value was “less than” the value from input expression #2. (Or equivalent, reading explanation w/ #1 & #2 swapped if swapped order in code.) You may get unexpected answers for True/False, or your program may halt with an error, if the value-comparisons don’t intuitively make sense. For example, “True” is “greater than” “False” and “equal to” 1, and you’ll want to be careful when doing things like “greater than” on text.
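
A sketch of “less than” and its quirks, first on plain values and then on a made-up Series:

```python
import pandas as pd

# Plain values: #1 < #2 asks whether the left side is the smaller one.
three_is_less = 3 < 4
print(three_is_less)  # prints True

# The True/False quirks mentioned above.
print(True > False)  # prints True
print(True == 1)     # prints True

# Text comparisons: uppercase letters sort before lowercase ones.
text_compare = 'Zebra' < 'apple'
print(text_compare)  # prints True

# Series < number: one True/False per value.
ser = pd.Series([1, 5, 10])
below_five = ser < 5
print(below_five.tolist())  # prints [True, False, False]
```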


°°°1°°° > °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext, number, Boolean (True/False), etc. #2: plaintext, number, Boolean (True/False), etc.

Data type of output value:
Boolean (True/False)

Note about the output value:
You get the point


°°°1°°° > °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext, number, Boolean (True/False), etc. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
You get the point


°°°1°°° <= °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext, number, Boolean (True/False), etc. #2: plaintext, number, Boolean (True/False), etc.

Data type of output value:
Boolean (True/False)

Note about the output value:
You get the point


°°°1°°° <= °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext, number, Boolean (True/False), etc. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
You get the point


°°°1°°° >= °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext, number, Boolean (True/False), etc. #2: plaintext, number, Boolean (True/False), etc.

Data type of output value:
Boolean (True/False)

Note about the output value:
You get the point


°°°1°°° >= °°°2°°°

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext, number, Boolean (True/False), etc. (Or the reverse.)

Data type of output value:
Pandas “Series”

Note about the output value:
You get the point


°°°1°°° in °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext or number. #2: list.

Data type of output value:
Boolean (True/False)

Note about the output value:
Whether input expression #1 exists anywhere among the values in the (native Python) list used for input expression #2. Note that passing a Pandas “Series” as expression #2 won’t check its values (it checks the position labels of the “Series” instead), but if you first convert it into a native Python list by appending “.tolist()” to a native Pandas “Series,” you can pass THAT as expression #2.


°°°1°°° not in °°°2°°°

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: plaintext or number. #2: list.

Data type of output value:
Boolean (True/False)

Note about the output value:
Whether input expression #1 DOESN’T exist at all among the values in the (native Python) list used for input expression #2. Note that passing a Pandas “Series” as expression #2 won’t check its values (it checks the position labels of the “Series” instead), but if you first convert it into a native Python list by appending “.tolist()” to a native Pandas “Series,” you can pass THAT as expression #2.
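
A sketch of the “is in?” checks, including what happens when a Pandas “Series” is used directly instead of a list. The Series is made up for the example.

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'c'])
values = ser.tolist()

# Against a native Python list, "in" looks at the values.
found = 'b' in values
missing = 'z' not in values
print(found, missing)  # prints True True

# Against the Series itself, "in" looks at the position labels (0, 1, 2),
# not at the values, which is rarely what you want.
found_in_series = 'b' in ser
label_in_series = 1 in ser
print(found_in_series, label_in_series)  # prints False True
```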


list(x for x in °°°1°°° if x not in °°°2°°°)

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: list. #2: list.

Data type of output value:
list

Note about the output value:
Produces a copy of the list from input expression #1, minus any of its values that happened to be values that were present in the list from input expression #2. (Can be really handy for running against a list of CSV column-names as input #1, when you’d like to keep/remove columns in a Pandas DataFrame, but the list of things you want to keep/remove is best described as “All columns except X, Y, & Z.”) Note that the “x” in the example code could be any variable name you choose, as long as you’re not using it elsewhere in your code and as long as you keep it consistent all 3 times that it appears in this operation. The choice of “x” isn’t special – it could be “list(item for item in °°°1°°° if item not in °°°2°°°)” instead, for example. Note that you won’t be able to refer to “x” or “item” or whatever you choose again after this expression – it has a “scope,” or “lifetime,” of just between the parentheses of “list()”.
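
A sketch of the “all columns except…” use case. The column names are invented, and “col” is used in place of “x” just to show that the name is up to you.

```python
all_columns = ['Id', 'First', 'Last', 'Email']
unwanted = ['Email', 'Id']

# Keep every column name that is not in the "unwanted" list.
keep = list(col for col in all_columns if col not in unwanted)
print(keep)  # prints ['First', 'Last']
```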


°°°.str.lower()

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in the input expression, only with each of its values lower-cased. Will abort your program with an error if values in your Pandas “Series” aren’t of a “data type” that it makes sense to do this to.


°°°.str.upper()

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in the input expression, only with each of its values upper-cased. Will abort your program with an error if values in your Pandas “Series” aren’t of a “data type” that it makes sense to do this to.


°°°.str.len()

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in the input expression, only with each of its values replaced by an integer indicating how long the value is. Will abort your program with an error if values in your Pandas “Series” aren’t of a “data type” that it makes sense to do this to.


°°°.str.isnumeric()

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in the input expression, only with each value in the “Series” replaced by True/False explaining whether every character in that value, when looked at as plaintext, is a numeric character. Note that values like “-5” or “4.5” come back False, because the minus sign and the decimal point are not numeric characters.
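
A sketch of the four “.str” operations above against one made-up Series of plaintext values:

```python
import pandas as pd

ser = pd.Series(['Abc', '123', '4.5'])

lowered = ser.str.lower()
uppered = ser.str.upper()
lengths = ser.str.len()
numeric = ser.str.isnumeric()

print(lowered.tolist())  # prints ['abc', '123', '4.5']
print(uppered.tolist())  # prints ['ABC', '123', '4.5']
print(lengths.tolist())  # prints [3, 3, 3]

# '4.5' is False because the decimal point is not a numeric character.
print(numeric.tolist())  # prints [False, True, False]
```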


°°°1°°°.str.contains(°°°2°°°)

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext.

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in input expression #1, only with each value in the “Series” replaced by True/False explaining whether that value “contains” input expression #2. By default, input expression #2 is treated as a “regular expression” pattern; add “regex=False” inside the parentheses to have it matched literally. Will abort your program with an error if values in your Pandas “Series” aren’t of a “data type” that it makes sense to do this to.


°°°1°°°.str.replace(°°°2°°°,°°°3°°°)

Number of input values:
3

Allowable input value “data types:”
#1: Pandas “Series.” #2: plaintext. #3: plaintext.

Data type of output value:
Pandas “Series”

Note about the output value:
The same-sized Pandas “Series” as in the input expression, only with each value in the “Series” replaced by a copy of itself where any substring matching input expression #2 has been replaced by input expression #3. Remember that to simply remove input expression #2, you can make input expression #3 an “empty string” (two single-quotes back to back with nothing between them).
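
A sketch of “contains” and “replace” on a made-up Series. The second input to “.str.replace()” here is an “empty string,” so the hyphen is simply removed.

```python
import pandas as pd

ser = pd.Series(['cat-1', 'dog-2', 'catfish'])

has_cat = ser.str.contains('cat')
print(has_cat.tolist())  # prints [True, False, True]

no_hyphen = ser.str.replace('-', '')
print(no_hyphen.tolist())  # prints ['cat1', 'dog2', 'catfish']
```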


°°°.astype(str)

(official documentation link)

Number of input values:
1

Allowable input value “data types:”
Pandas “Series.”

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in the input expression, only with each of its values replaced by a plaintext version of the value. Will abort your program with an error if values in your Pandas “Series” aren’t of a “data type” that it makes sense to do this to.


°°°1°°°.str.zfill(°°°2°°°)

(official documentation link)

Number of input values:
2

Allowable input value “data types:”
#1: Pandas “Series.” #2: number (specifically, integer).

Data type of output value:
Pandas “Series”

Note about the output value:
The same Pandas “Series” as in input expression #1, only with each of its purely-numeric-looking plaintext values replaced by a plaintext version of the value with zeroes padded out to the left until the value is as long as specified by input expression #2. Good for padding zip codes, only it doesn’t work when the values within the “Series” are numbers, so be sure to run .astype(str) against the Series first. Will abort your program with an error if values in your Pandas “Series” aren’t of a “data type” that it makes sense to do this to.
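
A sketch of the zip code use case, combining “.astype(str)” with “.str.zfill()”. The Series “zips” is made up and holds numbers, as often happens after a CSV file has been read in.

```python
import pandas as pd

zips = pd.Series([2134, 60601, 501])

# Turn the numbers into plaintext first, then pad each to 5 characters.
padded = zips.astype(str).str.zfill(5)
print(padded.tolist())  # prints ['02134', '60601', '00501']
```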