Learning Python for Research (Part II): Data Structures & Control Flow


If you followed along with Part I, you should have a working conda environment, a text editor, and some basic Python running. Now comes the part that actually starts to feel useful.

This post covers data structures, control flow, and functions. Not glamorous, but this is the stuff that makes up the bulk of any real script you'll ever write. Before we bring in external libraries, I want to show you how much you can already do with just what Python ships with. More importantly, NumPy and Pandas build on top of these basics, not around them.

With just the tools covered here, you can already:

Store participant IDs, trial results, and metadata
Loop through hundreds of trials without manual repetition
Flag bad trials automatically based on criteria
Write helper functions so you never repeat the same chunk of code by hand

Lists: Ordered Collections

Lists are one of the most common ways to store multiple items in Python. They're ordered, which means items keep their position, and they're flexible: you can add, remove, and change them at will. In research, you might keep a list of participant IDs, stimulus names, or data files to process.

participants = ["P001", "P002", "P003"]

# access by index (count starts at 0)
print(participants[0])   # P001
print(participants[-1])  # P003 (last item)

# add a new participant
participants.append("P004")

# remove a participant
participants.remove("P002")

# loop over them
for pid in participants:
    print(pid)

Lists aren't just for strings; they can hold numbers, booleans, or even other lists:

# list of trial accuracies
accuracies = [0.91, 0.85, 0.88]

# list of lists - participant ID and their accuracies
data = [
    ["P001", [0.91, 0.85, 0.88]],
    ["P002", [0.75, 0.80, 0.78]]
]
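
Once data is nested like this, you reach inner values by chaining indexes. Here is a small sketch using the same structure as above; the variable names (first_id, first_scores, and so on) are just illustrative:

```python
# Same nested structure as above: [participant ID, list of accuracies]
data = [
    ["P001", [0.91, 0.85, 0.88]],
    ["P002", [0.75, 0.80, 0.78]],
]

first_id = data[0][0]        # "P001"
first_scores = data[0][1]    # [0.91, 0.85, 0.88]
last_score = data[0][1][-1]  # 0.88

# slicing returns a new list, here the first two accuracies
first_two = first_scores[:2]  # [0.91, 0.85]

# each row can be unpacked directly in the loop header
for pid, scores in data:
    print(pid, len(scores), max(scores))
```

Chained indexes get hard to read quickly, which is one reason dictionaries (covered below) are often a better fit for this kind of data.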

Try it yourself:

Create a list of your top 5 favourite research tools or software.
Add a new item to the list, then remove one.
Print each item with its position number (hint: use enumerate()).

Tuples: Lists That Can't Change

Tuples are like lists, but immutable. Once you make one, it can't be changed. They’re great for storing fixed information, like coordinates, or function return values that shouldn’t be modified by mistake.

origin = (0, 0)  # x, y coordinates
print(origin[0])  # 0

# tuple unpacking
x, y = origin
print(f"x: {x}, y: {y}")
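
To see what "immutable" means in practice, here is a minimal sketch of what happens when you try to modify a tuple. The names error_message and shifted are only for illustration:

```python
origin = (0, 0)

try:
    origin[0] = 5  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print(f"Cannot modify tuple: {error_message}")

# to "change" a tuple, build a new one instead
shifted = (origin[0] + 5, origin[1])
print(shifted)  # (5, 0)
print(origin)   # (0, 0), the original is untouched
```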

Tuples can also be used when returning multiple values from a function:

def min_and_max(values):
    return (min(values), max(values))

result = min_and_max([3, 7, 1, 5])
print(result)       # (1, 7)
low, high = result
print(low, high)    # 1 7

Try it yourself:

Create a tuple that stores today’s date as (year, month, day).
Use tuple unpacking to print it in “day/month/year” format.

Dictionaries: Key/Value Mappings

Dictionaries store data as key/value pairs. Think of them like labeled boxes: the “key” is the label, and the “value” is what’s inside. They’re perfect for mapping IDs to results, condition names to settings, or filenames to their metadata.

results = {
    "P001": [0.52, 0.48, 0.56],
    "P002": [0.61, 0.59, 0.63],
}
print(results["P001"])  # [0.52, 0.48, 0.56]

# add a new participant
results["P003"] = [0.70, 0.65, 0.72]

# loop through dictionary
for pid, scores in results.items():
    print(pid, scores)

You can also check if a key exists before using it:

if "P004" not in results:
    results["P004"] = [0.80, 0.82, 0.78]
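
Looking up a key that doesn't exist with square brackets raises a KeyError. If a missing key is a normal situation in your data, the built-in get() and setdefault() methods are a gentler option. A short sketch, with "P999" as a made-up ID that is deliberately absent:

```python
results = {
    "P001": [0.52, 0.48, 0.56],
    "P002": [0.61, 0.59, 0.63],
}

# results["P999"] would raise KeyError; get() returns a fallback instead
missing = results.get("P999")             # None
missing_scores = results.get("P999", [])  # [] (our chosen default)

# setdefault() returns the existing value, or inserts the default first
results.setdefault("P003", []).append(0.70)  # new key, starts from []
results.setdefault("P001", []).append(0.50)  # existing key, list is extended

print(results["P003"])  # [0.7]
```

Note that get() never modifies the dictionary, while setdefault() does when the key is missing.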

Try it yourself:

Create a dictionary mapping 3 country names to their capital cities.
Add a new country/capital pair.
Loop through the dictionary and print “The capital of X is Y”.

Sets: Unique Values Only

Sets store unordered, unique items. If you need to remove duplicates from a dataset or just check whether something is present, sets can save you a lot of time.

conditions = {"A", "B", "C", "A"}
print(conditions)  # e.g. {'A', 'B', 'C'} (duplicates removed; order may vary)

# membership test
print("A" in conditions)  # True
print("D" in conditions)  # False

Converting a list to a set is a quick way to deduplicate it:

participants = ["P001", "P002", "P002", "P003"]
unique_participants = set(participants)
print(unique_participants)  # e.g. {'P001', 'P002', 'P003'} (order may vary)
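
Because sets are unordered, converting a list to a set can shuffle your items. If order matters, here are two possible approaches, plus a quick look at set operations. The session_1 and session_2 sets are hypothetical data made up for this sketch:

```python
participants = ["P003", "P001", "P002", "P002", "P001"]

# sorted() turns the set back into a list with a predictable order
unique_sorted = sorted(set(participants))
print(unique_sorted)  # ['P001', 'P002', 'P003']

# dict keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(participants))
print(unique_in_order)  # ['P003', 'P001', 'P002']

# set operations: compare who took part in each (hypothetical) session
session_1 = {"P001", "P002", "P003"}
session_2 = {"P002", "P003", "P004"}

both = session_1 & session_2        # in both sessions
only_first = session_1 - session_2  # in session 1 but not session 2
print(sorted(both))        # ['P002', 'P003']
print(sorted(only_first))  # ['P001']
```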

Try it yourself:

Create a list with repeated numbers.
Convert it to a set to remove duplicates.
Check if a specific number is in the set.

Control Flow: Making Your Code Think

If / elif / else

Use these to branch your code based on conditions. In research, this might mean adjusting difficulty based on performance or flagging poor-quality data.

accuracy = 0.82

if accuracy > 0.9:
    print("Excellent performance!")
elif accuracy > 0.75:
    print("Good performance!")
else:
    print("Needs improvement.")

Loops

Loops let you repeat actions without writing the same code over and over.

# for loop - runs once per item; range(1, 6) gives 1 to 5 (the end value is excluded)
for trial in range(1, 6):
    print(f"Trial {trial}")

# while loop - runs until a condition is false
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1
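
Loops and conditions become far more useful together. As a rough sketch of flagging bad trials automatically, the example below sorts reaction times into good and bad. The data and the MIN_RT / MAX_RT cutoffs are invented for illustration; pick criteria that make sense for your own paradigm:

```python
# hypothetical reaction times in ms; None marks a missed response
reaction_times = [350, 420, None, 95, 390, 2100]

# assumed cutoffs, for illustration only
MIN_RT = 150
MAX_RT = 1500

good_trials = []
bad_trials = []

# enumerate(..., start=1) gives us a trial number alongside each value
for trial_number, rt in enumerate(reaction_times, start=1):
    if rt is None:
        bad_trials.append((trial_number, "no response"))
        continue  # skip the remaining checks for this trial
    if rt < MIN_RT:
        bad_trials.append((trial_number, "too fast"))
    elif rt > MAX_RT:
        bad_trials.append((trial_number, "too slow"))
    else:
        good_trials.append(rt)

print(good_trials)  # [350, 420, 390]
print(bad_trials)   # [(3, 'no response'), (4, 'too fast'), (6, 'too slow')]
```

Storing the reason next to the trial number as a tuple makes it easy to report later why each trial was excluded.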

Try it yourself:

Write an if/elif/else statement that prints different feedback based on a variable called score.
Write a for loop that prints all even numbers from 2 to 20.
Write a while loop that counts down from 5 to 1.

Functions: Reusable Code Blocks

Functions package your code into named, reusable pieces. This keeps your scripts tidy and makes it easier to test and debug individual parts.

def mean(values):
    return sum(values) / len(values)

rt_list = [350, 420, 390]
print(mean(rt_list))  # 386.666...
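
One thing to watch: mean() as written raises a ZeroDivisionError if you pass it an empty list, which can happen once you start excluding trials. One possible way to guard against that is sketched below. The name safe_mean and the choice to return None are my own assumptions, not a standard convention:

```python
def safe_mean(values):
    """Return the mean of values, or None if there is nothing to average."""
    if not values:  # an empty list counts as False
        return None
    return sum(values) / len(values)

print(safe_mean([350, 420, 390]))
print(safe_mean([]))  # None
```

Returning None forces the calling code to decide what an empty participant means, rather than crashing halfway through a batch.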

You can also add default parameters:

def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

greet("Gregg")           # Hello, Gregg!
greet("Gregg", "Hi")     # Hi, Gregg!
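
Default parameters are handy for analysis settings that rarely change. In this sketch, is_accurate and its 0.75 default are hypothetical, chosen to match the cutoff used earlier in this post:

```python
def is_accurate(accuracy, threshold=0.75):
    """Return True if accuracy is above the threshold."""
    return accuracy > threshold

print(is_accurate(0.82))                 # True (uses the default 0.75)
print(is_accurate(0.82, threshold=0.9))  # False (stricter cutoff)
```

Passing the argument by name (threshold=0.9) makes the call easier to read than a bare number.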

Try it yourself:

Write a function that returns the square of a number.
Write a function that takes a list of numbers and returns the largest one.
Write a function with a default parameter, then call it with and without providing that parameter.

Mini-Project: Participant Performance Tracker

Let’s tie everything together with a small example that doesn’t require any extra files.

def mean(values):
    return sum(values) / len(values)

# participant accuracy data (each list is a set of trial accuracies)
results = {
    "P001": [0.9, 0.8, 0.85],
    "P002": [0.7, 0.75, 0.72],
    "P003": [0.95, 0.92, 0.94]
}

# loop through each participant and print their mean accuracy with a performance label
for pid, accs in results.items():
    avg_acc = mean(accs)
    if avg_acc > 0.9:
        performance = "Excellent"
    elif avg_acc > 0.75:
        performance = "Good"
    else:
        performance = "Needs improvement"
    print(f"{pid}: {avg_acc:.2f} - {performance}")
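
As scripts grow, it can help to move the labelling logic into its own function and collect the output in a dictionary instead of only printing it. This is one possible refactor, not the only way to do it; classify, summary, and the parameter names are made up for this sketch:

```python
def mean(values):
    return sum(values) / len(values)

def classify(avg_acc, excellent=0.9, good=0.75):
    """Turn a mean accuracy into a label, using the same cutoffs as above."""
    if avg_acc > excellent:
        return "Excellent"
    elif avg_acc > good:
        return "Good"
    return "Needs improvement"

results = {
    "P001": [0.9, 0.8, 0.85],
    "P002": [0.7, 0.75, 0.72],
    "P003": [0.95, 0.92, 0.94],
}

# map each participant ID to a (mean accuracy, label) tuple
summary = {}
for pid, accs in results.items():
    avg_acc = mean(accs)
    summary[pid] = (avg_acc, classify(avg_acc))

for pid, (avg_acc, label) in summary.items():
    print(f"{pid}: {avg_acc:.2f} - {label}")
```

With the results kept in summary, later steps can reuse them without recomputing anything.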

Try it yourself:

Add another participant with their own accuracy scores.
Write a new function that returns the highest single accuracy for a participant.
Sort participants by their average accuracy and print them from highest to lowest.

That's Part II. If you've gone through the "try it yourself" sections, you already know enough Python to automate a bunch of tedious stuff in your research. Part III brings in NumPy, Pandas, and Matplotlib, and that's when things start to feel genuinely powerful.