6. First level GLM#
In this chapter, we will go through a simple workflow of first level general linear modeling (GLM) with a BIDS dataset from OpenNeuro. The analysis is performed on a single subject.
This tutorial is based on the Nilearn GLM tutorial.
import nest_asyncio

# allow nested event loops so that pydra can run inside Jupyter
nest_asyncio.apply()
Preparation#
Import packages that will be used globally and set up the output directory.
import os
import pydra
from pydra import Workflow
from pydra.engine.specs import File
import typing as ty
from pathlib import Path
# get the parent directory of the current working directory
pydra_tutorial_dir = os.path.dirname(os.getcwd())

# set up the output directory
workflow_dir = Path(pydra_tutorial_dir) / "outputs"
workflow_out_dir = workflow_dir / "6_glm"

# create the output directory if it does not exist
os.makedirs(workflow_out_dir, exist_ok=True)
Create tasks#
In this section, we convert the major steps into tasks. Each Pydra task can call multiple Python functions; we recommend putting logically related functions into the same task.
It is very important to keep in mind what the adjacent tasks of your current task will be:
- your previous task determines the arguments of the current task
- your next task is shaped by what the current task returns (see the toy sketch after this list)
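As a toy illustration (hypothetical tasks, not part of this analysis), here is how two annotated tasks line up: the return annotation of the first must match the argument annotation of the second, and the connection is made lazily through lzout:

@pydra.mark.task
@pydra.mark.annotate({"x": int, "return": {"y": int}})
def add_one(x):
    return x + 1

@pydra.mark.task
@pydra.mark.annotate({"y": int, "return": {"z": int}})
def double(y):
    return y * 2

wf_toy = Workflow(name="wf_toy", input_spec=["x"], x=2)
wf_toy.add(add_one(name="add_one", x=wf_toy.lzin.x))
# `double` receives the lazy output `y` of `add_one`
wf_toy.add(double(name="double", y=wf_toy.add_one.lzout.y))
wf_toy.set_output([("z", wf_toy.double.lzout.z)])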
Fetch the OpenNeuro BIDS dataset#
In this task, we do the following:
- get the OpenNeuro dataset index
- specify exclusion patterns and the number of subjects
- download the data we need

Note: here we still use n_subjects as an argument. Given that we will only analyze one subject, you could also remove this argument and specify n_subjects=1 in select_from_index (a sketch of that variant follows the task below). If you do, do not forget to modify the argument in the workflow later.
@pydra.mark.task
@pydra.mark.annotate(
    {"exclusion_patterns": list, "n_subjects": int, "return": {"data_dir": str}}
)
def get_openneuro_dataset(exclusion_patterns, n_subjects):
    from nilearn.datasets import (
        fetch_openneuro_dataset,
        fetch_openneuro_dataset_index,
        select_from_index,
    )

    # fetch the full file index, narrow it down, then download
    _, urls = fetch_openneuro_dataset_index()
    urls = select_from_index(
        urls, exclusion_filters=exclusion_patterns, n_subjects=n_subjects
    )
    data_dir, _ = fetch_openneuro_dataset(urls=urls)
    return data_dir
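If you prefer the hard-coded variant described in the note above, a minimal sketch (hypothetical task name, not used by the workflow below) could look like this:

# hypothetical variant: n_subjects is fixed inside the task
@pydra.mark.task
@pydra.mark.annotate({"exclusion_patterns": list, "return": {"data_dir": str}})
def get_openneuro_dataset_one_subject(exclusion_patterns):
    from nilearn.datasets import (
        fetch_openneuro_dataset,
        fetch_openneuro_dataset_index,
        select_from_index,
    )

    _, urls = fetch_openneuro_dataset_index()
    # n_subjects is hard-coded to 1; remember to drop the n_subjects
    # argument from the workflow wiring as well
    urls = select_from_index(urls, exclusion_filters=exclusion_patterns, n_subjects=1)
    data_dir, _ = fetch_openneuro_dataset(urls=urls)
    return data_dir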
Obtain FirstLevelModel objects and fit arguments automatically#
To get the first level model(s), we have to specify:
- the dataset directory
- the task_label
- the space_label
- the folder with the desired derivatives (fMRIPrep)

In our case, we only have one subject, so we will only have one first level model. Then, for this model, we will obtain:
- the list of run images
- events
- confound regressors

Those are inferred from the confounds.tsv files available in the BIDS dataset.
@pydra.mark.task
@pydra.mark.annotate(
    {
        "data_dir": str,
        "task_label": str,
        "space_label": str,
        "derivatives_folder": str,
        "smoothing_fwhm": float,
        "return": {"model": ty.Any, "imgs": list, "subject": str},
    }
)
def get_info_from_bids(
    data_dir,
    task_label,
    space_label,
    smoothing_fwhm,
    derivatives_folder,
):
    from nilearn.glm.first_level import first_level_from_bids

    models, models_run_imgs, models_events, models_confounds = first_level_from_bids(
        dataset_path=data_dir,
        task_label=task_label,
        space_label=space_label,
        smoothing_fwhm=smoothing_fwhm,
        derivatives_folder=derivatives_folder,
    )
    # we only analyze one subject, so keep the first (and only) entries;
    # events and confounds are not needed downstream because the design
    # matrix is taken from FSL instead
    model, imgs = models[0], models_run_imgs[0]
    subject = "sub-" + model.subject_label
    return model, imgs, subject
Get design matrix#
This task does the following:
- read the design matrix from the FSL .mat file
- rename the columns
- save the new design matrix as .csv

Think: what if we don't save the new design matrix, but return it directly? In other words, we return a pandas.DataFrame instead of a path. What will happen? Worth a try :) (A sketch of this variant follows the task below.)
@pydra.mark.task
@pydra.mark.annotate({"data_dir": str, "subject": str, "return": {"dm_path": str}})
def get_designmatrix(data_dir, subject):
    from nilearn.interfaces.fsl import get_design_from_fslmat

    fsl_design_matrix_path = os.path.join(
        data_dir, 'derivatives', 'task', subject, 'stopsignal.feat', 'design.mat'
    )
    design_matrix = get_design_from_fslmat(fsl_design_matrix_path, column_names=None)

    # give the regressors of interest readable names
    design_columns = ['cond_%02d' % i for i in range(len(design_matrix.columns))]
    design_columns[0] = 'Go'
    design_columns[4] = 'StopSuccess'
    design_matrix.columns = design_columns

    dm_path = os.path.join(workflow_out_dir, "designmatrix.csv")
    design_matrix.to_csv(dm_path, index=None)
    return dm_path
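To explore the Think question above, here is a minimal sketch of the variant (hypothetical task name, untested here): the return annotation becomes ty.Any, and every downstream task that consumed dm_path would have to accept a DataFrame instead of a file path:

# hypothetical variant: return the DataFrame itself instead of a CSV path
@pydra.mark.task
@pydra.mark.annotate({"data_dir": str, "subject": str, "return": {"design_matrix": ty.Any}})
def get_designmatrix_df(data_dir, subject):
    from nilearn.interfaces.fsl import get_design_from_fslmat

    fsl_design_matrix_path = os.path.join(
        data_dir, 'derivatives', 'task', subject, 'stopsignal.feat', 'design.mat'
    )
    design_matrix = get_design_from_fslmat(fsl_design_matrix_path, column_names=None)
    # pydra would have to pickle the DataFrame to pass it between tasks
    return design_matrix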
Fit 1st level model#
What we are doing here is:
- use the design matrix to fit the first level model
- compute the contrast
- save the z_map to a file and return the masker for further use
@pydra.mark.task
@pydra.mark.annotate(
    {
        "model": ty.Any,
        "imgs": ty.Any,
        "dm_path": ty.Any,
        "contrast": str,
        "return": {"model": ty.Any, "z_map_path": str, "masker": ty.Any},
    }
)
def model_fit(model, imgs, dm_path, contrast):
    import pandas as pd

    design_matrix = pd.read_csv(dm_path)
    model.fit(imgs, design_matrices=[design_matrix])
    z_map = model.compute_contrast(contrast)
    z_map_path = os.path.join(workflow_out_dir, "firstlevel_z_map.nii.gz")
    z_map.to_filename(z_map_path)
    # the masker is returned directly rather than written to disk
    masker = model.masker_
    return model, z_map_path, masker
Get cluster table and glm report#
For publication purposes, we obtain a cluster table and a summary report.
@pydra.mark.task
@pydra.mark.annotate({"z_map_path": str, "return": {"output_file": str}})
def cluster_table(z_map_path):
    import nibabel as nib
    from nilearn.reporting import get_clusters_table
    from scipy.stats import norm

    stat_img = nib.load(z_map_path)
    output_file = os.path.join(workflow_out_dir, "cluster_table.csv")
    # threshold at p < 0.001 (uncorrected) and keep clusters of 10+ voxels
    df = get_clusters_table(
        stat_img, stat_threshold=norm.isf(0.001), cluster_threshold=10
    )
    df.to_csv(output_file, index=None)
    return output_file
# get glm report
@pydra.mark.task
@pydra.mark.annotate({"model": ty.Any, "contrasts": str, "return": {"output_file": str}})
def glm_report(model, contrasts):
    from nilearn.reporting import make_glm_report

    output_file = os.path.join(workflow_out_dir, "glm_report.html")
    report = make_glm_report(model, contrasts)
    report.save_as_html(output_file)
    return output_file
Make plots#
Here we want to make some plots to display our results and compare them with the results from FSL:
- plot the Nilearn z-map
- plot the FSL z-map
- plot the Nilearn vs. FSL comparison
- plot the design matrix contrast

You could also separate this task into multiple sub-tasks, but it makes more sense to put them into one task, as they use the same files and call functions from nilearn.plotting repeatedly.
@pydra.mark.task
@pydra.mark.annotate(
    {
        "data_dir": str,
        "dm_path": str,
        "z_map_path": str,
        "contrast": str,
        "subject": str,
        "masker": ty.Any,
        "return": {
            "output_file1": str,
            "output_file2": str,
            "output_file3": str,
            "output_file4": str,
        },
    }
)
def plots(data_dir, dm_path, z_map_path, contrast, subject, masker):
    import pandas as pd
    import nibabel as nib
    from nilearn.plotting import (
        plot_glass_brain,
        plot_img_comparison,
        plot_contrast_matrix,
    )
    from scipy.stats import norm

    # plot and save the nilearn z-map
    z_map = nib.load(z_map_path)
    output_file1 = os.path.join(workflow_out_dir, "nilearn_z_map.png")
    plot_glass_brain(
        z_map,
        output_file=output_file1,
        colorbar=True,
        threshold=norm.isf(0.001),
        title='Nilearn Z map of "StopSuccess - Go" (unc p<0.001)',
        plot_abs=False,
        display_mode='ortho',
    )

    # plot and save the fsl z-map
    fsl_z_map = nib.load(
        os.path.join(
            data_dir, 'derivatives', 'task', subject, 'stopsignal.feat',
            'stats', 'zstat12.nii.gz'
        )
    )
    output_file2 = os.path.join(workflow_out_dir, "fsl_z_map.png")
    plot_glass_brain(
        fsl_z_map,
        output_file=output_file2,
        colorbar=True,
        threshold=norm.isf(0.001),
        title='FSL Z map of "StopSuccess - Go" (unc p<0.001)',
        plot_abs=False,
        display_mode='ortho',
    )

    # plot and save the nilearn vs. fsl comparison;
    # plot_img_comparison writes 0000.png, which we rename afterwards
    plot_img_comparison(
        [z_map], [fsl_z_map], masker,
        output_dir=workflow_out_dir, ref_label='Nilearn', src_label='FSL'
    )
    old = os.path.join(workflow_out_dir, "0000.png")
    new = os.path.join(workflow_out_dir, "nilearn_fsl_comp.png")
    os.rename(old, new)  # os.rename returns None, so keep the new path explicitly
    output_file3 = new

    # plot and save the design matrix contrast
    design_matrix = pd.read_csv(dm_path)
    output_file4 = os.path.join(workflow_out_dir, "firstlevel_contrast.png")
    plot_contrast_matrix(contrast, design_matrix, output_file=output_file4)
    return output_file1, output_file2, output_file3, output_file4
Make a workflow from tasks#
Now that we have created all the tasks we need for this first level analysis, there are two choices for our next step:
- create one workflow that connects all tasks together
- create sub-workflows from closely related tasks, then connect these sub-workflows and the remaining tasks into a larger workflow

We recommend the second approach, as it is always good practice to group tasks, especially when there is a large number of tasks in the analysis.
Our analysis can be divided into three parts: (1) get/read the data, (2) analyze the data, and (3) plot the results, where (1) and (3) only have one task each. So we can put all the tasks in (2) into one workflow and name it firstlevel, or whatever you prefer.
# initiate a workflow
wf_firstlevel = Workflow(
    name="wf_firstlevel",
    input_spec=[
        "data_dir",
        "task_label",
        "space_label",
        "derivatives_folder",
        "smoothing_fwhm",
        "contrast",
        "output_dir",
    ],
)

# specify input
wf_firstlevel.inputs.task_label = 'stopsignal'
wf_firstlevel.inputs.space_label = 'MNI152NLin2009cAsym'
wf_firstlevel.inputs.derivatives_folder = 'derivatives/fmriprep'
wf_firstlevel.inputs.smoothing_fwhm = 5.0

# add task - get_info_from_bids
wf_firstlevel.add(
    get_info_from_bids(
        name="get_info_from_bids",
        data_dir=wf_firstlevel.lzin.data_dir,
        task_label=wf_firstlevel.lzin.task_label,
        space_label=wf_firstlevel.lzin.space_label,
        derivatives_folder=wf_firstlevel.lzin.derivatives_folder,
        smoothing_fwhm=wf_firstlevel.lzin.smoothing_fwhm,
    )
)
# add task - get_designmatrix
wf_firstlevel.add(
    get_designmatrix(
        name="get_designmatrix",
        data_dir=wf_firstlevel.lzin.data_dir,
        subject=wf_firstlevel.get_info_from_bids.lzout.subject,
    )
)
# add task - model_fit
wf_firstlevel.add(
    model_fit(
        name="l1estimation",
        model=wf_firstlevel.get_info_from_bids.lzout.model,
        imgs=wf_firstlevel.get_info_from_bids.lzout.imgs,
        dm_path=wf_firstlevel.get_designmatrix.lzout.dm_path,
        contrast=wf_firstlevel.lzin.contrast,
    )
)
# add task - cluster_table
wf_firstlevel.add(
    cluster_table(
        name="cluster_table",
        z_map_path=wf_firstlevel.l1estimation.lzout.z_map_path,
    )
)
# add task - glm_report
wf_firstlevel.add(
    glm_report(
        name="glm_report",
        model=wf_firstlevel.l1estimation.lzout.model,
        contrasts=wf_firstlevel.lzin.contrast,
    )
)
# specify output
wf_firstlevel.set_output([
    ("z_map", wf_firstlevel.l1estimation.lzout.z_map_path),
    ("masker", wf_firstlevel.l1estimation.lzout.masker),
    ("subject", wf_firstlevel.get_info_from_bids.lzout.subject),
    ("dm_path", wf_firstlevel.get_designmatrix.lzout.dm_path),
    ("cluster_table", wf_firstlevel.cluster_table.lzout.output_file),
    ("glm_report", wf_firstlevel.glm_report.lzout.output_file),
])
The overarching workflow#
Connect other tasks and the above workflow into one
Now we need to create the overarching GLM workflow that connects the above sub-workflow and the remaining tasks (i.e., get/read the data, and plot the results).
wf = Workflow(
    name="firstlevel_glm",
    input_spec=["exclusion_patterns", "n_subjects", "contrast", "output_dir"],
)

wf.inputs.exclusion_patterns = [
    '*group*', '*phenotype*', '*mriqc*',
    '*parameter_plots*', '*physio_plots*',
    '*space-fsaverage*', '*space-T1w*',
    '*dwi*', '*beh*', '*task-bart*',
    '*task-rest*', '*task-scap*', '*task-task*',
]
wf.inputs.n_subjects = 1
wf.inputs.output_dir = workflow_out_dir
wf.inputs.contrast = 'StopSuccess - Go'

wf.add(
    get_openneuro_dataset(
        name="get_openneuro_dataset",
        exclusion_patterns=wf.lzin.exclusion_patterns,
        n_subjects=wf.lzin.n_subjects,
    )
)

# feed the sub-workflow with the lazy output of the data-fetching task
wf_firstlevel.inputs.data_dir = wf.get_openneuro_dataset.lzout.data_dir
wf_firstlevel.inputs.contrast = wf.inputs.contrast
wf_firstlevel.inputs.output_dir = wf.inputs.output_dir
wf.add(wf_firstlevel)

wf.add(
    plots(
        name="plots",
        data_dir=wf.get_openneuro_dataset.lzout.data_dir,
        dm_path=wf_firstlevel.lzout.dm_path,
        z_map_path=wf_firstlevel.lzout.z_map,
        contrast=wf.lzin.contrast,
        subject=wf_firstlevel.lzout.subject,
        masker=wf_firstlevel.lzout.masker,
    )
)

wf.set_output([
    ("output1", wf.plots.lzout.output_file1),
    ("output2", wf.plots.lzout.output_file2),
    ("output3", wf.plots.lzout.output_file3),
    ("output4", wf.plots.lzout.output_file4),
])
Run the workflow#
from pydra import Submitter

with Submitter(plugin="cf", n_procs=4) as submitter:
    submitter(wf)

results = wf.result()
print(results)
Visualization#
If you arrive here without any errors, yay, you just made your first pydra workflow for a first-level GLM!
Examine folder structure#
Let’s take a look at what you have got.
!ls ../outputs/6_glm
Click to see what you should get
cluster_table.csv        firstlevel_z_map.nii.gz  nilearn_fsl_comp.png
designmatrix.csv         fsl_z_map.png            nilearn_z_map.png
firstlevel_contrast.png  glm_report.html
Plot figures#
First level contrast#
from IPython.display import Image
Image(filename='../outputs/6_glm/firstlevel_contrast.png')
Nilearn Z map#
Image(filename='../outputs/6_glm/nilearn_z_map.png')
FSL Z map#
Image(filename='../outputs/6_glm/fsl_z_map.png')
Nilearn FSL comparison#
Image(filename='../outputs/6_glm/nilearn_fsl_comp.png')
Exercise#
What if we need to run the first-level GLM on multiple subjects? We will need the splitter. So, where should we add .split? A sketch of one possibility follows.
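As a minimal sketch (untested here; it assumes wf_firstlevel is given a subject_label input and that get_info_from_bids is reworked to select the model matching that label), one natural place is to split the first-level sub-workflow over subject labels before adding it to the overarching workflow:

# hypothetical: map the whole first-level sub-workflow over a list of subjects
wf_firstlevel.split("subject_label", subject_label=["sub-01", "sub-02"])
wf.add(wf_firstlevel)

Each split value then produces an independent copy of the sub-workflow, and pydra can run them in parallel under the same submitter.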