Big Data in Biology Summer School
Intensive four- to five-day workshops on diverse topics in the analysis of large-scale DNA, RNA, and protein datasets
The Center for Biomedical Research Support hosts the Annual Summer School for Big Data in Biology each May and June. Participants gain hands-on experience with real datasets and tools, guided by experts in computational biology. Courses are tailored for beginners through advanced users.
Course Format
- Runs for 4–5 consecutive days (mornings or afternoons).
- Offered online or in-person.
- Includes lectures, datasets, and practical exercises.
- No exams — certificates of completion available on request.
- Academic credit is not issued.
Fees and Registration
We accept personal credit cards (AmEx, MasterCard, Visa, Discover), UT ProCards (see details), and IDT (interdepartmental transfer).
- Groups of 5 or more from the same agency or institution receive a 20% discount.
- Register for 3 or more courses and receive a 50% discount.
| Affiliation | Students / Post-docs | Faculty / Staff |
|---|---|---|
| UT System | $195* | $295* |
| Non-UT Schools | $275** | $500** |
| Other | | |
* Our staff will confirm affiliations with UT.
** Non-UT students must provide a copy of their current student or faculty/staff ID.
Refund and Cancellation Policy
Refunds (minus a $25 fee) are available if requested in writing at least one week before the course start date. No refunds or substitutions will be granted after that date. If you fail to cancel on time or do not attend, full payment is still required. UT Austin may cancel courses with low enrollment and will issue full refunds in those cases.
For issues or questions about registration, email bcg@utexas.edu.
Important Registration Notice
If the cart remains empty, you may need to use incognito (private) mode, clear your web browser cache, or switch browsers to complete course registration.
Do NOT use someone else’s PIN during the registration process, or your registration will not be completed. If you are new, use the unique PIN assigned to you during registration; otherwise, use the same PIN you used for earlier registrations.
Also, if you are registering on behalf of someone else, PLEASE DO NOT use your name, contact information, or EID at any point in the process. You MUST use the student's information, or they will not be included on the course roster properly and could miss crucial course communication. Ask the student you are registering to forward you the receipt when they receive it by email.
No refunds will be issued within 2 business days of the course start date.
Summer 2026 Courses
- May 26 - May 29 Introduction to Statistical Modeling
- May 26 - May 29 Introduction to RNA-Seq
- June 1 - June 5 Introduction to Biocomputing: Working in Unix and R
- June 1 - June 5 Introduction to Python
- June 8 - June 12 Introduction to Core NGS Concepts and Tools
- June 15 - June 18 Principles of Machine Learning for Bioinformatics
Introduction to Statistical Modeling
Layla Guyot
This course is a hands-on introduction to building and interpreting statistical models in R, with a focus on real-world applications. We will cover key concepts in hypothesis testing, multiple linear regression, and logistic regression. You will learn how to choose appropriate modeling approaches, fit models using R, check assumptions, interpret results, and clearly communicate your findings. Each topic will include a brief introduction to foundational concepts, a demonstration of analysis in R, and guided practice through interactive coding exercises. Emphasis will be placed on using statistical modeling to answer research questions within reproducible workflows. By the end of the course, the goal is for you to be able to apply statistical modeling to your own data.
Preferred or Prerequisite Skills:
This course is recommended for students with some prior knowledge of R or programming in general.
Computer Requirement:
Participants are expected to provide their own laptops.
If using a UT Procard, read this disclaimer.
Introduction to RNA-Seq
Dhivya Arasappan (Co-Director, Bioinformatics Consulting Group, CBRS)
This four-day course provides an introduction to methods for analysis of RNA-Seq data. A typical RNA-Seq workflow will be featured, from quality assessment of raw data through mapping (bwa, kallisto) and differential expression analysis (DESeq2) to downstream analyses and visualization. The course also describes analysis methods for single-cell RNA-Seq data. Participants will gain hands-on experience using these tools in a Linux command line environment.
Preferred or Prerequisite Skills:
None
Computer Requirement:
Students should have their own laptop computer. A UT EID is required for wireless access on campus. Please be sure you know your UT EID when you come to class. To obtain a UT EID, go here.
If using a UT Procard, read this disclaimer.
Introduction to Biocomputing: Working in Unix and R
Matt Bramble (Bioinformatician, Bioinformatics Consulting Group, CBRS)
This course will cover the Unix command line and data analysis in R within the context of biocomputing. We will start at the Unix command line and cover command line tools for manipulating data files, before transitioning to RStudio to cover introductory topics and engage with data analysis methods in R. The course will finish up with tidyverse tools and methods for visualizing data using ggplot2.
Preferred or Prerequisite Skills:
None
Computer Requirement:
Students should have their own laptop computer. A UT EID is required for wireless access on campus. Please be sure you know your UT EID when you come to class. To obtain a UT EID, go here.
If using a UT Procard, read this disclaimer.
Introduction to Python
James Derry (Senior Systems Administrator)
This five-day course will introduce students to basic concepts in programming using the Python language, establishing a foundation for scientific computing. Trainees will learn introductory topics such as data structures, control flow, functions, file input/output, and data parsing. The class will work with libraries from the scientific Python (SciPy) ecosystem, such as pandas. Trainees will have full access to the teacher’s course book and course content (datasets, scripts, and Jupyter notebooks).
Preferred or Prerequisite Skills:
None
Computer Requirement:
This class is offered in person. Students must provide a laptop that can connect to the internet and has a Firefox or Chrome browser installed. A UT EID is required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UT EID, go here.
If using a UT Procard, read this disclaimer.
Introduction to Core NGS Concepts and Tools
Anna Battenhouse (Associate Research Scientist and Bioinformatics Consultant, CBRS)
This five-day course provides an introduction to the concepts and vocabulary of Next Generation Sequencing (NGS) with an emphasis on common protocols, tools, and file formats used in NGS data analysis. Subjects covered include quality assessment and manipulation of raw NGS sequences (FastQC, cutadapt), read mapping (bwa, bowtie2), the Sequence Alignment/Map (SAM) format, and tools for manipulating BAM files (samtools, bedtools). Participants will gain hands-on experience using these and other NGS tools in the Linux command line environment at TACC, as well as exposure to the many bioinformatics resources TACC makes available.
Preferred or Prerequisite Skills:
None. UNIX/Linux command line experience is not required, and becoming familiar with how to use the command line for NGS analysis will be a major focus of this course. However, to get a head start on developing this important skill, you can look through our Intro to Unix/Linux workshop wiki and our Intermediate Unix/Linux workshop wiki.
Computer Requirement:
To participate fully in the hands-on exercises, students should have their own laptop computer with an SSH client program. Macs have SSH available in the Terminal application. Recent Windows versions have an SSH client built into their PowerShell and Command Prompt programs, or PuTTY can be used if SSH is not available. A TACC account and UT EID are also required. To obtain a UT EID, go here. To sign up for a TACC account, go here.
If using a UT Procard, read this disclaimer.
Principles of Machine Learning for Bioinformatics
Dennis Wylie (Co-Director, Bioinformatics Consulting Group, CBRS)
This four-day course will introduce a selection of machine learning methods used in bioinformatic analyses with a focus on RNA-Seq gene expression data. We will cover unsupervised learning, including dimensionality reduction and clustering; feature selection and extraction; and supervised learning methods for classification (e.g., random forests, SVM, LDA, kNN) and regression (with an emphasis on regularization methods appropriate for high-dimensional problems). Participants will have the opportunity to apply these methods as implemented in R and Python to publicly available data.
Preferred or Prerequisite Skills:
This course is recommended for students with some prior knowledge of either R or Python. Participants are expected to provide their own laptops with recent versions of R and/or Python installed. Students will be instructed to download several free software packages (R packages and Python libraries such as pandas and scikit-learn).
Computer Requirement:
Students should have their own laptop computer. A UT EID is required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UT EID, go here.
If using a UT Procard, read this disclaimer.