October 16, 2025 – SORA / TABA / ASA Chapter

TABA Seminar and Networking Event
Location: 209 Victoria St, Toronto, ON
Time: 6:00 PM (Eastern Time); light dinner and refreshments provided
Cost: Attendance at the seminar is free. If you would like to stay for dinner, we kindly request a $10 donation from working professionals; dinner is free for students.

Agenda:

Presenter: Dr. Christopher Meaney, Biostatistician, University of Toronto
Title: Unstructured Clinical Text for Patient Phenotyping: Current Methods and Future Directions
Abstract: Patient phenotyping from electronic health records (EHRs) is a critical task in clinical research and health system monitoring, particularly when relying on unstructured clinical text. Early work used rule-based algorithms, often with regular expressions (regex), to identify phenotype-defining terms while accounting for contextual nuances (e.g., negation, uncertainty, attribution to another experiencer). Unsupervised methods, including clustering and topic modeling, have been applied to discover latent phenotypes without labeled outcome data. When labeled outcome data are available, supervised machine learning approaches, from traditional classifiers to deep neural networks, have been widely adopted (possibly in hybrid systems that include rule-based approaches). More recently, encoder-based large language model (LLM) architectures, especially transformers, have enabled robust representation learning for the clinical NLP tasks that underpin digital phenotyping (e.g., document classification, named entity recognition). Decoder-based foundation models, combined with prompt engineering, are emerging as chat-assistive tools for digital phenotyping, including generative rule-set construction, supervised classification, and topic extraction. This talk will review current and emerging approaches to clinical NLP for patient phenotyping, highlighting methodological trade-offs, practical applications, and directions for integrating rule-based, machine learning, and foundation model techniques in future research.

Presenter: Dr. Menglu Che, Senior Statistician II, AstraZeneca
Title: Improving estimation efficiency for two-phase, outcome-dependent sampling studies
Abstract: Two-phase outcome-dependent sampling (ODS) is widely used in many fields, especially when certain covariates are expensive and/or difficult to measure. For two-phase ODS, the conditional maximum likelihood (CML) method is very attractive because it can handle zero Phase 2 selection probabilities and avoids modeling the covariate distribution. However, most existing CML-based methods use only the Phase 2 sample and thus may be less efficient than other methods. We propose a general empirical likelihood method that augments CML with additional information from the whole Phase 1 sample to improve estimation efficiency. The proposed method retains the ability to handle zero selection probabilities and avoids modeling the covariate distribution, but can lead to substantial efficiency gains over CML for the inexpensive covariates, or for the influential covariate when a surrogate is available, because of its effective use of the Phase 1 data. Simulations and a real data illustration using NHANES data are presented.

Presenter: Eric Cai, Senior Data Scientist at Acosta
Title: How to access hundreds of statistics textbooks freely and legally
Abstract: Do you live, work, or study in Toronto or Mississauga? If so, there are hundreds of statistics textbooks that you can access freely and legally. I will show you how to find them, plus many other courses and videos in a variety of technical and business areas. I will also share some of my favourite textbooks in biostatistics, SAS programming, R programming, and applied statistics.

Dinner donations can be paid in cash or by e-transfer to taba.exec@gmail.com.