Robert Brown

https://www.rbrown.org/ · San Francisco, CA · robert@rbrown.org

Education

University of California - Berkeley

Berkeley, CA

Epidemiology and Biostatistics, MPH

August 2015 – June 2017

University of California, Davis

Davis, CA

Neurophysiology and Behavior, BS; Psychology, BA

September 2006 – June 2011


Experience

Atropos Health

Machine Learning Engineer

December 2022 – Present

  • Led development of "ChatRWD", an agentic AI application using LLMs and RAG to accelerate Real-World Evidence generation from days to minutes, eliminating human bottlenecks in healthcare research workflows
  • Architected and deployed production LLM systems for healthcare data analysis, implementing fine-tuning strategies and prompt engineering techniques to achieve domain-specific performance improvements
  • Optimized the core production ML codebase for large-scale healthcare datasets, achieving a 6x speedup through parallel processing and a 100x improvement in memory efficiency
  • Built end-to-end ML pipelines integrating traditional ML and LLM components for automated evidence generation in pharmaceutical and healthcare decision-making

Berlin Brands Group

BBG

Data Engineer

October 2021 – December 2022

  • Architected scalable data infrastructure using AWS services (S3, Lambda, EC2) and Snowflake, building automated ETL pipelines to process multi-source data streams and provide actionable insights for executive decision-making
  • Developed predictive revenue forecasting system using time series models (ARIMA, LSTM, Prophet) integrated into custom Django application, enabling data-driven brand acquisition decisions
  • Deployed machine learning models to production environment with monitoring and automated retraining capabilities

Los Angeles County Public Health Department

LAPHD

Data Scientist

January – October 2021

  • Created production-ready Dash and Plotly dashboards for real-time epidemiological surveillance, and Airflow-based ETL systems for processing whole-genome sequencing data, serving 10+ million LA County residents
  • Implemented statistical modeling for disease trend analysis and outbreak detection, informing public health policy decisions

Insight Data Science

Insight Fellows

Health Data Science Fellow

August 2020 – November 2020

  • Built predictive churn model achieving 78% ROC AUC using advanced feature engineering and survival analysis techniques for healthcare client retention
  • Delivered data-driven recommendations to healthcare stakeholders, demonstrating ROI of ML-powered customer segmentation strategies

Alameda County Public Health Department

ACPHD

Epidemiologist II

August 2019 – September 2020

  • Led COVID-19 data operations as Data Chief for the Incident Command System, managing 25+ data professionals and building the county's first agent-based modeling platform for disease forecasting
  • Designed production ETL pipelines and SQL Server databases for surveillance of 70+ communicable diseases, providing real-time data for public health response and resource allocation, and produced quarterly disease reports and comprehensive annual surveillance reports

HIV Data-to-Care Specialist

August 2018 – August 2019

  • Managed the county's HIV Data-to-Care program, linking case management and surveillance data through integrated inter-departmental workflows to increase the number of HIV-positive residents in care

UC Berkeley Extension

UC Berkeley Data Analytics Boot Camp

Data Analytics Substitute Instructor & TA

July 2019 – July 2020

  • Taught a comprehensive data science curriculum covering programming, ML, data engineering, and visualization in an intensive 6-month bootcamp program

Method Data Science

Resident Data Scientist

November 2018 – April 2019

  • Developed an ensemble ML algorithm (77% AUC) for hip replacement prediction and built a customer segmentation dashboard using unsupervised learning, resulting in a 13% increase in the customer base

California Rural Indian Health Board

CRIHB

Epidemiologist

October 2017 – August 2018

  • Developed statewide data collection and disease surveillance tools, handled IRB and data management, and wrote complex grants to secure future funding
  • Developed databases, SQL reports using advanced queries, technical reports, and spatial data overlays and maps; performed data linkages; and produced tables and figures in R and Tableau

UCSF School of Medicine

UCSF

Statistician

June – August 2017

  • Consulted with several research teams, producing high-dimensional figures and providing mathematical modeling and statistical analysis (R) for publications on multidrug-resistant tuberculosis
  • Performed multiple imputation for a systematically missing survey question by sourcing comparable datasets, which allowed a validated survey tool to be used

UC Berkeley School of Public Health

UC Berkeley CHL

Center for Health Leadership Fellow

August 2016 – May 2017

  • Selected to participate in a three-semester leadership development program
  • Learning activities included a consulting project with a nonprofit agency, training workshops, needs assessment, project management, and meeting facilitation

UC Berkeley School of Public Health

UC Berkeley

Graduate Student Researcher

August 2016 – May 2017

  • Compared the accuracy of population estimates for people with genetic disorders or genetic determinants (e.g., cancer, congenital malformations) with estimates for hard-to-reach populations (e.g., people with drug addiction), using both Bayesian and non-Bayesian methods

Google

Public Health Data Specialist

May – December 2016

  • Responsible for data curation and data analysis of existing public health and medical research
  • Curated, verified, and provided expert feedback to improve the information in the health knowledge graph and related algorithm performance

San Francisco Department of Public Health

UC Berkeley

Research Data Analyst

May – December 2016

  • Conducted statistical analyses, mathematical modeling, data management, and data interpretation for more than four studies, including randomized controlled trials of pharmacologic interventions for substance use and HIV risk, and a prospective longitudinal study of transgender youth

Amity Foundation

Program Coordinator

February – August 2015

  • Managed reporting, program evaluation, and program implementation for a grant that provided mentors to formerly incarcerated individuals
  • Implemented the program plan and managed the program for a financial literacy grant entitled "Financial Empowerment"

Native American Health Center

Data Coordinator

February 2014 – February 2015

  • Managed data collection, entry, and analysis for a health clinic that saw over 15,000 patients at three sites in the San Francisco Bay Area; supervised two Data Assistants
  • Created clinical and behavioral health reports via SQL from the clinic's EHR to track grant funding requirements and to provide general reporting for the board of directors, clinicians, staff, and the public
  • Provided evaluation and project management on a technology integration grant that helped fund the creation of the clinic's EHR

Data Assistant

June 2012 – January 2014

  • Responsible for grant-funded survey data collection, quantitative health behavior analysis, and technical and statistical reporting for over 15 grants
  • Other roles included: consenting survey participants, determining insurance eligibility, and performing HIV rapid tests and HIV counseling

UC Davis Department of Physiology and Biology

Gomes Lab

Research Assistant

September 2008 – June 2011

  • Collected, maintained, and evaluated data on mutations of cardiac Troponin T and cardiac Troponin C and their relation to familial cardiomyopathies. Proteins were expressed in KO mice using in vitro techniques and structurally analyzed via homology modeling
  • Performed molecular biology techniques such as gel electrophoresis and western blotting to determine relative ubiquitin levels and proteasome activity in KO mice

Skillset Overview

  • Programming and ML Frameworks: Python (PyTorch, TensorFlow, scikit-learn, Pandas, NumPy), SQL, R, PySpark
  • LLM and AI Tools: LLMs (fine-tuning, RAG, prompt engineering), LangChain, LangGraph, MCP, Hugging Face Transformers, Arize
  • Cloud and Infrastructure: AWS (SageMaker, Lambda, EC2, S3, OpenSearch), GCP, MLflow, Airflow, Docker, CI/CD
  • Data Engineering: ETL pipelines, real-time data processing, API integration, database design
  • ML Specializations: NLP, time series forecasting, survival analysis, A/B testing, ensemble methods, neural networks

Patents

"Systems and Methods for Automated Evidence Generation" - US Patent Application 20250078969 A1 (Filed 8/29/2023)

Publications

Low YS, Jackson ML, Hyde RJ, Brown RE, et al. Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems. DIGITAL HEALTH (SAGE Publications). 2025 May;11:11. doi: 10.1177/20552076251348850.

Abstract

Objective: The practice of evidence-based medicine can be challenging when relevant data are lacking or difficult to contextualize for a specific patient. Large language models (LLMs) could potentially address both challenges by summarizing published literature or generating new studies using real-world data.
Materials and Methods: We submitted 50 clinical questions to five LLM-based systems: OpenEvidence, which uses an LLM for retrieval-augmented generation (RAG); ChatRWD, which uses an LLM as an interface to a data extraction and analysis pipeline; and three general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini 1.5 Pro). Nine independent physicians evaluated the answers for relevance, quality of supporting evidence, and actionability (i.e., sufficient to justify or change clinical practice).
Results: General-purpose LLMs rarely produced relevant, evidence-based answers (2–10% of questions). In contrast, RAG-based and agentic LLM systems, respectively, produced relevant, evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. OpenEvidence produced actionable results for 48% of questions with existing evidence, compared to 37% for ChatRWD and <5% for the general-purpose LLMs. ChatRWD provided actionable results for 52% of questions that lacked existing literature compared to <10% for other LLMs.
Discussion: Special-purpose LLM systems greatly outperformed general-purpose LLMs in producing answers to clinical questions. Retrieval-augmented generation-based LLM (OpenEvidence) performed well when existing data were available, while only the agentic ChatRWD was able to provide actionable answers when preexisting studies were lacking.
Conclusion: Synergistic systems combining RAG-based evidence summarization and agentic generation of novel evidence could improve the availability of pertinent evidence for patient care.

Jackson ML, Gombar S, Manickam R, Brown R, Tekumalla R, Low Y. Validating a clinical informatics consulting service using negative control reference sets. Poster presented at: 2023 OHDSI Symposium; October 20, 2023; East Brunswick, New Jersey.

Chen J, Marusinec R, Brown R, Shiau R, Jaganath D, Chitnis AS. Epidemiology of Culture-Negative Pulmonary Tuberculosis-Alameda County, 2010-2019. J Public Health Manag Pract. 2023 Mar 3. doi: 10.1097/PHH.0000000000001715. Epub ahead of print. PMID: 36867649.

Abstract

Context: Patients with culture-negative pulmonary TB (PTB) can face delays in diagnosis that worsen outcomes and lead to ongoing transmission. An understanding of current trends and characteristics of culture-negative PTB can support earlier detection and access to care.
Objective: Describe epidemiology of culture-negative PTB.
Design, setting, participants: We utilized Alameda County TB surveillance data from 2010 to 2019. Culture-negative PTB cases met clinical but not laboratory criteria for PTB per US National Tuberculosis Surveillance System definitions. We calculated trends in annual incidence and proportion of culture-negative PTB using Poisson and weighted linear regression, respectively. We further compared demographic and clinical characteristics among culture-negative versus culture-positive PTB cases.
Results: During 2010-2019, there were 870 cases of PTB, of which 152 (17%) were culture-negative. The incidence of culture-negative PTB declined by 76%, from 1.9/100 000 to 0.46/100 000 (P for trend <.01), while the incidence of culture-positive PTB reduced by 37% (6.5/100 000 to 4.1/100 000, P for trend =.1). Culture-negative PTB case-patients were more likely than culture-positive PTB case-patients to be younger (7.9% were children <15 years old vs 1.1%; P < .01), recent immigrants within 5 years of arrival (38.2% vs 25.5%; P < .01), and have a TB contact (11.2% vs 2.9%; P < .01). Culture-negative PTB case-patients were less likely than culture-positive PTB case-patients to be evaluated because of TB symptoms (57.2% vs 74.7%; P < .01) or have cavitation on chest imaging (13.1% vs 38.8%; P < .01). At the same time culture-negative PTB case-patients were less likely to die during TB treatment (2.0% vs 9.6%; P < .01).
Conclusions: The incidence of culture-negative PTB disproportionately declined compared with culture-positive TB and raises concern for gaps in detection. Expansion of screening programs for recent immigrants and TB contacts and greater recognition of risk factors may increase detection of culture-negative PTB.

Lloyd T, Bender M, Huang S, Brown R, Shiau R, Yette E, Shemsu M, Pandori M. Assessing the Use of PCR To Screen for Shedding of Salmonella enterica in Infected Humans. J Clin Microbiol. 2020 Jun 24;58(7):e00217-20. doi: 10.1128/JCM.00217-20. PMID: 32376667; PMCID: PMC7315023.

Abstract

Recovery from enteric bacterial illness often includes a phase of organismal shedding over a period of days to months. The monitoring of this process through laboratory testing forms the foundation of public health action to prevent further transmission. Regulations in most jurisdictions in the United States exclude individuals who continue to shed certain organisms from sensitive occupations and situations, such as food handling, providing direct patient care, or attending day care. The burden that this creates for recovering patients and their families/coworkers is great, so any effort to provide efficiency to the testing process would be of significant benefit. We sought to assess the ability of PCR for the detection of Salmonella enterica shedding and to compare that ability to culture-based testing. PCR would be faster than culture and would allow results to be generated more quickly. Herein, we show data that indicate that, while PCR and culture testing agree in the majority of cases, there are incidents of discordance between the two tests, whereupon PCR shows positive results when culture indicates lack of detectable viable organisms. Using culture-based testing as the standard, the negative predictive value of PCR was found to be 100%, while the positive predictive value was 79%. The nature of this discordance is briefly investigated. We found that it is possible that PCR may not only detect nonviable organisms in stool but also viable organisms that remain undetectable by standard culture methods.

Allgeier D, Gebreegziabher E, Ycasas J, Murgai N, Brown RE, Moss N. HIV in Alameda County, 2015-2017. 2018.

Brown RE, Turner C, Hern J, Santos GM. Partner-level substance use associated with increased sexual risk behaviors among men who have sex with men in San Francisco, CA. Drug Alcohol Depend. 2017;176:176–80.

Abstract

BACKGROUND
Substance use is highly prevalent among men who have sex with men (MSM) and is associated with individual-level sexual risk behaviors. However, few studies have explored the relationship between substance use and HIV risk behaviors within partnerships.
METHODS
We examined partner-level data between MSM participants (n = 23) and their sexual partners (n = 52). We used multivariable generalized estimating equations (GEE) logistic regression to assess the relationship between partner-level substance use during their last sexual encounter with each partner, and engaging in condomless anal intercourse (CAI) and serodiscordant CAI.
RESULTS
In multivariable analyses, participants had significantly higher adjusted odds ratio (AOR) of CAI when the participant (AOR = 22.2, 95%CI = 2.5-199.5) or their partners used any drugs (AOR = 21.8, 95%CI = 3.3-144.3); their partners (AOR = 5.7, 95%CI = 1.7-19.3) or both participant and partner had concordant use of methamphetamine (AOR = 10.5, 95%CI = 2.2-50.6); or when both used poppers (AOR = 11.4, 95%CI = 1.5-87). There were higher odds of SDCAI if the participant binge drank (AOR = 4, 95%CI = 1.01-15.8), used more than one substance (AOR = 15.8, 95%CI = 1.9-133), or used other drugs (AOR = 4.8, 95%CI = 1.3-18.4); if their partner used poppers (AOR = 7.6, 95%CI = 1.5-37.6), or used more than one substance (AOR = 7.9, 95%CI = 1.9-34.1); and when both participant and partner had concordant use of poppers (AOR = 4.4, 95%CI = 1.2-16.8).
CONCLUSIONS
This study observed a significant relationship between substance use and HIV risk behaviors within partnerships. Specifically, when the participant, the partner, or both used any drugs, there were increased odds of sexual risk behaviors. Findings suggest that partner-level substance use behaviors should be taken into account when developing sexual risk reduction interventions.