Confirmed Speakers and Panelists

Jimeng Sun

University of Illinois Urbana-Champaign

Parminder Bhatia


Jure Leskovec

Stanford University

Brett Beaulieu-Jones

Harvard Medical School

Ronald Summers

NIH Clinical Center

Mary Regina Boland

University of Pennsylvania

Jiajie Zhang

University of Texas Health Science Center

R. Nick Bryan

The University of Texas at Austin

Beau Norgeot

Li Li


Marina Sirota

UC San Francisco

Keshav Pingali

Katana Graph

Amit Sheth

University of South Carolina

Kim Branson


Pratik Shah

MIT Media Lab

Shameer Khader


Keynote 1 - Jure Leskovec

Mobility network models of COVID-19 explain inequities and inform reopening

ABSTRACT: The COVID-19 pandemic dramatically changed human mobility patterns, necessitating epidemiological models which capture the effects of changes in mobility on virus spread. In this talk we will introduce a metapopulation SEIR model that integrates fine-grained, dynamic mobility networks to simulate the spread of SARS-CoV-2 in 10 of the largest US metropolitan statistical areas. Derived from cell phone data, our mobility networks map the hourly movements of 98 million people from neighborhoods (census block groups, or CBGs) to points of interest (POIs) such as restaurants and religious establishments, connecting 57k CBGs to 553k POIs with 5.4 billion hourly edges. We show that by integrating these networks, a relatively simple SEIR model can accurately fit the real case trajectory, despite substantial changes in population behavior over time. Our model predicts that a small minority of "superspreader" POIs account for a large majority of infections and that restricting maximum occupancy at each POI is more effective than uniformly reducing mobility. Our model also correctly predicts higher infection rates among disadvantaged racial and socioeconomic groups solely from differences in mobility: we find that disadvantaged groups have not been able to reduce mobility as sharply, and that the POIs they visit are more crowded and therefore higher-risk. By capturing who is infected at which locations, our model supports detailed analyses that can inform more effective and equitable policy responses to COVID-19.

Jure Leskovec is an associate professor of Computer Science at Stanford University, the Chief Scientist at Pinterest, and an Investigator at the Chan Zuckerberg Biohub. He co-founded a machine learning startup, Kosei, which was later acquired by Pinterest. Leskovec's research area is machine learning and data science for complex, richly-labeled relational structures, graphs, and networks for systems at all scales, from interactions of proteins in a cell to interactions between humans in a society. Applications include commonsense reasoning, recommender systems, social network analysis, computational social science, and computational biology with an emphasis on drug discovery. This research has won several awards including a Lagrange Prize, Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, and numerous best paper and test of time awards. It has also been featured in popular press outlets such as the New York Times and the Wall Street Journal. Leskovec received his bachelor's degree in computer science from the University of Ljubljana, Slovenia, and his PhD in machine learning from Carnegie Mellon University, and completed postdoctoral training at Cornell University. You can follow him on Twitter at @jure.

Keynote 2 - Pratik Shah

Regulatory landscape and societal impact of clinical decision making and therapeutic development driven by AI technologies

ABSTRACT: The future of clinical and biomedical research is undergoing a major transformation due to the convergence of large new digital datasets, computing power to identify clinically and biologically meaningful inferences using explainable and interpretable AI and deep-learning models, and universities embracing this change through new research efforts. However, uncertainties in regulatory requirements, risk aversion, skepticism about rapidly emerging, yet largely unproven, technologies (such as deep learning), and a lack of testable hypotheses and statistical benchmarking of novel computational evidence for clinical decision making remain major barriers to adoption. Co-development of efficient, fair, interpretable, and translational computer science models and statistical methods will generate fundamental insights in biology to help physicians and researchers increase the health and longevity of patients. This talk will provide an overview of, and strategic insights for, benchmarking novel computational methods, along with a proposed clinical process for validation of Software as a Medical Device (SaMD) to de-risk and explain real-world evaluation of deep learning models and the evidence generated by them. The current status of real-world evaluations using clinical trials, strategies for regulation and de-risking of emerging technologies, and their positive and negative impact on patients will be among the key learning outcomes of the talk.

Pratik Shah is a Principal Investigator leading the Health 0.0 research lab at MIT that creates novel intersections between engineering, medical imaging, machine learning, and medicine to improve health and diagnose and cure diseases. Key goals are: 1) Novel medical technologies for translational clinical and biomedical research and real-world impact; 2) Augmenting artificial intelligence (AI), machine learning, medical imaging, and neural network capabilities for personalized digital medicines and improving health outcomes; and 3) Empowering patients, physicians, researchers, and regulators to make informed healthcare decisions. Recent work from his lab has been published in Nature Digital Medicine, Cell Press, the Journal of the American Medical Association, and workshops at Proceedings of the National Academies of Sciences, Engineering, and Medicine. Pratik serves on the grant reviewer board of the Center for Scientific Review at the National Institutes of Health and of foundations supporting patient-centric research. Pratik has BS, MS, and Ph.D. degrees in biological sciences and completed fellowship training at Massachusetts General Hospital, the Broad Institute of MIT and Harvard, and Harvard Medical School.

Keynote 3 - Shameer Khader

Improving clinical trial enrollment lifecycle and patient engagement using big data and machine intelligence

ABSTRACT: One of the significant hurdles to successful drug development is the complexity, cost, and scale of clinical trials. Despite a plethora of historical data in the public domain, clinical trial sponsors typically have difficulty fully leveraging historical trial data repositories to optimize clinical trial design, cost, and scale. Many impediments exist to leveraging this data, including differences in trial structure and differences in data sampled. We have developed a series of predictive models to understand different factors driving the clinical trial enrollment lifecycle to address such emerging challenges. Using data mining techniques that leverage recent advances in machine intelligence, we have developed tools (SAEgnal) and algorithms (ClinicalTrials2Vec) to understand drivers of clinical trial facets and perform scalable computing using clinical trial results. Further, drug development teams could use these insights to improve patient engagement and reduce side effects.

Dr. Shameer Khader is currently working as a Senior Director of Data Science and Artificial Intelligence at AstraZeneca, USA. He leads a global team that focuses on leveraging trans-disciplinary (biomedical, healthcare, and clinical) big data and machine intelligence to accelerate drug discovery and development. He has more than a decade of experience in building and leading bioinformatics and data science teams in both academia and industry. He obtained his Ph.D. in computational biology from the National Center for Biological Sciences in India. He completed his post-doctoral training in computational genomics and precision medicine at Mayo Clinic, Rochester, MN. He has published more than 70 peer-reviewed research papers in the areas of healthcare data science, bioinformatics, drug discovery, and precision medicine. His work has been featured in media outlets including Forbes, Fast Company, Bloomberg News, and Times of India. He has received multiple awards for his research contributions; his work on developing an open catalog of drug repositioning won the Swiss Institute of Bioinformatics' Bioinformatics Resource Innovation Award in 2017. Recently, he was recognized as one of the 100 Artificial Intelligence Leaders in Drug Discovery & Healthcare (DKI Global and Forbes). His TEDx Talk on Saving Lives Using Biomedical Data Science is available from

Keynote 4 - Parminder Bhatia

Amazon CORD-19 Search: A Neural Search Engine and Knowledge Graph for COVID-19 Literature

ABSTRACT: With the global outbreak of coronavirus, people, especially biomedical researchers, have a lot of COVID-19-related questions. At the same time, a great deal of research is focusing on coronavirus, and the number of publications is expanding at a rapid rate. The CORD-19 dataset is a collection of over 100,000 COVID-19 scientific articles that is available to the research community to fight against coronavirus. It aims to connect the machine learning community with biomedical domain experts and policy makers to identify effective treatments and management policies for COVID-19. In line with this initiative, our goal is to provide a scalable solution for accessing COVID-19 information easily using advanced NLP techniques. For example, people may have questions like: “What are the recommended medications for COVID-19?” In order to retrieve answers and relevant information for these questions, we require a search engine with a strong biomedical understanding of these natural language questions.

Parminder Bhatia is a science leader in AWS Health AI, leading AWS Comprehend Medical and HealthLake, and is currently building deep learning algorithms for the clinical domain at scale. Parminder has authored several papers in ACL, EMNLP, AAAI, and several other top-tier ML conferences. His expertise is in machine learning and large-scale text analysis techniques in low-resource settings, especially in biomedical, life sciences, and healthcare technologies. Prior to joining Amazon, he worked at Microsoft and several startups developing conversational models. He has expertise in building and deploying medical NLP systems in clinical and conversational settings at scale.

Keynote 5 - Ronald Summers

Creating Large Radiology Image Datasets for Deep Learning: NIH Experience

ABSTRACT: The recent revolution in machine learning has led to dramatic advances in the automated interpretation of radiology images. For progress in the field to reach its true potential, large labeled radiology image datasets are currently required. In this talk, I will describe my lab’s approach to creating large radiology datasets for deep learning research. I will also briefly touch on other approaches that may need less data, particularly less labeled data.

Dr. Summers is now a tenured Senior Investigator and Staff Radiologist. He is a Fellow of the Society of Abdominal Radiologists and of the American Institute for Medical and Biological Engineering (AIMBE). Dr. Summers's research interests include deep learning, virtual colonoscopy, CAD, and development of large radiologic image databases. His clinical areas of specialty are thoracic and abdominal radiology and body cross-sectional imaging.

Keynote 6 - Jiajie Zhang

AI is to Medicine Today What the X-ray was to Medicine a Century Ago, and Much More

ABSTRACT: Just as the X-ray machine invented more than a century ago enables doctors to see images of structures inside the human body, recent breakthroughs in AI and machine learning are enabling doctors to not only see, but predict, previously unidentified patterns within medical and biological data that can inform individualized disease prevention and care, as well as biomedical discovery. For many clinical tasks, AI can often outperform—in speed and accuracy—trained clinicians. However, at this time only a tiny percentage of ML algorithms developed for medical applications have been technically tested, clinically validated, and operationally deployed in real clinical settings. This presentation will discuss the challenges and opportunities in bringing AI to the real medical world.

Dr. Jiajie Zhang is Dean, Professor, and Glassell Family Foundation Distinguished Chair in Informatics Excellence at the School of Biomedical Informatics at the University of Texas Health Science Center at Houston. Since being appointed Dean in 2013, he has led major growth at the school, tripling both the faculty and the student body. His research focuses on biomedical informatics, cognitive science, machine learning, and human technology integration in healthcare, with numerous publications and grants/contracts. Dr. Zhang is a recipient of the John P. McGovern Outstanding Teacher Award from UTHealth and of the George H. W. Bush Award from the Asian Pacific American Heritage Association, and is a Fellow of the American College of Medical Informatics, the American Medical Informatics Association, and the International Academy of Health Sciences Informatics. Dr. Zhang received his PhD and MS in Cognitive Science from the University of California at San Diego and his BS in biological sciences from the University of Science and Technology of China. Dr. Zhang's current focus is on strategic thinking about transforming healthcare and biomedical discovery through informatics, data science, and artificial intelligence.

Keynote 7 - Amit Sheth

Augmented Personalized Health: dHealth approach to patient empowerment for managing chronic disease burden

ABSTRACT: Healthcare as we know it is in the process of going through a massive change: from episodic to continuous, from disease-focused to wellness and quality-of-life focused, from clinic-centric to anywhere a patient is, from clinician-controlled to patient-empowered, and from being driven by limited data to being driven by 360-degree, multimodal personal-public-population physical-cyber-social big data. While the ability to create and capture data is already here, the upcoming innovations will be in converting this big data into smart data through contextual and personalized processing such that patients and clinicians can make better decisions and take timely actions. The exploitation of all relevant data, relevant medical knowledge, and AI techniques will extend and enhance human health and well-being. The Augmented Personalized Healthcare (APH) strategy, as we have defined it, involves empowering patients with self-monitoring (collecting relevant data), self-appraisal (interpreting data in the patient's context), self-management (assisting the patient in following a personalized care plan to maintain health), intervention (when clinical help is needed), and disease progression tracking and prediction. While we have early investigations for several diseases, we will share some experience (such as developing a digital phenotype) from pediatric asthma that involved an evaluation with ~200 patients.

Prof. Amit Sheth is an Educator, Researcher, and Entrepreneur. He is the founding director of the university-wide AI Institute at the University of South Carolina. He is a Fellow of IEEE, AAAI, AAAS, and ACM. He has (co-)founded four companies, three of them by licensing his university research outcomes, including the first Semantic Search company in 1999 that pioneered technology similar to what is found today in Google Semantic Search and Knowledge Graph. He is particularly proud of the success of his 45 Ph.D. advisees and postdocs in academia, in industry research, and as entrepreneurs.

Keynote 8 - Li Li

EMR Data-Driven Digital Health towards Outcomes for Women Health

ABSTRACT: While there is a growing volume of literature regarding the use of AI to better predict and characterize a number of medical conditions, there has been a paucity of work devoted specifically to women’s health and particularly obstetrics. Pregnancy complications such as preeclampsia and postpartum hemorrhage contribute substantially to maternal morbidity and maternal mortality worldwide and within the US. Preeclampsia can vary not only in severity, but also in timing of onset and impact on fetal growth. Women are routinely screened for preeclampsia at the first prenatal visit using clinical factors; however, delivery is currently the only recognized treatment. The majority of deaths caused by postpartum hemorrhage are preventable, but a primary cause is error or delay in diagnosis and treatment. The risk stratification tools used in the current standard of care have very limited clinical utility. Therefore, we present phenotyping algorithms for both pregnancy complications and characterize the clinical features that provide the most information for making accurate risk assessments for both complications. We believe that our work has the potential not only to impact the clinical standard of care, but also to spur further research that may ultimately lead to a host of better screening tools and a meaningful reduction in preventable maternal mortality for all pregnant women.

Dr. Li Li is VP of Clinical Informatics at Sema4 and Assistant Professor at Icahn School of Medicine at Mount Sinai. She leads the team that focuses on learning from real-world data to drive the development and improvement of clinical applications for reproductive diseases, newborn rare disease, COVID drug repurposing, and imaging learning, with the aim of advancing novel diagnostics and therapeutics and providing insights to improve outcomes. Dr. Li established the precision medicine groundwork for treatment stratification rather than a one-size-fits-all model, and her studies were featured by NIH director Francis Collins and received numerous media reports and awards. She developed RDOCK, which has become one of the most widely used commercial protein docking software packages. She holds four international patents, three of which led to the creation of a start-up company that was later acquired by IMMUCOR Inc. Dr. Li was trained as both a physician and bioinformatician, with an M.D. from Dalian Medical School, China, an M.S. in Bioinformatics from Boston University, and postdoctoral training at UCSD. Dr. Li has over 18 years of experience in the fields of precision medicine and bioinformatics and has published more than 85 peer-reviewed papers in journals including Nature Biotechnology, Nature Methods, and Science Translational Medicine.


In April 2004, President Bush set a national goal that most Americans should have electronic health records (EHRs) within a decade. With President Obama’s signing of the Health Information Technology for Economic and Clinical Health (HITECH) Act as part of the American Recovery and Reinvestment Act in February 2009, EHR adoption had reached 96% of US hospitals by 2016. Web technologies have played a critical role in this paradigm shift, allowing EHRs to be transferred and communicated via the Web. The enormous volume of healthcare data not only poses new challenges for data semantics to enable seamless communication among health professionals, but also creates huge potential for Artificial Intelligence (AI) to empower doctors to provide better care. Data exchange within and across different healthcare organizations requires explicit and shared semantics, which is why biomedical ontologies and controlled vocabularies (e.g., SNOMED CT, ICD9, LOINC, RxNorm) have been widely implemented in clinical decision-making systems. Aggregating heterogeneous healthcare data, which is essential for evidence-based care, thus becomes feasible. Doctors can now make evidence-based care decisions by drawing not only on a patient's current medical measures and previous medical records, but also on the EHR records of other similar patients from the same health organization and even from other hospitals. Computational biomarkers can be identified by mining integrated EHR data using cutting-edge machine learning and deep learning algorithms. With the benefits of formal data semantics and the rich knowledge encoded in biomedical ontologies, healthcare has entered a new era of personalized precision medicine.

The Web Conference gathers top-notch experts in data management, data analytics, web technologies, semantic web, artificial intelligence, computer vision, and applied areas. This topic is important to the Web Conference because it addresses both a fundamental issue and an applied field related to intelligent data representation, mining, and application. Recent breakthroughs in artificial intelligence and machine learning have demonstrated the promising potential of machine intelligence, especially the combination of machine and human intelligence, which can lead to a paradigm shift in the healthcare industry in the near future. This workshop aims to explore this timely topic with the audience of the Web Conference because the Web has become the essential infrastructure to acquire, disseminate, and create data, information, and knowledge. The Web Conference has a broad audience from both the technical and the social sides of science. This unique combination makes the Web Conference an ideal forum for this workshop because healthcare is deeply rooted in science as well as in the social sciences and humanities.

The rich medical concepts connected by semantic relationships integrate EHR data into knowledge graphs to enable knowledge-intensive discoveries. But this is still an open field with many challenges. For example, data cannot be easily shared across different hospital systems due to privacy, security, and policy issues. In particular, EHRs are embedded in different commercial vendor systems, which makes the integration of EHRs extremely troublesome. The recent development of the FHIR and FAIR standards tackles this problem from a different angle: data can be communicated, integrated, and analyzed simultaneously, and made available not only to physicians but also to patients. Biomedical ontologies and semantic web technologies can empower knowledge-driven discovery in healthcare to enable better cohort identification for clinical trials, risk prediction, precision diagnosis, and efficient clinical decision support workflows. Even though the dramatic increase in healthcare data offers unprecedented opportunities for evidence-based care, the interoperability of EHRs and the mining of integrated EHRs are still open to innovative solutions. In this workshop, we will welcome researchers from various domains to discuss and share the latest progress related to knowledge representation, semantic annotation, semantic mining, automatic reasoning, and semantic data management to promote innovative semantic approaches to address pressing needs in healthcare.

Artificial Intelligence is revolutionizing every aspect of our lives. It is also making its way into radiology reading rooms to build a new paradigm for precision diagnosis. Health innovations applying machine learning (ML) and deep learning (DL) in radiology account for more than half of the total innovations in health. The shortage of radiologists and the burnout of physicians create an urgent demand for immediate solutions. A radiologist reads about 20,000 images a year, roughly 50-100 per day, and the number is increasing. The US produces 600 billion images each year, and 31% of American radiologists have experienced at least one malpractice claim, often for missed diagnoses. Building automatic or semi-automatic approaches to medical imaging diagnosis becomes the unavoidable next step. The combination of deep learning and the prior knowledge of physicians organized as knowledge graphs can provide a powerful yet unified framework for clinical decision support. It will open a new door to auto-annotating medical images using AI and knowledge graph powered approaches. It can sharply increase the number of annotated medical images at a much lower cost so that better CNN models can be trained and, therefore, better diagnosis models can be obtained. It can also increase the interpretability of AI solutions by locating abnormalities as visual evidence in medical images, which can build trust between doctors and patients. In this workshop, we will welcome researchers from the various domains of computer vision, deep learning, knowledge graphs, deep graph mining, and natural language processing to share the latest developments in AI-powered medical imaging diagnosis and to move the research agenda toward visual question answering to enable interpretability and precision in care.


(US Central Time, April 16)

The workshop will be open to the whole conference. Each submitted paper will be evaluated by three reviewers on novelty, significance, technical soundness, experiments, and presentation. The reviewers will be program committee members or researchers recommended by the members.

All papers submitted should have a maximum length of 8 pages and demo papers should be no more than 4 pages. All must be prepared using the ACM camera-ready template. Authors are required to submit their papers electronically in PDF format.

Please submit your papers at

Topics (not limited to)

Knowledge representation and reasoning on healthcare data

Data integration, ontology and standards for healthcare data

Knowledge graph construction on healthcare data

Deep graph mining to address precision care

Biomedical ontology and Semantic Web technological applications in healthcare

Computer vision in medical imaging diagnosis

Auto-annotation of medical images

NLP for medical diagnosis notes

Multimodal deep learning models for advanced diagnosis

Interpretability of AI in health

Fairness of AI in health

AI applications in healthcare

Important Dates

(all deadlines are midnight Ljubljana Time)

Feb 15, 2021

Submissions due

Feb 22, 2021

Acceptance notifications

March 1, 2021

Camera-ready submission

April 16, 2021

Workshop date


Ying Ding

University of Texas at Austin

James Hendler


Benjamin Glicksberg

Icahn School of Medicine at Mount Sinai

Guoqiang Zhang

University of Texas Health Science Center

Yifan Peng

Cornell University

Mark Musen

Stanford University

Fei Wang

Cornell University

Marinka Zitnik

Harvard University