2008 Melbourne Information Seminar Speaker Information

Richard Cotton: Welcome and Introduction

Prof. Richard G.H. Cotton
Head, Genomic Disorders Research Centre
Howard Florey Institute
Convenor, Human Variome Project
President, Human Genome Variation Society
Co-Editor, Human Mutation

The Human Variome Project - Progress, Pilots and Plans
The Human Variome Project was initiated in June 2006 to increase the profile of inherited disease, at a meeting with the highest possible representation of geneticists, international bodies (EC, UNESCO, OECD, WHO), 20 countries and 20 genetics journals (www.humanvariomeproject.org/?p=2006_meeting). The meeting generated 96 recommendations (www.nature.com/ng/journal/v39/n4/full/ng2024.html) and supported coordination in Melbourne.
A number of projects have been initiated as pilot studies to guide future work. These include the HVP/InSiGHT collaboration to pilot the flow of phenotype and genotype data through local databases to central databases/browsers; microattribution; collection from individual countries; adopt-a-gene funding; placing Locus Specific DataBase (LSDB) data on central databases and browsers; disease-specific form; ethics; pathogenicity of variants; somatic mutation databases; GEN2PHEN, etc.
We have an HVP planning meeting in Spain (www.humanvariomeproject.org/HVP2008/) where we hope to collect as many relevant systems, plans and ideas as possible for examination and possible inclusion in future plans. The generated plan will be written up in a major journal with all attendees and others as authors. We anticipate a report of this meeting to be assessed by the meeting.

Keynote Speaker: Myles Axton: Dealing with variant publication
Editor, Nature Genetics

Myles Axton has a degree in genetics from Cambridge University and got his Ph.D. from Imperial College, London in David M. Glover's cell cycle genetics research group. He continued with postdoctoral research at the University of Dundee, and then moved to the Whitehead Institute for Biomedical Research to study DNA replication in Drosophila development with Terry L. Orr-Weaver. From 1995 to 2003 he was a University Lecturer in Molecular and Cellular Biology at the University of Oxford and a Tutorial Fellow of Balliol College. He joined Nature Genetics in 2003.

John Hopper: Importance of accurate collection of data for inherited disease and common disease for research and treatment
The question of critical clinical importance is: what are the disease risks associated with particular genetic variants? Determining these is no simple matter. Errors can and have been made due to misunderstandings about: (a) the importance of adjusting for ascertainment when dealing with family data, and the need to use valid statistical methods; (b) the people about whom valid inference is being made (given that risks may vary according to a variety of environmental and genetic modifiers); (c) errors, and imprecision, in estimates due to selective reporting and/or limited sample sizes.
Population-based studies are often the best resource to minimise the problems above, but even then they must be analysed appropriately and interpreted carefully. Epidemiology provides a solid theoretical basis and valid tools, but most researchers interpreting the data lack training in this discipline. A multi-disciplinary approach is needed to provide valid and reliable information about the consequences of genetic variants. Large and informative studies, including population-based family studies, will be essential to support this enterprise. Some such resources have been established for some major diseases, and use should be made of the experiences gained in their establishment, maintenance, and utilisation.

Ingrid Winship: The Clinic - The ideal system
The use of genetic information in the prediction of familial disorders provides an opportunity for early intervention strategies. Technology therefore affords the opportunity to promote wellness even in high risk families. It is therefore essential that genetic variations are well understood, so that their utility can be fully explored, without risk of preventable error.
This requires comprehensive data collection, with rational correlation of genotype and phenotype, so that clinicians can refer quickly and easily to a valid up-to-date information resource. I will illustrate the clinical scenario, where effective databases facilitate clinical efficiency and a paucity of data places patients at clinical risk.

Agnes Bankier: Web based resources for the clinical geneticist

Professor Agnes Bankier
Director
Genetic Health & VCGS Pathology
Murdoch Childrens Research Institute
Flemington Road, Parkville, Vic. 3052

Web-based communication and learning have become part of clinical practice and ongoing professional development. Clinicians use web-based databases like OMIM, PubMed, GeneTests, GeneReviews, Teratology, and electronic journals for efficient information retrieval. They also use the specialised search engines of syndrome-specific databases: the web-based POSSUM Web as well as the stand-alone database LDDB. The clinicians' expectation is immediate access to up-to-date information and, when a diagnosis is still not made, emails and chat-sites for diagnostic advice. Whilst syndrome diagnosis is still an art learnt by apprenticeship and experience, web-based learning can enhance and speed up that process. More web-based interactive resources will be needed.
The electronic age has provided 3-D digital photography and imaging and face-recognition software (developed for law enforcement and forensics as well as research), with the potential of being adapted to syndrome recognition. Syndrome databases and patient databases could also be used by researchers once access barriers are overcome.

Desiree du Sart: The Diagnostic Lab - The ideal system

Molecular Genetics Laboratory
Victorian Clinical Genetics Services
Murdoch Childrens Research Institute
Parkville, VIC 3052, Australia

One of the most significant outcomes from the Human Genome project is the translation of the genome data into clinical practice. Identification of genes which cause inherited genetic disorders impacts on clinical diagnosis of affected patients and clinical management of the family members who may be at-risk of also being affected, or passing mutated genes on to offspring. Molecular Genetic Diagnostic Laboratories are involved in this process of testing affected patients and at-risk family members. Consequently, these laboratories collect a huge amount of data about the genes that they analyse in many different patients. This is data about sequence variation present in the human genome and because the patients tested would be presenting a clinical phenotype, there is also data on the impact or effect of that specific change. One of the vital aims of the Human Variome Project is to change the current modus operandi within diagnostic services to facilitate the collection of gene variation data and related clinical information to a central data collection centre which is managed in the appropriate ethical and confidential manner. So the final step in the laboratory process is not the report to the clinician ordering the test, but instead becomes sending data to the central repository. How can this be done? It is agreed that a quick and efficient process is essential to achieve maximum acceptance of this process. A number of possibilities will be discussed. Why should this be done? Diagnostic laboratories need to interpret the significance of sequencing data obtained when screening genes in patients. They rely heavily on databases and publications to provide information on how to interpret the data with respect to clinical management. 
By submitting their data and interpretations to a central repository, they not only impact on the integrity and depth of the accessible data, but also facilitate another form of external quality assessment of their laboratory processes, which is an essential part of service delivery in any diagnostic laboratory.

David Ravine: LSDB Curation - The ideal system

Depending on the stage of development or maturity of a locus specific database, curators require a wide range of skills that encompass the creation of the LSDB, its maintenance, periodic (or continuous) upgrading and, in time, succession planning for handover to another curator. The skills required include at least expertise in the gene or genes served by the LSDB, an appropriate level of knowledge about the diseases associated with mutations in the gene concerned, fluency with computer/server hardware, bioinformatics, database and web server software, quality control systems, ability to liaise with research and diagnostic laboratory scientists who may be submitting details about identified gene variants, serving the needs of those who are making enquiries about specific gene variants, as well as effective organisational management. As it is uncommon for any one individual to have all these skills, there may be a place for the Human Variome Project to coordinate the development and implementation of effective management guidelines, which will ensure long-term high quality governance of LSDBs so that they remain able to serve the needs of those who depend on having ready access to accurate information about specific genomic variants.

David Thorburn: Collection of data in Australia

Associate Professor David Thorburn
Head, Mitochondrial & Metabolic Research
Murdoch Childrens Research Institute
President, Human Genetics Society of Australasia

The success of the HVP is in the interests of the members of all the special interest groups represented by HGSA, namely professionals involved in clinical genetics, genetic counselling, cancer genetics, biochemical genetics, cytogenetics and molecular genetics. What are the best ways in which a society such as HGSA can support implementation of the measures needed to enact the HVP?
We are open to suggestion and make some preliminary comments on what may be useful in the Australasian context. Probably the most obvious area relates to encouraging diagnostic labs to regard collection of phenotypic data and deposition of genetic variants in appropriate LSDBs as expected and routine standards of practice. There are a number of impediments to compliance that have been noted previously, such as time, resource and privacy issues as well as publication or other incentives. An effective role for HGSA may be to work internally and with other relevant groups to increase the incentives. For example, HGSA and the Royal College of Pathologists of Australasia (RCPA) provide representatives to the National Pathology Accreditation Advisory Council (NPAAC). NPAAC has guidelines for genetic testing that supplement the general requirements for accreditation of medical laboratories defined by standards such as ISO 15189 and administered by NATA and IANZ in Australia and New Zealand, respectively. Potentially we can influence these guidelines, if supported by RCPA and the Human Genetics Advisory Committee (HGAC). Other incentives may be able to be provided through Quality Assurance and Maintenance of Professional Standards (MOPS) programs administered by HGSA and RCPA.

Christine O'Keefe: Aggregating and integrating data

The InSiGHT pilot for the HVP project envisages improvements to informatics systems related to inherited colon cancer, including establishing a pilot system for the collection and curation of mutation and phenotype data. Much of the phenotype data of interest is collected routinely as individuals interact with the health system, and so resides in dispersed databases across one or many healthcare locations.
In this talk we describe the use of data linkage to assemble these otherwise dispersed phenotype data for individuals before association with the corresponding mutation data, including an example architecture. We pay particular attention to aspects of patient confidentiality and privacy.
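As a rough illustration of the data linkage idea described above (not the architecture presented in the talk), the Python sketch below derives a keyed hash from normalised identifiers and uses it to group phenotype fields held in separate sources. The field names, the `linkage_key` and `link_records` helpers, and the exact-match strategy are all assumptions made for this example; real linkage systems often rely on probabilistic matching and a trusted linkage unit.

```python
import hashlib
import hmac
from collections import defaultdict


def linkage_key(surname, given_name, dob, secret):
    """Derive a keyed hash from normalised identifiers (hypothetical scheme)."""
    normalised = "|".join(part.strip().lower() for part in (surname, given_name, dob))
    return hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(sources, secret):
    """Group phenotype fields from several sources under one linkage key.

    Each record is assumed to be a dict with 'surname', 'given_name', 'dob'
    and a 'phenotype' dict. Only the phenotype fields are carried forward;
    the identifying fields are dropped after the key has been computed.
    """
    linked = defaultdict(dict)
    for records in sources:
        for rec in records:
            key = linkage_key(rec["surname"], rec["given_name"], rec["dob"], secret)
            linked[key].update(rec["phenotype"])
    return dict(linked)
```

The linked, de-identified phenotype record can then be associated with mutation data stored under the same key, so that names and dates of birth never need to travel with the research data.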

Marienne Hibbert: BioGrid Australia - a virtual platform for multi-disease, multi-institutional research (formerly MMIM)

Marienne Hibbert
Project Director
BioGrid Australia

BioGrid Australia is a virtual repository of clinical and genetic data sets. Physically located within independent organisations, the data are able to be integrated, searched and queried seamlessly via a federated data integrator. The BioGrid platform has solved the issues of record linkage for individual cases and integrating data sources across multiple institutions and multiple clinical specialities. It enables a virtual data platform for research across life science disciplines with access to genomic data and associated clinical treatment and outcome data. The infrastructure of Bio21:MMIM enables discovery research to be accessible via the Web with security, intellectual property and privacy addressed. Researchers must gain authorisation to access data, and inform/obtain permission from the data owners, before the data can be accessed. The legal and ethical issues surrounding health data have been addressed.
The Human Variome Project allows the opportunity to link the genetic and phenotypic data for multiple mutations. A proposal will be outlined which will allow collation of mutation data from multiple organisations.

Ravi Savarirayan: The Human Variome: implications for musculoskeletal disease

Many genes that cause rare musculoskeletal phenotypes have been shown recently to also predispose populations to common disease processes such as arthritis, osteoporosis, and lumbar disc degeneration. These functional polymorphisms are being uncovered and their relevance to disease determined.
This presentation will outline some of these variants and their clinical implications. It will discuss the clinical data requirements and linkages that will aid documentation of these phenotypes in the context of the many published recommendations of the Human Variome Project.

Terence Harrison: Evidence Based Medicine system - Is genetics coverage sufficient

Terence M Harrison
Clinical Librarian, Health Sciences Library
Royal Melbourne Hospital

For many clinicians the task of searching for reports, reviews and articles that cover genetics content can involve a huge learning curve, particularly if they have little experience of the resources available that might assist in such a task. Fortunately, help is at hand in the form of specialist search engines that provide an interface between human genetic content and EBM. Other search facilities can also be used to obtain relevant results by applying certain search techniques. This presentation outlines the resources - major and minor - that are available. These resources are explained in terms of content indexed, search techniques, and results presented.
It is hoped that the presentation will form the basis of a more detailed study by this author into the EBM-genetics interface and how the technologies involved can be improved.

Finlay Macrae: HNPCC as a model system - A HVP/InSiGHT pilot study

InSiGHT is the lead international health professional organization with an interest in familial gastrointestinal tumours. It formed after the merger of the Leeds Castle Polyposis Group, with its interest in the Polyposis Syndromes, and the International Collaborative Group for Hereditary Non Polyposis Colorectal Cancer (ICG HNPCC), with its interest in the mismatch repair deficiency syndrome of HNPCC. InSiGHT maintains a database of mismatch repair gene mutations through its website www.insight-group.org, initiated soon after the cloning of the mismatch repair genes and their recognition as the predisposition to HNPCC. This database accepts variant information from laboratories as well as the published literature, and is curated by Paivi Peltomaki in Finland.
Recognizing that the task of curation was expanding, and the need to access a range of new databases emerging which service mismatch repair mutation interpretation, InSiGHT, at its meeting in Yokohama in March 2007, agreed to explore a collaboration with the Human Variome Project, in the hope of mutual benefits. A series of committees were established at Yokohama to pursue this end: DNA curation to include both the InSiGHT database and the MMRgene database developed by Mike Woods in Newfoundland (ascertaining published variants only), a phenotype database development, a virtual histology database (H Morreau in Leiden), a database collecting results of in vitro testing of variants to explore their pathogenicity (R Sijmons from Groningen), and, over-riding all these, a committee responsible for interpretation of all information based on geographic regional representation (Chair, M Genuardi, Florence). Interpretation is also informed by the larger generic databases of DNA mutations where available, and through external consultations.
Since the Yokohama meeting, InSiGHT has, with the Human Variome Project, attracted bioinformatics expertise to develop automated searching of text in the electronic literature (NICTA) to feed into these different databases internationally; re-vamped its website to include a phenotype database entry format (University of Melbourne Information Systems); engaged with MMIM to assist in design of IT architecture; collated all information on databases worldwide (S Forster); and commenced a feasibility study of the development of a portal that will allow an individual variant to be placed, and all information relating to that variant, including interpretation, to be displayed from the various disparate databases (CSIRO).
Frontier areas now are populating the phenotype database through a flow of data emanating from the Familial Cancer Clinic at the point of consultation, embedding of systems to collect data on searching strategies from DNA diagnostic labs, and work to drill down search systems to the individual variant level.
These developments are seen as a Pilot Project for the Human Variome Project, with mutual benefits to all. The US NIH Colon Family Register has on its agenda next month an approach to allow a collaboration with the InSiGHT/Human Variome Project, which will further expand the reach of the InSiGHT/HVP Project.
Funding is required for sustenance of the effort, with imaginative philanthropic or Government funding for a project which has multiple facets between research and translation.

Lawrence Cavedon & Nicola Stokes: The role of NICTA/Text mining

Enabling more Efficient LSDB Curation with more Effective Automatic Search and Text Mining Tools
Nicola Stokes, Lawrence Cavedon and Justin Zobel

The collection and collation of relevant scientific papers is an essential first step in both the curation of locus-specific databases (LSDBs), and the analysis of genetic variants in diagnostic labs. Both of these tasks have been identified as severe clinical bottlenecks by members of the InSiGHT committee. For example, in the diagnostic lab, researchers investigating a mutation type that doesn't have an LSDB entry may spend up to a day searching the literature for information. One of the major reasons for poor search results is term mismatch between the user's query and a relevant document. Examples of term mismatch include spelling variants (estrogen/oestrogen) and synonyms (MLH1 is equivalent to FCC2; COCA2; HNPCC; hMLH1; HNPCC2; MGC5172). These variants in terminology result in relevant papers being missed by search engines. However, significant improvements to search performance can be made when these related terms are added to a query, using a process called query expansion.
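A minimal Python sketch of query expansion, using only the spelling variant and gene synonyms quoted above, might look as follows. The `SYNONYMS` table, the `expand_query` helper and the generic AND/OR output syntax are assumptions made for illustration; they do not describe the NICTA tool or the query syntax of any particular search engine.

```python
# Hypothetical synonym table built from the examples in the abstract.
SYNONYMS = {
    "mlh1": ["FCC2", "COCA2", "HNPCC", "hMLH1", "HNPCC2", "MGC5172"],
    "estrogen": ["oestrogen"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Rewrite each query term as an OR-group of the term and its known synonyms."""
    clauses = []
    for term in query.split():
        variants = [term] + synonyms.get(term.lower(), [])
        if len(variants) > 1:
            clauses.append("(" + " OR ".join(variants) + ")")
        else:
            # Terms with no known synonyms are passed through unchanged.
            clauses.append(term)
    return " AND ".join(clauses)
```

With this table, a query for "MLH1 mutation" would also match documents that use any of the listed aliases, which is the effect the abstract attributes to query expansion.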
The most popular literature search tool used by biomedical researchers and clinicians is the NCBI's PubMed system (www.ncbi.nlm.nih.gov/sites/entrez). PubMed's query expansion is based around the MeSH ontology (www.nlm.nih.gov/mesh), where query terms are mapped to MeSH headings. However, MeSH has limited coverage of genes and more general genetic terminology. The aim of our research is to address the needs of biomedical researchers and clinicians who are specifically interested in mutation-related queries. The outcome of this research will be a retrieval tool which supports the information needs of InSiGHT mutation database curators and diagnosticians in the lab, and which feeds information to the automatic literature mining component of the project.
A second research objective addresses information extraction bottlenecks in the manual curation of LSDBs by use of automatic Text Mining methods. Text Mining is the process of extracting important facts automatically from text, in our case the identification of information pertaining to genetic mutations found in the literature. The first step in this process is the identification of relevant mutation-related articles, which will be addressed by our search tool. We plan to develop Natural Language Processing (NLP) methods that can identify important entities in text such as gene names, mutation names, and the relationships between them. For example, every reference to a mutation (change in DNA or the resultant protein) needs to be explicitly linked with the gene it refers to. In order to automate this process we need a certain amount of annotation data. Annotation, in this context, means explicitly highlighting text that corresponds to interesting mutation-related entities in scientific articles. We are currently developing an annotation tool which will help us to collect marked-up documents from biomedical researchers, clinicians and database curators. Once this information has been collected (the success of which, of course, is dependent on significant contribution from the HVP community), our aim is to provide a semi-automatic curation tool which will significantly speed up the collection of information for mutation databases.
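To make the entity and relationship step concrete, the Python sketch below uses two deliberately narrow regular expressions to find gene and mutation mentions and links each mutation to the nearest gene named earlier in the same sentence. This is a toy baseline written for illustration, not the NLP method under development: the gene lexicon, the restriction to simple cDNA substitutions, and the nearest-preceding-gene rule are all assumptions.

```python
import re

# Assumed patterns: simple cDNA substitutions (e.g. "c.", a position, "X>Y")
# and a small fixed lexicon of mismatch repair gene symbols.
MUTATION_RE = re.compile(r"\bc\.\d+[ACGT]>[ACGT]\b")
GENE_RE = re.compile(r"\b(MLH1|MSH2|MSH6|PMS2)\b")


def extract_gene_mutation_pairs(text):
    """Return (gene, mutation) pairs found within the same sentence."""
    pairs = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        genes = [(m.start(), m.group()) for m in GENE_RE.finditer(sentence)]
        for mut in MUTATION_RE.finditer(sentence):
            preceding = [name for pos, name in genes if pos < mut.start()]
            if preceding:
                # Link the mutation to the closest gene mentioned before it.
                pairs.append((preceding[-1], mut.group()))
    return pairs
```

A rule-based extractor like this misses protein-level descriptions, deletions and free-text phrasing, which is one reason annotated documents from curators are needed to train and evaluate more capable methods.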


NICTA (National ICT Australia) is a national research organisation funded by the Australian federal government, state governments, and other partners. Its mandate is to perform research in information and communications technology that has potential beneficial impact to Australian industry and society. The Victorian Research Lab in particular has a strong commitment to the Life Sciences sector, including applications of text mining and data mining to support the analysis of data of value to biomedical researchers and clinicians.

NICTA's text mining team is currently working with members of the Human Variome Project and InSiGHT communities to develop text mining techniques to support more effective document searches, and semi-automatic extraction of valuable biomedical data from documents. The success of this work is highly dependent on input from the HVP and InSiGHT communities; in particular, NICTA has developed web-based tools for capturing search-query patterns and for annotating documents with entities and relationships of interest to researchers, and feedback on these tools and their widespread use are crucial to collecting the data that will inform our work.

Tim Smith: VariVis to depict variation

VariVis: A Visualisation toolkit for variation databases
PROBLEMS BEING ADDRESSED
In a survey of locus specific databases (LSDBs) in 2002, Claustres et al. noted that only 54% of examined databases would fit minimal criteria for ease of use, only "some" depicted the distribution of variation within a gene and "few" possessed graphical displays, especially of a dynamic nature (Claustres, et al., 2002). While specialised LSDB software such as UMD (Beroud, et al., 2005) and MUTbase (Riikonen and Vihinen, 1999) is available to provide these graphical displays, they each enforce a specific schema and user interface on the LSDB which may be undesirable, especially for established LSDBs looking to include graphical displays.
SYSTEM
We present here a visualisation toolkit, VariVis, designed specifically for LSDBs. VariVis is a collection of Perl scripts that works in parallel with the existing user interface and database schema of an LSDB to produce a graphical representation of sequence and variation data. VariVis can access variation data stored in a wide variety of formats including Database Management Systems (DBMSs) such as MySQL, Oracle and PostgreSQL, through to flat-file repositories such as comma or tab delimited text files. Gene sequences and annotations can be accessed from a locally stored file in any of a large number of sequence file formats, including FASTA, BSML and GenBank, or VariVis can be directed to automatically retrieve sequences from any of several online sequence databases. One of the representations possible is a completely novel depiction of DNA variants showing all possible bases at all positions.
There is a lack of visualisation tools for variation data that can be implemented on any database system. The VariVis software package is an attempt to rectify this situation by providing database curators with a visualisation tool capable of easily combining the highly curated variation data within LSDBs with sequence and annotation data regardless of their underlying database and user interface.
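VariVis itself is written in Perl and its interface is not described here, so the following Python sketch is not VariVis code. It only illustrates the general idea of reading variation data from a tab-delimited flat file, independent of any LSDB schema, and summarising how variants are distributed along a sequence. The column name `position` and both helper functions are assumptions for this example.

```python
import csv
import io
from collections import Counter


def variant_counts(tsv_text, position_column="position"):
    """Count variants per sequence position from tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(int(row[position_column]) for row in reader)


def text_histogram(counts, start, end, bin_size):
    """Render one line per bin, with one '#' for each variant falling in that bin."""
    lines = []
    for lo in range(start, end + 1, bin_size):
        hi = min(lo + bin_size - 1, end)
        n = sum(c for pos, c in counts.items() if lo <= pos <= hi)
        lines.append(f"{lo:>5}-{hi:<5} {'#' * n}")
    return lines
```

Because the reader only needs a position column, the same summary could be produced from a database export or a hand-maintained spreadsheet, which mirrors the schema-independence the toolkit aims for.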
COLLABORATIONS OR SHARING CAPACITY
VariVis is available as a free download from http://www.genomic.unimelb.edu.au/varivis/ for non-commercial research, teaching or education purposes.

Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T, Claustres M. 2005. UMD (Universal Mutation Database): 2005 Update. Human Mutation 26(3):184-191.
Claustres M, Horaitis O, Vanevski M, Cotton RGH. 2002. Time for a unified system of mutation description and reporting: a review of locus-specific mutation databases. Genome Research 12:680-688.
Riikonen P, Vihinen M. 1999. MUTbase: maintenance and analysis of distributed mutation databases. Bioinformatics 15(10):852-859.

Bernard Brais: Collection of inherited neurological disease from a Quebec isolate

Bernard Brais M.D., M.Phil., Ph.D.
Neurogeneticist
Associate Professor
Department of Medicine
Faculty of Medicine, Universite de Montreal
Centre de recherche du CHUM, Hopital Notre-Dame-CHUM
M-4211-L3
1560, rue Sherbrooke Est, Montreal, Quebec, H2L 4M1 Canada

The RMGA/FRSQ Quebec Infrastructure for Locus-Specific Mutation Databases: Knowledgebases to initiate a Genetic Atlas of the Population of Quebec linked to the Human Variome Project.
Bernard Brais, Jacque Mao, Charles R Scriver

Databases are a legacy of science; not glamorous, often neglected, but necessary. Locus-specific mutation databases (LSDBs) conserve expertly curated information using a controlled vocabulary and standardized mutation nomenclature. LSDBs can also be knowledgebases for research and clinical interests serving scientists, physicians and patients. Quebec LSDBs are linked as nodes to the WayStation of the Human Variome Project (HVP). The classic Quebec LSDB is PAHdb, created and still curated by C.R. Scriver. PAHdb has served as a useful model for nodes in the definition of the HVP objectives. It has also been used by other integrated projects such as FindBase or PhenCode. The challenge for the Quebec scientific community is to create LSDBs which are of particular interest to our population considering that it harbors variant alleles at many known and yet to be discovered loci associated with genetic diseases. The RMGA of the Fonds de la Recherche en Sante du Quebec (FRSQ) infrastructure supports the maintenance of "inch wide, mile deep" databases and encourages the development of new ones with a special emphasis on increasing their population genetics content.
We are presently awaiting a renewal of the financial support from the RMGA to further develop this initiative. We intend to develop during the next 3 years (2008-2011) an infrastructure supporting 30 LSDBs for disease genes where cases cluster in the Quebec population. We are developing an integrative framework to trace the introduction and diffusion of the mutations in the population and tools to represent the relative regional carrier rates of the different mutations. This integrated set of LSDBs will serve as the foundation for a projected web-based Genetic Atlas of the Population of Quebec. To achieve these objectives we offer technical support to expert curators by providing access to online resources and guidelines to develop and design LSDBs and ensure links between the RMGA infrastructure and the international HVP. By prioritizing genes for which the first mutations were uncovered in the Quebec population, we hope to encourage researchers to invest in the launching of LSDBs at the time of their greatest interest. The development of LSDBs with special interest in populations with founder effects or distinct ethnicities may be one of the positive outcomes of the HVP by improving our understanding of population genetics by having access to rich data sets at different loci for individuals from the same population.

David Goldgar: Issues in Integration of multiple sources of evidence in clinical classification of sequence variants of uncertain significance

David Goldgar
University of Utah School of Medicine

In trying to assess the clinical significance of a given sequence variant of uncertain significance (VUS), there are a variety of different approaches that can be applied, ranging from co-segregation of the VUS with disease in pedigrees to in-vitro and in-vivo functional analyses. It is somewhat convenient to divide these different kinds of evidence into what we call direct evidence, that is, evidence with a direct relationship with disease risk, and indirect evidence, where the relationship with disease is more distant. Three issues are paramount when thinking about combining these differing kinds of evidence into a single model: 1) relevance to disease phenotype; 2) quantification of qualitative measures; 3) statistical independence (or lack thereof) of different model components. A related issue involves defining a suitable prior probability that a given variant is deleterious and on what this prior probability is based. Typically, this prior probability will be a function of locus heterogeneity in the relevant gene, the completeness of screening, and potentially the characteristics of the sequence variant (e.g., conservation, domain, etc.). Once the individual components of the model have been defined and validated there are a variety of ways of building an integrated model, including Bayesian mixture models, neural networks, adaptive learning, etc. Lastly, it is important to think about how and to whom this information will be disseminated.
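One simple way to see how a prior probability and several lines of evidence combine is Bayes' rule in odds form: posterior odds equal prior odds multiplied by the likelihood ratio contributed by each evidence source. The Python sketch below assumes the sources are statistically independent, which is exactly the assumption the abstract flags as needing scrutiny, and it is offered as an illustration rather than as the integrated model discussed in the talk. The `posterior_probability` helper and its inputs are hypothetical.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability of being deleterious with independent likelihood ratios.

    Each likelihood ratio is P(evidence | deleterious) / P(evidence | neutral)
    for one evidence source, such as co-segregation or a functional assay.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Multiplying the ratios is only valid if the evidence sources are independent.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)
```

If two evidence sources are correlated, multiplying their likelihood ratios counts the shared information twice and overstates the posterior, which is why the independence of model components matters.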

Vijaya Sundararajan: NCRIS and its relevance to collecting variation and phenotypic data in the future

The National Collaborative Research Infrastructure Strategy (NCRIS) Roadmap identified 'Population health and clinical data linkage' as a priority capability in 2006, with the Australian Department of Education, Science and Training (DEST) provisionally allocating funding in support of an appropriate investment. Professor Michael Frommer has facilitated the process and developed a proposal. At a meeting in February 2008 of interested parties, agreement was reached on this proposal. Currently NCRIS has yet to release their final decision on the allocation of funding.