Notes from DAS/2 code sprint #2, day four, 16 Mar 2006

$Id: das2-teleconf-2006-03-16.txt,v 1.1 2006/03/16 20:45:48 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke (at Affy)
  Sanger: Andreas Prlic
  UC Berkeley: Nomi Harris (at Affy)
  UCLA: Allen Day, Brian O'Connor (at Affy)

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Status reports
---------------

nh: apollo work, reading the registry, saving capabilities.
modifications to code that was based on prototype das adaptor.
Generally lots of under the hood work to bring it up to spec.

bo: diffed functionality between allen's biopackages.net server and
andrew's sample xml. Updated templates in allen's das server to match
andrew's sample xml.

ad: worked on validation server, all stuff is in cvs. the
http://cgi.openbio.org:8080 server is built off cvs, just check out
and rebuild.

gh: worked on bringing affy das2 server and client up to current spec,
based on whatever the rnc documents (schema docs) say for the xml.
no chance to read andrew's email on query syntax, will incorporate
that today.

sc: got latest version of gregg's das/2 server up at affy. serving
hg17, hg16, dm2. Updated code that the das1 server is using based on
latest genoviz jars. Getting some errors when loading data for new
affy arrays. Investigating.

aday: minor bug fixes for spec v200.
exporting assay data as different views. ucsc browser can viz
expression data out of das server in bed format. das viewer can view
as egr format. working on single chip at a time.

ls: here's a great use case for you: there's a cshl fellow creating
    dna spectrographs of oligo frequencies presented as audiographs.
    can really tell diffs from coding vs non-coding, CpG triplets,
    microsatellite harmonics. big matrices of floating point data
    tied to genome. consider this a challenge to das to serve this up.
    my postdoc sheldon mckay is serving this up: it gives you a
    heatmap back for a given genomic region. new glyph for
    spectrographic data
aday: the netCDF format is good for this, but clients out there don't
    visualize it.
gh: would like to support netCDF in igb. not sure if this is the
    default way to represent quantitative data for das.

[A] allen will send lincoln pointer to netCDF.

aday: netCDF is great for cross-lang, cross platform support.
gh: people are pushing wiggle format to ucsc, so we don't want to
    restrict to just netCDF.
aday: my refactor yesterday allows treatment of these as templates.
gh: how do you do this via a region query in das?
ls: feature query, tag says here comes binary data, each column
    corresponds to a base (or maybe a scaling factor to indicate # of
    bp per column). tag says here comes binary quantitative data,
    scale is 1:1.
gh: better way is to use alternative content format stuff (already in
    spec for types)
ls: if you do feat request and don't filter by type, you'll get a mix
    of binary and non binary.
aday: not in genome domain. genome/sequence, then fetch to assay
    service to get quant data. then do intersection to find overlap.
    performance goes out the window if you make the query too complex.
    fine to do just two fetches.
ls: how do you indicate the scale for numerical data?
aday: good question. units are not encoded now.
ls: spectrographic data: one value per window, where window is 100 bp
aday: so two diff units window size, amplitude value and frequency,
    and that's in four channels for the bases. we're representing as
    4 matrices.
aday: one matrix per channel. many formats don't support n-dimensional
    data. only 2d at most.
ls: in das1 did base64 encoded string in the notes. It worked.
gh: we can't require all clients to know how to interpret it. This is
    why we have the alt content functionality...

[A] das should support dense numeric data across regions, format
    specified by the existing alternative format mechanism

Topic: Spec Freeze
-------------------

ls: can we talk about freezing spec?
ad: what good will it do?
ls: allow us to code to a fixed spec. you freeze spec, people write
    code for a defined period of time, during that time we compare
    notes, then make changes, freeze, and repeat.
ad: concerned there hasn't been enough work since the changes in
    jan/feb.
ls: now that i'm 'on the other side of the fence' of spec writing, i'd
    like to see it not change, and have time to form an informed view
    of what its strengths and weaknesses are.
ad: haven't gotten feedback about my questions, until the codesprints.
    two months ago, only now being addressed.
ls: these issues don't become pressing until we start implementing.
    this is why we do code sprints.
ad: worry because there's been no extensive data modeling for
    features.
ls: can do a 1 month freeze
gh: comfortable with 1 month freeze of schemas as they are in the
    rnc's now. issues will come up.
ls: announce on biodas.org - march 18th das/2 is frozen for 1 month.
gh: we'll have to live with ambiguity in how server does certain
    things.
ls: hence the time limited 'trial' freeze.
ad: would have liked people to write code from last feb so I could get
    feedback.
ls: you very much improved the spec. grateful for what you've done. I
    wasn't getting feedback when I was writing either.
gh: validation website is great for implementers, rather than having
    to read a spec document everyday.
ad: schemas aren't going to change after today (pm). would like to
    clear some things up about filter language, today?
ls: most urgent freeze

[A] spec will freeze as of end of today (3/16/06, PST) for one month.

Topic: Feature filters
----------------------

ad: feature filters is most important, and how do we define global
    names? schema is a simple change - which is req'd and which is
    optional - but for impls it makes a big diff.
ls: global is req'd and local is optional.
ad: who comes up with global names?
ls: first person to do it has naming rights. people have been able to
    do it for the ensembl service.
ad: I need documented names
gh: it means you don't know whether two names are the same thing until
    this document comes out.
ls: filter language?
ad: gregg needs inside and contains. type and exact type: das type or
    ontology type?
ls: das type
gh: uri attribute of the type
ad: 'that type or its subtype' makes no sense for das types
ls: it's just an exact match. client can use ontology to get a series
    of types
ls: should be an exact match, does not traverse ontology. client
    should ask user: do you want all exons or a specific type of exon?
ls: client goes through ontology as necessary

[A] drop exacttype, type now has exacttype semantics

Topic: XID, feature ids
------------------------

ad: xid in features. no one has used it yet. gives a ref to some other
    db. all it is is a url/uri. feels like there should be more info
    (type?)
ad: primary name field for feature, feels like it should be name
ls: name is human readable. title would be ok
ad: but the feature filter called name searches name and id fields
ls: this is correct behavior, you can do a fetch on the url/uri. this
    is ok.
ad: the name feature searches title and alias.
gh: if feature id is resolvable and you resolve it, there's no
    guarantee it gives back a das2xml document.
    if the feature uri is resolvable, and you fetch it, you will get
    back a das2xml document right? can you put uri in the feature
    query?
aday: feels that having auto-generated names
ad: do all features have a human readable name?
gh/ls: optional
ad: why would you want to put a url in a name field?
gh: rdf
ad: should be a resolvable resource, das2xml for that feature.
ad: features with aliases, do aliases need type pk or accession?
    prosite has false match to ...
ls: this is a property or xid, not alias
ad: suggests that xid needs extra stuff to it.
gh: file with an optional type attribute on xid
ad: let's wait until someone has a need.

Topic: Feature filters (continued)
----------------------------------

gh: feature filters, inside, contains, identical. Which do we need,
    which can we drop?

[A] overlaps - keep (all agree)
    inside - gregg needs
    contains - dropping, maybe
    identical - dropping

ad: what about excludes - the complement of overlap?
gh: haven't had time to investigate whether I can use excludes rather
    than the inside + overlaps (contains?) combination I need now.
ls: use case: pointing to children and they haven't arrived yet.
gh: my client keeps stuff around, when you get parent/child, if you
    have parent + all children you can construct feature.
ls: the spec requires single parent, right?
gh: no, you can have multiple.
ls: gff3 spec also allows mult parents and children

[A] Lincoln will provide use cases/examples of these feature
    scenarios:
    - three or greater hierarchy features
    - multiple parents
    - alignments

Topic: Registry
----------------

ap: still here.
gh: looking at registry, having trouble retrieving in a normal
    browser. when looking at it in client, I only see biopackages
    server registered as a server. Lincoln said there was more?
ap: this is related to mime types, changed from text plain to
    x-das-sources
gh: I get an error: source file could not be read. lincoln said you
    added other test das2 servers to it.
ap: working on interface so users can upload servers. halfway through
    it now. upload a link to sources. will send email once it's there.

[A] Steve will add gregg's new affy das/2 server to registry when
    Andreas' web interface is ready

gh: same time tomorrow.