Notes from the weekly DAS/2 teleconference, 13 Nov 2006

$Id: das2-teleconf-2006-11-13.txt,v 1.1 2006/12/08 03:02:58 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  Sanger: Andreas Prlic
  UAB: Ann Loraine
  UCLA: Brian O'connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this repository are at
http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are not
always achievable, given the desire to get the notes out with a rapid
turnaround. So don't consider these notes as complete minutes from the
meeting, but rather abbreviated, summarized versions of what was discussed.
There may be errors of commission and omission. Participants are welcome to
post comments and/or corrections to these as they see fit.

Agenda:
-------
  Specification
    Status of schema (das2_schemas.rnc)
      Ratification of schema freeze
      Status of XML Schema translation (das2_schemas.xsd)
      Formalizing query syntax?
    Status of genome retrieval specification doc (das2_get.html)
      Review of remaining issues in genome retrieval spec.
        Coordinates URIs
        Segment reference URIs
        Ontology URIs
        Revising example queries / responses
      Timeline for DAS/2 genome retrieval spec freeze.
    Other docs?
  Implementation status
    Validator
    Genome retrieval servers
      NetAffx
        queries
        responses
      biopackages
        queries
        responses
      DAS/1 --> DAS/2 conversion server
      cgi.biodas.org test server
      Sanger registry
      others?
    Example queries
    Biopackages ontology server
    Genome retrieval clients
      IGB
        queries
        responses
      others?

Topic: Specification
-----------------------

gh: regarding the xid/link changes - no servers were using that,
    therefore not a major issue.
ee: can't use ucla server
gh: compliance issues with both servers.
sc: was working on re-organizing the html das2 document but got stuck in
    CVS commit hell (lots of commit activity going on...)
gh: we'll focus on that in a few mins.
gh: edits to das2 schema except andrew and I.
ls: looked over things. questions about html doc, but schema looks good.
ee: draft3 dir?
ad: files in draft3 dir are old, should be removed.
gh: those are the ones that got combined before creating the
    das2_schemas.rnc
ee: did good bit of work on style sheet. not ready to freeze.
gh: not concerned about freezing stylesheet
gh: any objections?
sc: can you reiterate the xid/link stuff?
ad: xid element had lots of "should haves". no feedback on this yet.
    referring to other database, 'false positive'. decided better to not
    have this, pulled it out. recommend html attribs for link element.
gh: human readable tag is important to igb.
ad: rnc has examples for reasons to use. features result 'link to rss
    feed' so you can get new results for that feature. gives freedom to
    add new kinds of links to it.
ad: allen here? no
bo: no feedback re: freezing the spec. final doc is das2_schemas.rnc?
ad: yes.

[A] clean out old, obsolete docs in that dir
[A] add a link near top of html doc to the schema doc.

gh: schemas document is now frozen! opinions on how long it should stay
    frozen?
ad: depends on feedback we get.
bo: don't change it at all.
ad: errata, 2.1, community
gh: can we agree that no changes to it unless discussed on the conf call.
all: yes.
gh: would like to discuss XML schema translation of the rnc and query
    syntax when Brian Gilman joins in.

Topic: Review status of genome retrieval spec (das2_get.html)
--------------------------------------------------------------

gh: looking at CVS commit log from prev week. most of this was to reflect
    changes in the rnc.
gh: 1.35 - done by capabilities now
    1.36 - remove reqt to return seq in fasta fmt. want to be able to
           specify a segments doc but not have to return the residues.
           also polished error responses: server decides when response
           is too large and sends error messages.
    1.38 - started putting in ontology URIs. we had discussions with
           Chris Mungall about how to refer to ontology entries via
           URIs. he said it would happen via NCBO but not until next
           year. Updated to refer to ontology server that allen and
           brian (UCLA) are working on.

[A] brian/allen (ucla) will work with ncbo on uri access to ontology terms
    when they're ready

gh: related issue: segment reference URIs. we still don't have ref uris
    for anything but worm and fly. lincoln created those at last code
    sprint.
ls: did human and mouse, too. on the wiki.
ad: global seq ids wiki doc:
    http://www.open-bio.org/wiki/DAS:GlobalSeqIDs
gh: I was looking at doc checked into CVS.

[A] will change examples in spec to start working with these

gh: how this relates to registry, uris maintained by andreas. no
    connection to andreas' registry.
ad: zilch
ap: this is concerning the uri for coordinates.
gh: this has to connect with a uri that gives these lists of sequences.
    right now, no way for someone to look at coord uri, or
    source/version/authority and see which of the items in this list of
    global seq ids to use. they can guess, but there's no formal way to
    do that now.
ap: ok
gh: have for each of these sets of segments a pointer to the coord uri at
    sanger.
ap: uri of coord should be resolvable to additional info like organism,
    version of assembly, etc.
ls: diff between uri and gsid
gh: gsid not an id for the whole assembly. wait... it is, but is diff
    from the ones andreas is using.
ls: so his registry needs to be updated to use all builds/releases listed
    on this page.
ap: ok. so names can be resolved?
ls: all are uri's, some can be resolved, but that's accidental.
ap: fine.
gh: coordinates element, like everything else, is allowed to have a
    doc_href, right? so you can have a pointer to a doc that does
    describe it.
ad: nope. uri, taxid, source, auth, version, created, test range
gh: some readable page describing coordinate system
ad: can either use an extension, or a link
ap: link is fine.
ad: segments are resolvable, but reference ones are not.
ap: makes sense for the reference coordinate uris to be resolvable, too.
gh: don't think they need to be resolvable. but it's nice to point to the
    website of the authority that is owner of that assembly. getting them
    to put up a resolvable uri is problematic.
ls: not nec a problem, but that it will never break is a problem. I could
    provide doc_href for each ncbi build, that should be fine. why must
    the uri resolve to anything? eg, documents to describe build
    statistics.
ad: people only need a unique string.
ls: how about a doc_href for each one, and put that in the coord system.
ad: in coordinate tag where you supply uri for assembly, there is no
    space for doc_href.
ls: withdraw
gh: registry and server must agree on the names used for coordinates.
    that's all I need. means I need to change my server, ucla, andreas
    must change registry.
ad: changes to that wiki page, adding new assemblies
ls: can be done. this gsid page was just a starter.
ap: could parse html page to get a list of uris.
bo: what needs to be added to biopackages?
gh: to have registry know that your ref seq is same as everyone else's,
    need uri for a given assembly.
bo: in v source or v document. there is a coord element that has uri
    pointing to the assembly uri.
gh: segments response, each seg type has a ref attribute to the
    appropriate uri using these gsids.
bo: this is already in there.
gh: you're good to go, but the affy server needs updating.

[A] gregg/steve update affy server to use the canonical list of global
    seq identifiers.

Topic: other changes
---------------------

gh: cigar strings, added ref to document to quote.
    need to put in examples of it (alignments). those are the major
    things that changed. todo: coord, segment, ref ontology uris.
    revising examples in the spec. architectural re-org of the doc.
ee: some places where the lang could be clarified. not changing meaning.
ad: ok.
sc: doc re-org stuff. described.

[A] steve will post message to list when re-org is done

gh: "more examples" needed sections. I'll focus on these. target freezing
    html doc by end of week.
al: updating website with all working servers?
gh: better to point to registry to say go there. people will then know to
    contact andreas to get their server there.
al: interested in plant das servers.
ap: I don't know about these. wrote an email to them to put their servers
    there.
al: does EBI have any? there was a das site associated with ensembl (for
    plants). the Iowa state das server needs fixing (xml is malformed).

[A] Ann will send info about Iowa state das server to Andreas

Topic: Implementation status
----------------------------

gh: validator has been helping, is it on latest rnc?
ad: not yet, but it should be easy, just a cvs update.

[A] andrew will update to latest spec

gh: impl das2 servers: need changes to affy server to bring it into more
    compliance. all responses pass validator now, but is breaking what it
    needs re: errors. will coordinate with steve when ready to deploy
    today or tomorrow.
sc: error codes.
gh: certain things it can't respond to, but if I throw the right error,
    it's considered valid.

ucla:
[A] run responses from biopackages server through andrew's validator when
    it's updated

gh: segment syntax is a full uri
bo: biopackages. full uri should be usable, for feat and type filter.
    when I updated server to fix v/source, I turned off caching. I can't
    clear out cache completely. there may be old response documents that
    need to be cleared out. Will leave it off for time being and figure
    out how to clear out the cache.
ad: I have updated schema on the validator.
bo: another issue when reading thru rnc/html under ver source.
gh: capability type element has to match. would cause igb to fail.
bo: will update this.
gh: getting servers to pass validator is more important than freezing
    html spec now. time pressure for Brian doing caBio development. has
    no servers to hit against, so they need to be back in action ASAP.
gh: andrew's das1/2 proxy?
ad: das1 to das2 proxy. does it on demand. not publicly accessible now.
    feature conversion was too slow. need to re-write to no longer use
    feature template.
gh: would want to consider putting it on a fast machine. would be a nice
    thing to have to support all old das1 servers.

[A] make das1-2 proxy public

ee: not many clients use das2 now, so load shouldn't be bad.

andreas, status of sanger registry?
ap: not much work for das2 since code sprint. now that spec is frozen,
    planning to use andrew's validator when rewriting server. also
    interested in the das1->2 proxy.
gh: could have registry make use of the proxy
gh: checked igb using sanger registry, was using recently. not sure if
    it's passing the validator.
gh: lincoln plan for serving hapmap data?
ls: import the essential part of data into a hapmap server that brian
    gilman is writing. then exported data will be brought into a caCore
    client for re-exportation into the caBig grid.
gh: spec freeze helps timeline?
ls: brian gilman says he will have das2 client out by later today. he
    should have joined this teleconf.
gh: I worked on xsd schema and talked to him via phone. it is now up to
    date with frozen rnc spec. can use it to generate java or other
    programmatic bindings.
gh: status of biopackages ontology server, but it is up and running. it
    serves uri's so it is sufficient for das/2 needs now.

genome retrieval clients:
[A] gregg needs to see why igb is having problems with biopackages
    server. am updating local server for local igb testing, will coord
    with steve to post on public server.
ee: don't break it, am doing a presentation to cytoscape folks
gh: will get a new server going and keep the old one going
gh: will coord with steve

Wrapup
-------

gh: lots of good progress this week. igb release planned for next mon,
    when ed will be back.

[A] meet next monday to freeze html doc.