Notes from DAS/2 code sprint #3, day one, 14 Aug 2006

$Id: das2-teleconf-2006-08-14.txt,v 1.3 2006/11/06 19:12:52 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  Panther Informatics: Brian Gilman
  UAB: Ann Loraine
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this repository
are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda:

  * Status reports, including what you want/need to focus on for this
    sprint, and progress from the last sprint.

Status Reports
---------------

gh: Have done writeback work. IGB can create a curation and post it to
the biopackages writeback server, and a DAS/2 client can see the
curations. No editing yet: the client can edit its own data models but
can't post those edits. Will work on ID mapping: the client can't yet
accept newly created IDs from the server and currently just holds onto
temporary IDs. IGB has had one or more releases since the last sprint.
Priority: mainly client writeback.

ls: Continuing to work on the Perl client interface to DAS/2; not
functional at present. Need to back out changes made since the last
sprint. DAS/2 tracks in GBrowse. About 10 hrs needed.

sc: Have been working on keeping the data on the Affymetrix public DAS
servers up to date, dealing with memory issues caused by the increasing
amount of array data to support. Gregg has a new, efficient format for
modeling exon array features with lower memory requirements. Will work
on getting the DAS server to use it.
The long-term plan is to remove our DAS/1 server and just have DAS/2,
which is easier to use and maintain, though the complete transition
will take time. Have continued working to automate the pipeline for
updating the Affy DAS servers. Have a new page that lists the data
available on the servers; it is currently created manually, but the
plan is to automate it.

ad: Doing web development in Python; taught a course on that. Plan: get
the Python server up to experiment with writeback, and update the spec
as per a couple of months ago.

gh: Andrew will make the spec a top priority; the grant is funding that.

bg: Tasked to take DAS/2 data and produce a set of objects to use
within the caCORE system at NCI. Have objects for DAS/2 data and
service. Can retrieve DAS/2 data from the Affy server and present it in
a simple web page. Using Java and Ruby.

gh: Good week to ask questions as you flesh out the implementation.

ee: Gregg and I will put out a new IGB release this week. Can work on
style sheets (left over from last time), or can build a GFF3 parser
into IGB (lots of excitement!).

al: Two things, both demo applications for myself, collaborators, and
DAS newbies. First: retrieve the genomic locations of the targets of
Affy probe sets, then retrieve the promoter regions upstream.

gh: Promoter data in a DAS/2 server?

al: Can just say 500 bp upstream of the gene; not identifying control.
Just retrieve the sequence to pipe into a control analysis. Second:
meta-analysis of results from different groups for associated
phenotypes. Input: a list of markers; output: the annotations
associated with them. Statistical analysis. Ultimately, obtain
candidate genes associated with the markers. Some preliminary work on
obesity looks promising.

[A] Steve will help Ann convert fly probe set IDs into genome locations.

The goal is to write something that can do random sampling of gene
annotations. Ideal world: the DAS server gets a region and returns gene
IDs and GO IDs. Less ideal: just get the genes within the peaks (from
association studies).

bo: Doing RPM packaging for the Mac (TGen), so people can set up a
DAS/2 server on a Mac.
Will update the RPM packages with the results of this week's work, and
clean up the bug queue on the biopackages server implementation,
bringing it up to spec. Can talk about the analysis part of the server.
Internal hirax client for retrieval of assay data; its communication
with the server is out of sync.

Spec issues:
------------

gh: Want to focus on writeback. Wants full XML features rather than a
mapping document.

aday: Will work on writes as well as deletes. Implementing 413 Request
Entity Too Large for requests that exceed some size threshold (10kb or
100kb); at or below the threshold is OK.

gh: Need to coordinate with me on writeback: I focus on client
writeback, you on the server. Editing is OK. Deletes are harder.

Other Issues:
-------------

gh: Contact Peter Good about funding. Extending from 2 yr to 3 yr.
Talk with Lincoln and Suzi about plans for the next grant.

sc: Status of the open Bugzilla bugs on the spec?

[A] Someone should go through and update the Bugzilla list for the spec.

bg: Version field.

gh: Not too understandable. At the last sprint there were two freezes;
the version tells which spec freeze the server is using. The
assumption is that the servers are now using the most recent spec. If
they're not compliant, please let us know. Affy server: won't give back
a list of all features; it requires an overlaps and a types restrictor.
Biopackages: should be good with the latest spec.

bg: In the sources document, the source tag has a version. If you do a
query like types, does that also have a version? No.

ad: Sources document: worm 161 (data source). Capabilities describe
things like writeback support for v161, but not v160.

bg: That version seems to have different semantics depending on the
query. The biggest issue was parsing and populating my object model.

gh: There is a coordinates subelement in the version element, which has
a version attribute. My client does not deal with the coordinates
stuff. It is meant to make sure that annotations from two servers refer
to the same coordinates, so you can overlay annotations from different
servers. My client uses version URIs for that instead.
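aday's plan under Spec issues is to return 413 Request Entity Too Large for writeback requests above a size threshold and accept anything at or below it. That plan could look roughly like the minimal Python WSGI sketch below. The 100 KB limit is an assumption, since the notes only float 10kb or 100kb. The names `writeback_app` and `simulate_post` and the response bodies are hypothetical. This is not the biopackages server's code.

```python
import io

# Hypothetical threshold: the notes float 10kb or 100kb; 100 KB is assumed here.
MAX_WRITEBACK_BYTES = 100 * 1024


def writeback_app(environ, start_response):
    """Minimal WSGI sketch of a writeback endpoint with a size limit.

    Bodies at or below MAX_WRITEBACK_BYTES are accepted; larger ones are
    rejected with 413 before the body is read.
    """
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = -1
    if length < 0:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"invalid Content-Length\n"]

    if length > MAX_WRITEBACK_BYTES:
        start_response("413 Request Entity Too Large",
                       [("Content-Type", "text/plain")])
        return [b"writeback document exceeds size limit\n"]

    body = environ["wsgi.input"].read(length)
    # A real server would parse and store the submitted features here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"accepted %d bytes\n" % len(body)]


def simulate_post(app, body):
    """Call a WSGI app in-process with a POST body; return (status, output)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    output = b"".join(app(environ, start_response))
    return captured["status"], output


if __name__ == "__main__":
    print(simulate_post(writeback_app, b"<FEATURES/>"))
    print(simulate_post(writeback_app, b"x" * (MAX_WRITEBACK_BYTES + 1)))
```

Checking Content-Length before reading lets the server refuse an oversized document without buffering it. A production server would also need to cap the bytes it actually reads, because a client can omit or misstate that header.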
bg: Other issue: in order to know what server you're hitting, you have
to know the namespace of the document, which has a base URI. xml:base
in the segments query; xmlns biodas.org/das2. To have traceability in
the documents you receive, you as implementer must track URLs,
converting relative to absolute. That can be a problem when hitting
5 different servers.

gh: My (client) object model has a model of the server with the root
URL of the DAS server, and sources objects that hold the xml:base of
each source.

bg: You could get back a 404 from xml:base. Perfectly appropriate: the
server could put whatever it wants in xml:base. Currently it's the
document.

ad: We're using the xml:base spec, so you can put xml:base on any node
you want to. Construct full URL by.

gh: In our schema, is it clear which attributes are resolved by
xml:base?

ad: No.

bg: Would like to see one big document with every element, not several
different files. RELAX NG isn't the best format; would like a W3C XSD
that defines the elements. From a coder's standpoint, a single document
means you don't have to go and look at 5 different docs, with multiple
windows up, figuring out how they are connected to each other, the
semantics within each query, and who is calling what.

ad: I gave Brian one, using Trang to spit it out.

bg: Trang is not the best XML Schema writer. I could work on this. Why
do you use RELAX NG?

ad: I can read it and understand it, and there were good examples.

bg: I can autogenerate code from XSD; SOAP and other web services
tooling does that for you. Point it at a URI, get the doc, and it
generates a parser and an object model.

ad: The parser would break if the server returns extra attributes. The
spec has some extension points: you can put in any element that is in a
separate namespace. I know how to do that in RELAX NG, but not in XSD.

bg: You just have to add another xmlns and define an extension point
with that namespace.

ad: Should be able to resolve it into one.

bg: Three items:
  1. Will ask W3C people about XSD to RELAX NG.
  2. Semantics confusion.
  3.
     Is it appropriate for xml:base to supply a 404 if the client was
     dependent on that attribute?

ad: The version tag is a problem if there are duplicates; it should be
changed so there are no duplicates. Can build a parser on the RNG.

bg: It's experimental, alpha software; don't want to use it for
production.

bg: When you put a relative URL inside an xml:base.

ad: Resolvable via HTTP, or in an absolute URL.

gh: If you resolve it up to the top-level doc, then use the URL of the
document itself. Whether clients actually do this depends on the
implementation. We could state to implementers that the top-level
document should resolve to an absolute URL. We wanted to say, "DAS/2
uses the xml:base spec. Period."

bg: Put this in the spec: how you want it to be used.

ad: Don't like saying, "We use xml:base with these additional things."

bg: Can put this off for now.

ls: In my library, when I see a URL and can't resolve it, I fall back to
a hard-coded URL.
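The Python sketch below combines the xml:base resolution gh describes with ls's fallback to a hard-coded URL, using only the standard library. Under gh's rule, each relative xml:base is resolved against its parent's base, and the URL the document was fetched from serves as the outermost base. Several details are assumptions. The attribute being resolved is taken to be `uri`. The sample document, its namespace, and the example.org URLs are made up. Treating the fallback as a base URL, rather than as a wholesale replacement, is one reading of ls's remark. `resolve_uris` is a hypothetical helper, not part of any DAS/2 library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

# ElementTree expands the reserved xml: prefix to this namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_uris(root, doc_url=None, attr="uri", fallback_base=None):
    """Return (tag, absolute_url) for every element that carries `attr`.

    An element's base is its own xml:base (if any) resolved against its
    parent's base; the outermost base is doc_url, the URL the document
    was fetched from. If the result is still relative (e.g. doc_url is
    unknown), it is resolved against fallback_base as a last resort.
    """
    results = []

    def walk(elem, base):
        local = elem.get(XML_BASE)
        if local is not None:
            base = urljoin(base, local) if base else local
        value = elem.get(attr)
        if value is not None:
            url = urljoin(base, value) if base else value
            if not urlparse(url).scheme and fallback_base:
                url = urljoin(fallback_base, url)
            results.append((elem.tag, url))
        for child in elem:
            walk(child, base)

    walk(root, doc_url)
    return results


# Hypothetical sources-style document; namespace and URLs are illustrative only.
SAMPLE = """<SOURCES xmlns="http://example.org/das2-ns"
         xml:base="http://example.org/das2/">
  <SOURCE uri="worm/">
    <VERSION xml:base="worm/" uri="161">
      <CAPABILITY uri="161/features"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for tag, url in resolve_uris(root, "http://example.org/das2/sources.xml"):
        print(tag.split("}")[-1], url)
```

Recursing top-down carries each element's base along with it, which sidesteps ElementTree's lack of parent pointers. This is one way to do the relative-to-absolute tracking bg mentions, not necessarily how any existing DAS/2 client does it.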