Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006

$Id: das2-teleconf-2006-02-09.txt,v 1.1 2006/02/09 19:13:39 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down, Roy
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org

DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit.

[note taker missed the first 5-10 minutes]

Topic: encoded URLs
-------------------

ls: apache bug: an unescaped // must be percent-encoded or apache can run into problems.
gh: most people don't bother escaping, so we should make this clear in the spec. every major library has ways of doing this automatically.

[A] update spec to state: urls contained w/in das query urls should be percent-encoded.

Topic: Style sheets
-------------------

ad: see Jan 26/27 email, "style sheet question". what i described is not the same as what das/1 style sheets supply. we already have a mechanism.
gh: embed ss in types element?
ad: or, a new capability or link server for a given source.
gh: prefer this.
td: easy to have a single style element.
gh: would a types elem have a ptr to the ss, or do you query for the capability?
ad: if no one's interested we don't have to answer the question. sounds like no one's interested in style sheets.
gh: we'll keep what you have in the spec for style sheets and move on.
ls: what is it?
ad: yes. style is embedded in type record. it's now on a per-element basis.
ls: ok with this. attributes of types. is there a need for a separate ss? true, it mixes presentation with data model, but people will look for the info they need and can ignore the rest.
ls: transition to separate sheets: visual style id pointing to ss url, same as with html, which moved from the 'i' tag to font style info.

Topic: Writeback
----------------

gh: discussion in progress in uk. how big a change from current writeback spec?
ad: spec: server does modification to data. this proposal: client can now do more stuff with the data.
gh: writeback for client is considerably harder, rarer to impl.
ad: issues: can you still do searches for modified data on server?
ls: building objs from bottom up (children to parent) so everything has a url.
ad: each feat has parent and a part.
ls: true. temporary id mechanism: response indicates what the mapping to local id is. what happens is: client locks, uploads parents and children with temp ids, referential integrity checking is done, then the mapping from temp to local id is reported.
gh: doing http DELETE imposes a constraint.
ls: how are you handling the id issue?
gh: you need something to create a new, real id.
ad: b/c they're in one transaction, server can
ls: delete is a problem because http delete only permits one at a time. updates are a problem too. a post that creates new objs allows you to create multiple new objs at the same time, but put and delete only operate on one at a time.
ad: at this point don't want to change data model.
ls: so everything will be a post then, under your proposal, to the writeback url.
ad: a single post.
gh: moving from http delete to a... trying to understand how this is a delta model.
ad: only updates things that changed, and listed deletions.
ls: fine. writeback, with create, update and delete sections.
td: granularity: not single characters. one feature.
ls: the three transactions we previously had, put, post, and delete, roll up into a single transaction.
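The single-post writeback described above (create, update and delete sections in one document, temporary ids for new features, referential integrity checking, and a reported mapping from temporary to real ids) can be sketched as below. This is a minimal Python sketch under stated assumptions, not the DAS/2 writeback format: the dict layout of the delta document, the FeatureStore class, the "feat<N>" id scheme and WritebackError are all hypothetical, and a real server would rely on its database's own transactions rather than copying an in-memory dict. It also assumes the whole document is all-or-nothing, so one failed check discards every change.

```python
import copy


class WritebackError(Exception):
    """Raised when any part of a delta document cannot be applied."""


class FeatureStore:
    """Toy in-memory feature store (hypothetical; not part of any DAS/2 server)."""

    def __init__(self):
        self.features = {}  # real id -> {"name": str, "parent": real id or None}
        self._next_id = 1   # counter for hypothetical "feat<N>" ids

    def apply_delta(self, delta):
        """Apply a delta document as one all-or-nothing transaction.

        `delta` is a hypothetical dict with optional "create", "update" and
        "delete" lists. Returns {temporary id: real id} for created features.
        """
        working = copy.deepcopy(self.features)  # scratch copy; discarded on error
        next_id = self._next_id
        temp_to_real = {}

        # Pass 1: assign a real id to every new feature up front, so a child can
        # name a parent that is created later in the same document.
        for feat in delta.get("create", []):
            temp_to_real[feat["id"]] = f"feat{next_id}"
            next_id += 1

        def resolve(ref):
            # A parent reference is a temporary id from this document,
            # the real id of a feature already on the server, or None.
            return temp_to_real.get(ref, ref)

        # Pass 2: creates, updates (each restates the whole feature), deletes.
        for feat in delta.get("create", []):
            working[temp_to_real[feat["id"]]] = {
                "name": feat["name"], "parent": resolve(feat.get("parent"))}
        for feat in delta.get("update", []):
            if feat["id"] not in working:
                raise WritebackError(f"cannot update unknown feature {feat['id']}")
            working[feat["id"]] = {
                "name": feat["name"], "parent": resolve(feat.get("parent"))}
        for fid in delta.get("delete", []):
            if fid not in working:
                raise WritebackError(f"cannot delete unknown feature {fid}")
            del working[fid]

        # Pass 3: referential integrity over the final state.
        for fid, feat in working.items():
            parent = feat["parent"]
            if parent is not None and parent not in working:
                raise WritebackError(f"{fid} refers to missing parent {parent}")

        # Commit only after every check has passed.
        self.features = working
        self._next_id = next_id
        return temp_to_real


store = FeatureStore()
# The exon is listed before its gene; both use temporary ids.
mapping = store.apply_delta({"create": [
    {"id": "tmp-exon", "name": "exon1", "parent": "tmp-gene"},
    {"id": "tmp-gene", "name": "geneA", "parent": None},
]})
print(mapping)  # {'tmp-exon': 'feat1', 'tmp-gene': 'feat2'}

# Deleting the gene but not its exon fails the integrity check, so nothing changes.
try:
    store.apply_delta({"delete": ["feat2"]})
except WritebackError as err:
    print("rolled back:", err)
print(sorted(store.features))  # ['feat1', 'feat2']
```

Assigning real ids before resolving any parent reference is what lets a child appear before its parent in the same document, and checking integrity only against the final state means a gene and its exons can be deleted together in one post.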
gh: when you send back a feat you've already seen, do you restate all the xml for that feature, since otherwise it is deleted?
ad: yes.
gh: would like the unit of ro
ls: this achieves per-transaction integrity, since you don't have to do multiple deletes. the lock idea had to persist over multiple transactions to allow for that atomicity.
gh: we need to keep the lock so curators can guarantee that nothing changes underneath them.
td: lock corresponds to a db transaction as well.
ls: no one's impl'd this writeback, so there's no friction against changing it. i'm fine with it, as long as people don't mind we're losing a cute feature described in a grant.
gh: what does roy or ed g. think?
roy: have been involved in this. this mirrors some features that otter does. a good idea. deletes and puts aren't big winners if you're updating multiple feats that refer to each other.
roy: whole xml doc is the transaction.
ls: if anything doesn't make sense, all requests in the writeback doc are rolled back.
roy: yes. some error messages to understand what might be going wrong.
gh: splits and merges work too? merging one feature from two, or splitting one transcript into two.
roy: fits in well. get back two ids of new features. otter gives a lot back in the xml after posting the data.
gh: treats the id in the feat as a placeholder and sends a real id back to you.
ls: you're given a temporary placeholder, then it gives you a real id. might want to put in formal merge and split commands, because in the proposed new system (and the old) to split one exon into two, you have to either delete the original one, or update it to change one boundary and create a new one. you've lost the ability to keep track of the original and the two new ones.
ad: feats have a place for arbitrary annotations. a curational history log could be maintained.
ls: how do you upload this to a server? splitting an exon into two daughters is different from deleting and creating two new ones.
ad: no one needs this; it's for the future.
gh: it's needed now.
ls: splitting genes into two pieces is important. people want to keep track of this. formal merges and splits permit this tracking.
gh: my take: prefer as few verbs as possible. if we can formally define splits and merges as combos of deletes and creates, prefer this.
ls: semantically difficult for the server to know that a delete followed by two creates is different from a split.
td: an ancestor id on the features can solve this.
ad: haven't heard about this use case. features have a place where you can stick in new data. the database can read it to understand history.
gh: like the idea of a curational track of ancestors. before, people said we can't require dbs to do this.
td: optional property.
ls: could thread it through feature properties.
ad: this version, or for 2.1?
gh: initial writeback must support splits and merges.
[broad agreement]
ls: make sure it will work. what happens when you're tracking ancestors and the ancestor object disappears?
gh: can't assume a db has an identifier for every curation in its past state.
roy: a weakness of the current otter schema; james is working on a fix. tag a release and go back to genes as of that release.
ls: acedb had this feature to roll back to older versions of a gene model.
aday: the schema we're using has support for previous versions.
roy: tedious. big script, but a good thing to have.
ls: a few more hours of discussion to see what's involved in supporting tracking of curational merges, splits, renames, etc., to make sure it's the right decision to put it into a curational property of the feature rather than having formal database merge and split operations. i'm ok doing it this way if it seems ok.
gh, aday: me too.

Topic: NIH grant proposal
-------------------------

gh: i'm the bottleneck.

Status reports:
---------------

gh: igb das client still. checked in code. you can get the das2 client in igb pointing to the codesprint das2 server. sources, segments, types. no features yet. working on this today. should go faster today.
ad: sent email to allen about some things in the server that don't agree with the spec: properties.
aday: features have no properties associated with them. do we need valtype or href?
nh: a key with no value doesn't make sense. using 'true' if no value.
aday: ok. but need an agreement on what to do for properties with no associated value or type.
ad: can make it so.
aday: now put in empty string.
ad: use for both value and href.
aday: can't have both.
ad: what's the interpretation if you have both? can take out the href part and have value = empty string.
nh: client deals with empty value.
ad: leave it as a string.
suzi: uneasy about this.
td: it does have a value: the empty string.
suzi: some places where an empty string doesn't make sense. data gets dirty. if you're gonna have a tag-value structure where there may or may not be a value, it's bad. some things are tag-value, some things just have a value. it seems ambiguous, no guaranteed behavior.
ad: the guarantee is for all keys to have a value. it can be the empty string.
gh: string or empty string is ok.
ad: only used for clients who know what it means. may have to update apollo.
gh: if we allow arbitrary xml in features, the client will have to remember this xml or it will disappear.
ls: a huge issue w/ apollo in the past, when communicating w/ dbs that have extra stuff in the xml that isn't in the client-side data model.
suzi: my take: the client should not have to pass it all through.
nh: it forces the client to be a complete database.
gh: then the delta writeback
ls: works ok for deletes; updates become an issue.
ad: you have to deal with text you don't understand.
ls: you have to keep track of tags you don't understand, otherwise they are deleted.
gh: trade-off between simplicity of writeback and what the client has to remember.
ls: client says: i don't understand it, but i can't delete it.
gh: how hard is it for the client to hold an arbitrary xml chunk?
ls: give it an empty tag to say you want it to go away.
nh: how do you delete things that came in empty and you want to delete them?
ls: can have attribute="delete me". this creates a burden on the server side.
[client folks like this]
decided to keep everything you know and send it back: round trip it.
ad: client can throw away what it wants. can go back to server.
ls: boomerang.
gh: a variety of ways to make sure the data gets stored.
roy: will be in feature. just hold a pointer to it.
suzi: hard for apollo. passive round tripping is fine. the difficulty is with deletes: ignoring stuff you don't know what it is. delete a transcript or a whole gene: you don't know what some of that stuff is; it describes a mutant phenotype. you deleted it from the genomic record, but there's other data that shouldn't be deleted. the client would have to be fully cognizant of it, beyond genome sequence features. the client now needs to model all the other data too.
ls: difficult to understand how a client could deal with it.
ad: just xml as an opaque chunk. why can't the client send back the full record?
suzi: won't solve the full problem. if annotator said delete it
gh: client says delete that feature. it won't pass back any stuff underneath the feature, and some stuff underneath it shouldn't be deleted.
ad: that's what you have backups for.
suzi: beyond this. to deal with this, we made deletes more atomic. had to be handled on the server side, otherwise we have to put all that knowledge into the client, and it gets tied to a particular group.
ad: knowledge of what?
suzi: additional information. if you delete the whole thing at the top, any pass-through data is also gone.
gh: not hard on the client; it's just what does the server do with that?
suzi: this is why it belongs on the server side. it knows what matters and what doesn't matter. if you don't want clients tied to a particular db, that solution will be inadequate. we had to put the info on the client and make the operations as fine-grained as we could.
ap: writeback issues have been discussed. suggest we take this up tomorrow.
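The round-trip decision above, together with the rule that every property key carries a value (possibly the empty string), can be sketched as follows. This is a minimal Python sketch using the standard library's xml.etree.ElementTree; the FEATURE/PROP/PHENOTYPE element names, the load/save helpers and the sample record are hypothetical and are not taken from the DAS/2 spec. It covers only passive round tripping: as suzi points out, it does nothing to protect pass-through data when the client deletes the enclosing feature.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature record. The element and attribute names are illustrative,
# not the DAS/2 schema; PHENOTYPE stands for data this client does not model.
SERVER_XML = """<FEATURE uri="feat1" title="geneA">
  <PROP key="curated" />
  <PROP key="note" value="check start codon" />
  <PHENOTYPE>wing defect</PHENOTYPE>
</FEATURE>"""

KNOWN_TAGS = {"PROP"}  # the only child elements this toy client understands


def load(xml_text):
    """Parse a feature; return the element and the client's view of its properties."""
    feature = ET.fromstring(xml_text)
    props = {}
    for prop in feature.findall("PROP"):
        # Every key has a value; a missing value is read as the empty string.
        props[prop.get("key")] = prop.get("value", "")
    return feature, props


def save(feature, props):
    """Serialize edited properties, passing unknown children through untouched.

    Mutates `feature` in place: known children are rebuilt from `props`, and
    every child whose tag is not in KNOWN_TAGS stays exactly as received.
    """
    for child in list(feature):
        if child.tag in KNOWN_TAGS:
            feature.remove(child)
    for index, (key, value) in enumerate(props.items()):
        feature.insert(index, ET.Element("PROP", key=key, value=value))
    return ET.tostring(feature, encoding="unicode")


feature, props = load(SERVER_XML)
print(props)  # {'curated': '', 'note': 'check start codon'}
props["note"] = "start codon confirmed"
sent_back = ET.fromstring(save(feature, props))
print(sent_back.find("PHENOTYPE").text)  # wing defect
```

Rebuilding only the elements the client models, instead of re-serializing a client-side object, is what keeps the unknown PHENOTYPE element intact; the cost gh mentions is that the client has to hold on to the original element for as long as it holds the feature.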
ad: could someone write up why a client couldn't just track the things that it wanted? then we can consider.

Status reports, cont'd
----------------------

roy: zmap client. can get sources and types from server. parsing them, creating internal objects. can't draw features yet. long discussion about writeback today.
ad: validator stuff.
td: talking about writeback.
ap: working on registry. first das/2 server. distinguish between das/1 and das/2 via accession points.
brian: rpm build for allen's server. will post today at biopackages.net.
suzi: spoke to chris about web services for ontology. he will talk with allen. there's a thing about ids to deal with. also, if we do a web service that isn't das-like, it should be doable. should be able to get the terms. also, if we want to have stop codon replacement, you also have to say what position and what it's replaced with (uridine). how is this done in the das spec?
gh: can you post to the list?
suzi: yes.
aday: will raise writeback issues as well.
suzi: small point mutations, indel, substitution (base and position).
aday: nearly got apache config file done, impl new std error documents, 300, with error document.
nh: more apollo client progress. haven't dealt with types yet.
ee: igb improvements.
sc: pipeline for populating affy das server with array data. completed pipeline for exon array design data.