Notes from the weekly DAS/2 teleconference, 5 Jun 2006

$Id: das2-teleconf-2006-06-05.txt,v 1.2 2006/06/06 00:52:14 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org

DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit.

Topic: status reports
---------------------
gh: waiting to hear back from peter good re: grant. He thinks we have a decent chance of additional funding (bridge funding), which would fund us until the new grant kicks in in June 2007 with suzi as a PI (revised grant). Total funding would still be less than the amount originally requested for this grant. Definitely will have funding through september this year. our grant folks beefed up the dalke consulting and cshl accounts. Will let people know re: funding past september when I find out. Impl wise, not much done in last 2 weeks. about to start testing writeback from the client side: writing new features back to the das/2 server (the easiest thing to test). New release of IGB is out now with a testing curation feature. Go into preferences to turn it on. (Ed worked on this)
ls: sent in an example of a das2 features request that returns alignments. discovered that i needed to add a new attribute to the LOC tag. have to indicate that alignments use the cigar gap string.
whether you gap the ref or target sequence and indicate which one's which. there's a target attr in LOC that indicates which one is the target (a little asymmetrical).
gh: you can get both target and query?
ls: yes. with the cigar string using d and i, you have to indicate which one is which. another thing: the das/2 project for caBIG is pulling das2 into the core, and has a kickoff meeting this wednesday. I will be on that meeting. we'll reiterate goals, timeline with adopters (Wistar institute)
gh: it's been a while since we talked about that. is the intent to have das2 servers that can sit on top of caBIG?
ls: no, das2 clients via caBIG. we won't need it for a couple of months, hoping we'll be able to use the biopackages das2 server to serve out the data. Is this reasonable?
aday: yes.
ad: nothing new to report. settling in Sweden. plan to incorporate Lincoln's things into the spec. server writeback work.
bo: working on hyrax client that retrieves microarray data from a das server. functional now and is now in sourceforge. http://sourceforge.net/projects/nelsonlab. uses allen's formatted output rather than netCDF. can browse ontology annotation examples. can download. focuses on individual researcher needs in Nelson lab. plan to do it as a generic plugin, data import tool.
gh: for ontology stuff, any progress with suzi and chris re: how das ontology stuff will work with center for biomedical ontologies?
aday: no. will touch base with her. we're continuing to operate as previously. basically just a formatting issue.
[A] allen will contact suzi re: hooking up das ontology work with NCBO
bo: the document format (XML) right?
gh: i think yes. to me the goal is to have NCBO adopt it
aday: even if they don't we can still link to them
gh: it will take encouragement from you setting that up.
aday: you can load the data brian's talking about, egr format. doesn't have location
gh: igb should figure it out
aday: 25,000 microarrays are available at egr.
ids of probe sets are prefixed with the platform. we have a bed formatter, so you can request in bed too.
bo: need to add a pulldown for bed. netCDF is broken now, will fix it. egr is working
aday: genotyping array support in igb?
gh: chromosome copy number output in igb now. gtype outputs into cnat, which outputs a graph in sgr format, read by igb. also have files with locations of snps. should be on quickload servers. near bottom, entries for 10, 100, 500k arrays. nice way to visualize when zoomed way out.
aday: what if you load a bed file with ids, then an egr without locations? i.e., can bed files be used as identifiers for egr files?
ed: yes
gh: takes up more memory, but is useful.
aday: working with genotyping arrays lately. will produce more files for it in the next few weeks. basically doing lots of microarray data processing now.
gh: das2 writeback server?
aday: xml processing code is there, not rigged up to a webserver yet. can partially translate into insert statements.
gh: can it send back mapping of temp ids to final?
aday: in progress
gh: i can start testing creation of features now.
aday: can put it as a standalone cgi script, can point it to any url.
gh: the beauty of rest.
[A] allen will put writeback server on public url
ed: new version of igb last week (4.38). automatic reloading via jws not working for some clients.
bo: can delete your cache from jws console.
ed: shortcut from desktop sometimes causes problems with updates. starting to look at better loading of info about colors from different types of data files. segues into stylesheets from das. and other igb-related things.
sc: installed new version of affy das2 server on the dmz. Has gregg's temporary fix for xml:base, but currently doesn't rely on it since there's no url rewriting happening. need to test it out and do same thing on production server. Also wrote script to make deploying servers easier (e.g., posting new jars, re-starting server via single make command).
[A] steve will test gregg's xml:base fix on dev server

Topic: BOSC submission for a talk
---------------------------------
ad: planning to go, waiting to determine expenses
aday: will go if main conf talk is accepted. otherwise not.
gh: sounds like it's up to you (dalke)
ad: this is what biodas is, tools, how things fit together, how rest is cool. few submissions now (ISMB and BOSC). only 4 now. usually 12 by now.
ad: bod for bosc is discussing what to do
gh: do you need help from any of us for bosc submission?
ad: no. will send you copies to review it.
gh: I gave a talk last year on das. will send it to you as a reference.
sc: part of the talk can be progress since then. what's the cause of the low turnout?
ad: people waiting to see if they are accepted before registering.
ls: for me it's a cost issue. 90% of people who practice bioinfo are in northern hemisphere. was low in brisbane, will be low in china (rumors of 2008 ismb in china, can't confirm).

Topic: Code sprint #3
---------------------
gh: how do people feel about having another code sprint? possibly before or after CSB in august at Stanford. the last two sprints were very good.
ls: I'm at csb in aug, but right after i'll be on a retreat to work on a sequencing grant. right before will be on honeymoon.
gh: maybe we need to push it farther out.
ad: will be in europe until 15 july. not in us until february.
bo: definitely at stanford?
gh: no. august seemed like a good time/location. might make more sense to have a euro-led one.
sc: august is a big vacation time for europeans
ad: july is for swedes.
ad: there's a late breaking poster session for ismb
gh: das poster?
ad: need to decide on cost today if I'm going.

Topic: writeback
----------------
gh: how far behind is website vs our current thinking? that's what I'm using for my impl.
ad: doesn't have idea of microdeltas. other stuff is the same.
ls: does it still have the mapping idea which I thought went away (local to global)? during last codesprint.
gh: it did?
ad: returns back the complete feature with additional attribute. so instead of a mapping, server returns back all features which changed, along with attribute: old id ---> new id
gh: whether you delete things that aren't posted in feature when you submit a new post.
ad: what you post is a complete replacement of what was there.
gh: that verbiage needs to be added. doesn't say anything about it.
[A] andrew will add text to writeback spec re: new feat being a complete replacement
ad: other change: complex features all need a link back to the root feature. when parsing you can build the parent-part relationship. otherwise, you do a lot more work to figure out who's in the same group.
gh: seems like a hack.
ls: this is not in the current writeback doc?
ad: correct. additional attribute for complex features. affects reads too (not just writeback)
ls: bidirectional pointers are still there, correct? parent -> child, child -> feature.
ad: that's still there (unlike gff: unidirectional). if you know the root, it saves you from having to traverse links.
gh: doesn't add that much. may create disagreement, errors between the parent-child hierarchy. I don't think the root thing is necessary.
ls: pointer to parent and the root: like a closure across it. don't see a compelling need, makes it harder to impl.
gh: if it's optional, will create other difficulties.
ad: makes it easy to find out where the root is.
ls: just go up until you find no parent. cycles would be a bug. the issue would be if during reading from remote server, it gives you children first, middle layer, then root layer, will require some merging of features. depends on data structures. in perl with gbrowse, every feat or part of feat is held as a node in a graph. it never merges, just updates pointers. after parse finishes, finds everything without parent and recursively traverses them.
gh: if you want to attach annotations as features while parsing rather than waiting till parse is done. reference counting.
don't think root thing would help then. still need to figure out do I have all children.
ad: when you get a failure you can throw away just the failures rather than everything. can count parents and parts as they're coming in.
gh: every feature with no parent is a root.
ad: yes. assuming it comes early.
ls: in general case, you cannot go on and process a feature until you reached the end of the parse. because you could have multiple layers. you can say you have found any pair of layers, not everything in between. the root ptr doesn't help either. could still be in a situation where you think you processed everything that belongs to a...
ad: something comes along later "i'm still a part of that group"
gh: every time you get a feature, can add it to the feature tree, can tell when you're done with group by checking pointers.
ad: ok. not as useful as I thought.
[A] andrew won't add root feat attribute to complex features [so the latter is actually an 'inaction' item ;-]
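
Editor's sketch (not from the call): a rough Python illustration of the approach ls describes above, where every feature is a node, only pointers are updated during the parse, and groups are resolved after the parse by finding features with no parent and traversing down. The record shape (feature id plus a list of parent ids) and the function name build_feature_groups are assumptions made for illustration; they are not the DAS/2 feature format or any existing API.

```python
def build_feature_groups(records):
    """Group features into trees once the whole parse has finished.

    `records` is a hypothetical stand-in for parsed features: an iterable
    of (feature_id, parent_ids) pairs in any order, so children may
    arrive before their parents.
    """
    parents = {}    # feature id -> list of parent ids
    children = {}   # feature id -> list of child ids, in arrival order

    # During the parse: never merge, just record pointers.
    for feature_id, parent_ids in records:
        parents[feature_id] = list(parent_ids)
        children.setdefault(feature_id, [])
        for parent_id in parent_ids:
            children.setdefault(parent_id, []).append(feature_id)

    # A parent that was referenced but never received means an incomplete group.
    missing = {p for ps in parents.values() for p in ps if p not in parents}
    if missing:
        raise ValueError(f"parents never received: {sorted(missing)}")

    def walk(feature_id, on_path, seen, order):
        # Meeting a feature that is still on the current path is a cycle.
        if feature_id in on_path:
            raise ValueError(f"cycle through {feature_id}")
        if feature_id in seen:
            return  # already collected via another parent
        seen.add(feature_id)
        order.append(feature_id)
        on_path.add(feature_id)
        for child_id in children[feature_id]:
            walk(child_id, on_path, seen, order)
        on_path.discard(feature_id)

    # After the parse: every feature with no parent is a root.
    groups = {}
    reached = set()
    for feature_id, parent_ids in parents.items():
        if not parent_ids:
            order = []
            walk(feature_id, set(), set(), order)
            groups[feature_id] = order
            reached.update(order)

    # Features unreachable from any root can only sit on a cycle.
    unreachable = set(parents) - reached
    if unreachable:
        raise ValueError(f"no root found for: {sorted(unreachable)}")
    return groups
```

For example, feeding it the records ("exon1", ["mrna1"]), ("mrna1", ["gene1"]), ("gene1", []) in that children-first order yields a single group keyed by gene1. Nothing is resolved until the input is exhausted, which matches ls's point that in the general case a group cannot be processed before the end of the parse.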