Notes from the weekly DAS/2 teleconference, 11 Sep 2006

$Id: das2-teleconf-2006-09-11.txt,v 1.1 2006/09/11 18:10:11 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  Dalke Scientific: Andrew Dalke
  UCLA: Allen Day, Brian O'Connor
  (sc, aday, bo calling in from Seattle at MGED9 jamboree)

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda:
--------
* grant update
* status reports

Topic: Grant update
-------------------
gh: p good says funding outlook for getting funding for sep '06 to
may '07. $250K. not completely official, but more so. no grant to be
submitted in october. still major issues to resolve: rewriting, pi
decision. size was a concern. decision about what to drop (6 sections).

ad: new project starting dec/jan for 1 year. Can't work on das/2 past
end of this year. product for chemical informatics.

gh: can you put in more time before then? full time? 2-3 mos.

ad: need to look at my schedule. will get back to you.

[A] andrew talk with gregg re: increasing his das/2 time commitment

Topic: Status reports (and general discussion)
---------------------
gh: client to do curation in igb, write back to test server. impl
thing I drew on board back at last code sprint. editing curations.
making sure undo/redo capabilities in igb work. will translate into
what writeback needs are. turned off in igb by default.
prefs -> turn on exptl curations.
can edit things, but can't connect to server. must modify code, but don't

ee: gff3 parser. trouble: gff3 files in wild don't follow spec.
refseq website, repository, all three fail in different ways.
ucsc mailing list helped, but it wasn't their files.

aday: failed on validator?

ee: yes

gh: the only request we had

ee: not trying to write a full gff3 parser. just need gene, exon, cds,
mRNA. ignore other lines and it seems compliant. but a second problem:
very flexible. exon parent can be mRNA, gene, or nothing. jibes with
igb data model.
also worked on: released new igb version. graph support handling,
parsing affy files.

ad: flybase files are gff3 compliant, parent/part relationship requires
full file parsing. 800mb file. had to insert marker mid-file to inform
parser.

ee: space reduction during parsing. they have a recommended canonical
rep of gene, but not required to do it. haven't found an example that
follows the rec.

gh: the wormbase stuff should be canonical, since lincoln did gff3 and
wormbase.

ad: more people writing gff3 than reading

ee: ucsc discussion: grant to support more mod orgs, to include gff3
parser support.

gh: that's the kind of grant we'd like to fold das grant work into if
we don't do a separate das/2 grant

[A] gregg look into ucsc grant, possibly fold das stuff into it

ad: gff3 -> das2xml converter. some things in gff3 i don't know how to
handle. key-value. Need to figure out why things aren't passing
validator.

[A] andrew will write up questions, post to list, discuss there and/or
with lincoln at the next das/2 teleconf.

ad: modeling alignments. need a recommended way to model alignments.

gh: when to use locations vs subfeatures.

aday: why care about gff3?

ee: igb

ad: people need to convert data for das2xml.

aday: need a model mapping doc. we can hash it out next week with
lincoln.

ad: working with berkeley xml database. liking it a lot.

gh: also cool: SOLR - java thing built on top of lucene and xml db stuff.
cool thing is that it layers on top of that a rest-ful approach to
retrieving and writing data to a db, thru http urls. queries are GETs;
all writes/updates/deletes are POSTs.

ad: xQuery

aday: generalization of xpath

ad: xslt is another generalization.

sc: there was a poster at MGED9 meeting from stanford group using
Berkeley XML db to map between 'flavors' of MAGE-ML, since
organizations use different ways to represent the same thing in
MAGE-ML. Represented the transformation using pairs of xQueries, one
targeting format A, the other format B. All the smarts about the
format was confined to the xqueries. nice.

ad: I want to get feedback regarding modeling for das2, recommendation
to store certain data (alignments, gff3).

gh: gff3 - too open ended. lots of stuff can be in there

ad: given flybase, what is the recommended way to post gff3 data?

gh: i can answer your alignments issue, can't do gff3.

[A] andrew will contact folks as needed regarding gff3/flybase modeling
issues: suzi, chris mungall, lincoln, scott cain

Other status:
-------------
sc: no major progress given Netaffx update work, MGED travel. Plan is
to update das/2 server code on affy server, load it with some exon
array design data using gregg's new parser which is more memory
efficient, and test it out. Then we'll need to migrate it off the
das/1 server where the exon data hogs lots of memory, and then migrate
Netaffx links to use das/2.

gh: new box end of october with das grant money. have run das2 server
on 64bit. on 32bit have gotten 8g in single java process. riva. should
be able to get 16g in one process. or have 2x8g

bo: allen updated assay portion, bringing igb objects up to date. mark
carlson is updating hyrax client to retrieve microarray data back.
he's taking das/2 client, making it embeddable. eg., into the MeV tool
from John Quackenbush at Harvard (java). should be embeddable in igb to
browse celsius to d/l data. plan to have webstart for it.

aday: updating assay portion of server.
mage-ml to be in line with changes. adding/modifying element attribs,
lowercase 'uri'. data loaders to get ncbi data into server for
microarray expts. client lib in R for talking to das server. requires
parsing xml. extremely slow, uses lots of memory, so eg., viz bed files
in R, genomic location. good plotting support in R. look at
distribution.
regarding writeback server: on hold until you report any problems.
basic stuff is working. let me know.

gh: read part: caching improvements?

aday: no more work on that since jamboree. public server doesn't have
these improvements. plan to rewrite controller and view part. junk on
this end. want to integrate block mechanism into that as well. not
sure when it will happen. time estimate: maybe 1-1.5 months with bo
and i working half time.

bo: this rewrite will help a lot.

aday: lots of little things changed, 'segment' etc. server domain
source, capabilities, formats. huge mess. need more looking before i
can get an accurate time estimate for patching vs. rewriting. think
the rewrite wouldn't be that expensive.

gh: machine?

aday: dual core opteron, maybe 16g ram? load is increasing, may move
off to a dedicated server. webserver is the issue, not db.

Next teleconf:
--------------
In two weeks. 25 Sep 2006

Special dedication:
-------------------
To those who tragically lost their lives on this day five years ago...