Notes from the weekly DAS/2 teleconference, 28 Nov 2005.

$Id: das2-teleconf-2005-11-28.txt,v 1.1 2005/11/29 03:06:04 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  UC Berkeley: Suzi Lewis
  Sanger: Thomas Down, Andreas Prlic
  Sweden: Andrew Dalke

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. Instructions on how to access this repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit.

Today's topic: Spec issues (for DAS/2 retrievals)
-------------------------------------------------

We are following the agenda summary in Andrew's email:
http://portal.open-bio.org/pipermail/das2/2005-November/000352.html

1) DAS Status Code in headers
-----------------------------

Use http error codes and not das-specific ones. das-error to provide more detail.

GH: Do we really need a detailed response document?
TD: How do you distinguish different parts of the error-causing request?
AD: how detailed do we need to be?
LS: If you wish to do error recovery, you could have problems with one part and not another. You give up granularity.
GH: Willing to give up the granularity in favor of simplicity.
AD: Possibilities of error
LS: How about everything that can be turned into an http error should be. And have a special section to provide das details. E.g.: client is still going to have to understand das error codes
GH, AD: client does need to be there.
AD: Using only http error codes reduces complexity - you only need to check one place.
Another benefit - you can provide a file-based das server (this was not a use case from the RFCs, just AD's pet idea he envisions as potentially useful).
GH: Can't think of DAS/1 clients that did anything meaningful with those das error codes.
AD: NCBI Entrez server - does lots of extra error support. Don't want to go there with das.
TD, LS: DAS error codes can be used to tell client which part of the URL is at fault. Now it will be just '404 not found'.
AD: REST API says use the http protocol directly.
LS: There are some things in the DAS API that don't translate into http error codes.
AD: We can support this with error document.

[A] Use HTTP error codes and x-das-error document with code and optional description.

2) Content-type
---------------

[A] No objections to using: application/x-das+blah+xml

3) Key/value data
-----------------

Three possibilities summarized in Andrew's email:
  1) (current spec) using namespace in attrib value.
  2) (steve, lincoln) all attribute values are URIs
  3) (andrew) Relax-NG based, drop in well-structured XML

SC: (clarified proposal #2). For more, see today's post at:
http://portal.open-bio.org/pipermail/das2/2005-November/000363.html

AD: What's wrong with the Relax-NG based approach?
LS: I don't understand it yet.
SC: Community lacks experience with Relax-NG in general.
TD: Does it let you point to schema fragments for data types?
AD: There are ways to define it in the schema, haven't looked at it.
LS: This looks great. Would propose having a convention that if it's a simple, single-valued key, value should be encoded in an attribute (value="blah"), not as content of a section (CDATA). Reason: It's more consistent with rest of spec, and it's easier to parse. So in the example, genefinder-score is not correctly encoded.
AD: That's not in the das: namespace, hence is not under our control. We can use this convention for things in the das namespace.
AD: User can put in any xml as long as it's reasonably well-formed.
We can define what well-formed is. This is what Atom uses. Allows some simple key val data on client as if it were native data. It permits searches without needing to know about complex data.
GH: Likes idea of allowing arbitrary xml.
SC: Not completely arbitrary since we limit use of das: namespace, and possibly other aspects.
LS: So we're going to say we have properties represented as key/val pairs using this syntax. You'll find 'das:' as well as possibly other namespaces. I think that works. What becomes of /property url (ptype)? Does that go away and get replaced by namespace?
AD: Possibly use it for data type (e.g., float). Or we could make it discoverable?
LS: Easier to make it part of the spec.
TD: If this can work like XML schema, we could have a pointer to an xsi. Is there a way to put a pointer to a schema url?
AD: Found this to be useless. Hard coding what is expected is better than having discoverability.
TD: With the xsi schema location, you can put multiple schema locations, for the das schema and for your extension: separate pointers to both in a single document.
AD: Never found dynamically resolved schemas useful for anything.
LS: In theory they are. Why not?
AD: Knowing that something's an int doesn't say what that int is supposed to mean.
LS: Right. Let's make sure that the common types of annotation a server would want to return are in the spec from the get go. Anyone that doesn't care about extensions can ignore additional properties. No doubt people will make extensions to DAS/2 that are implemented on client and server that are in-house, private extensions that only work in client-server pairs. Should we allow schema fragments to be brought in via xsi?
TD: This would be in the top-level element. Or can put it on an enclosing element.
AD: Is there a good reason to do it?
LS: Let's not seek discoverability.

[A] Andrew will flesh out his Relax-NG based property encoding approach.
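To make LS's proposed convention concrete (simple, single-valued properties carried in a value="..." attribute, with other well-formed XML allowed from non-das namespaces), here is a minimal Python sketch of how a client might collect the simple key/value pairs and skip extension content it does not understand. It is an illustration only: the element names (das:PROP, gf:score, gf:note), the 'key' attribute, and both namespace URIs are hypothetical placeholders, since the actual encoding was left to Andrew's action item.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and element names, for illustration only.
DAS_NS = "http://example.org/das2"
EXT_NS = "http://example.org/genefinder"

SAMPLE = f"""<das:FEATURE xmlns:das="{DAS_NS}" xmlns:gf="{EXT_NS}">
  <das:PROP key="method" value="genefinder"/>
  <gf:score value="0.87"/>
  <gf:note>free-form extension content</gf:note>
</das:FEATURE>"""


def simple_properties(feature):
    """Return the simple, single-valued properties of a feature element.

    A property counts as simple when its value sits in a value="..."
    attribute. Children without one (text-only or nested extension XML)
    are skipped, so a client can ignore extensions it does not understand.
    """
    props = {}
    for child in feature:
        value = child.get("value")
        if value is None:
            continue  # not simple: leave it to extension-aware code
        if child.tag == f"{{{DAS_NS}}}PROP":
            # das: properties are keyed by their 'key' attribute
            props[child.get("key")] = value
        else:
            # other namespaces are keyed by ElementTree's '{uri}localname' tag
            props[child.tag] = value
    return props


print(simple_properties(ET.fromstring(SAMPLE)))
# {'method': 'genefinder', '{http://example.org/genefinder}score': '0.87'}
```

Keeping simple values in an attribute means the client needs one lookup (child.get("value")) per element, which is the "easier to parse" argument made on the call.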
SC: You could put your schema at the url pointed to by 'das:'
AD: Don't see a need. I found that many of the DAS/1 schema fragments/documents were invalid. This didn't seem to bother DAS/1 clients and users.
LS: In the real world, people don't validate.

5) xlink and <link>
-------------------

AD: The official xlink spec is long. Have not fully grokked it.
GH: Does anyone else have experience with it? (silence...) Seems like a reason to not go there.
AD: Atom uses link to say, "Here's some generic linked out stuff". We could use it to say, "I'm looking for the stylesheet for this thing or the schema for the xml document."
GH: We need to draw a line between generic links and specific things. E.g., feature ids: all ids are resolvable links, and so could in principle be specified with link tags.
AD: Link from feature to versioned source it's a part of. Client can figure out context from url. Use case: DAS user sends email to colleague, 'look at this url for feature X'. The other user enters URL in his das browser; client can identify the das2-versioned source given the feature URL.
LS: They would rely on xml:base. Nothing in the current DAS/2 spec says that the xml base is for the versioned source.
LS: But it does give you the versioned source. This is absolutely part of the spec.
AD: Nothing in the spec that says that features have to be on the same machine as the rest of the data.
LS: Why does user want versioned source on the same machine that the feature came from?
AD: Nothing in the spec says that a feature has to be under 'feature' in the URL.
GH: Generalizing the info href element to be more generic, to specify what that link means, is fine as long as we don't do this for everything that can be a link. Doc hrefs are fine, not ids.
LS: We're not going to demand that people specify links. (Something about giving people enough rope to hang themselves with...)
GH: Ids are opaque uris to id the feature.
LS: The HTML link tag has been around a long time, and used a total of two times: style sheets, copyright statements. This could have easily been done with a stylesheet tag and copyright tag (without needing a general link tag).

[A] Consider the xlink/link tags issue tabled.

6) Source filters
-----------------

GH: Use case: DAS/2 client is trying to discover what a registry has. The query can be the same as for any das server; you can just apply additional filters when dealing with a registry.
AP: Client would use tags that a registry server must implement.
GH: A non-registry server can implement them as well.
TD: Say filtering is optional in general.
AD: I tend to not like optional things. Filtering is required for features.
GH: The spec can state the filters that a registry is required to implement on sources query. General DAS/2 servers are not required to, but can if they want. What if you send a sources query with filters that it doesn't understand?
LS: Return everything
GH: Return error
AP: Client can filter out what they want
GH: It's already important to have search capability in client. Use case: On given genome, show me all gene predictions for this region. You need to go to all servers, which could be many.
AD: Can you filter by type of features that can be returned?
AP: Can be added.
GH: Want to be able to search on ontology term, not just id of the type.
AD: Need meta-data server to ask of DAS/2 servers what features do you implement?
LS: Does metadata protocol need to be part of das spec, or an additional protocol on top? There should be an optional section of DAS/2 that is implemented by metadata servers or registries that allows you to do servers. Shouldn't overload the core server spec.
GH: Concerned with the response. It's so close to the same xml, it might as well be the same. Makes it easy for clients to know about both servers and metadata servers. Could call it 'sources' or something else.
LS: Filtering by feature type: do we need that info returned by the sources document?
GH: No, it's part of the query.
LS: Metadata server would have to do a types request.
AD: What if there's a mismatch in SOFA version?
LS: We're in trouble.
AD: Concerned about change in meaning.
SL: Not important.
LS: Use case: There's a 'restriction site' node in SOFA 1.4 with five terms underneath it. In version 1.5, now there are six terms. A metadata server running off of the old version is using an incomplete node. Metadata engine should always run off the latest version.
AP: Registry at Sanger checks every 2 hrs with server.
AD: How is this better than having client do it itself? What features do you know with this type and this range?
GH: If lots of DAS servers, this will be time intensive.
AD: Can we wait until there are lots of servers?
AP: We have 17.
LS: Current paradigm - EBI has many servers that just do one type of feature, e.g., there's a server that just does repeat elements. So there are servers that will serve up one or a few feature types.
AD: Had not considered that.
LS: Happy to have optional filter syntax added to sources request supported by metadata servers. Gregg is right about returning error (unimplemented). Will not change protocol in fundamental way. Just an annex, just optional section supported by metadata servers.
GH: Based on Andreas' queries in SOAP, can we squeeze everything into params on url? Filterable?
AP: Yes.
AD: Optional fields will include species, build#, type, etc.

[A] Add optional filter syntax to sources request. Allow unimpl error return.

7) /regions
-----------

LS: In SOFA, a feature of type region is root of all other features - everything is a region. Has props - ref sequence it's on, start, strandedness. The reason for region is for retrieving assemblies.
SC: Region is also currently the only way to get back a list of available sequence ids without getting all sequence data.
The top-level sequence request returns data along with sequence.
LS/GH: region could be called 'landmarks'

[A] Andrew will work directly with Lincoln on revising region request.

8) Tiled queries
----------------

LS: This doesn't need to be in spec. If client filters features by a range, is there a contract such that server must return exact range he asked for, contained in, or is it ok for server to return more?
GH: We need to be more strict.
LS: Agree. Client should trim it.

[A] Tiled queries should not be part of the spec.

Other issues
------------

AP: There are still some other issues not addressed in this call, e.g., it is not possible to handle the situation where the protein sequence in a structure varies from the genome. Can defer to the next spec discussion conf call.
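Item 8 above leaves the range contract to the client ("Client should trim it"). One reading of that is sketched below in Python: the client discards whatever the server sent outside the requested range. The Feature type, the 0-based half-open coordinates, and the overlap-vs-containment switch are assumptions for illustration, not part of the DAS/2 spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    id: str     # opaque feature URI (a short label here)
    start: int  # assumed 0-based, half-open [start, end) coordinates
    end: int


def trim_to_range(features, start, end, contained=False):
    """Drop features the server returned outside the requested [start, end).

    With contained=False a feature is kept if it overlaps the range at all;
    with contained=True it must lie entirely inside the range.
    """
    if contained:
        return [f for f in features if f.start >= start and f.end <= end]
    return [f for f in features if f.start < end and f.end > start]


# A server answering a request for [100, 250) with whole tiles might send:
response = [
    Feature("f1", 0, 50),
    Feature("f2", 90, 120),
    Feature("f3", 110, 140),
    Feature("f4", 200, 300),
]
print([f.id for f in trim_to_range(response, 100, 250)])                  # ['f2', 'f3', 'f4']
print([f.id for f in trim_to_range(response, 100, 250, contained=True)])  # ['f3']
```

The two modes correspond to the "overlaps" and "contained in" readings LS raised; the call did not settle which one clients should apply.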