The REST Dialogues, Part 2 with a Real eBay Architect
In Part 2, Duncan continues the interview discussing the REST advantages for setting data. Once again, I will replace my imaginary co-worker and respond to the interview questions. And yes, the standard disclaimers are still in effect. Nothing I state should be construed as product plans and these opinions are mine alone.
Duncan Cragg: Now, let's look at those calls in eBay's SOAP API that are expected to change data. There is often a corresponding 'Get' function.
Examples are GetTaxTable, SetTaxTable and GetMyMessages, ReviseMyMessages, ReviseMyMessagesFolders, DeleteMyMessages, etc.
Here, there is an implicit model of data (or lists of data) that you can look at, modify, delete, etc.
Dan Pritchett: Okay
DC: Yes, but we're talking about what the API is telling us, here. It says things like GetTaxTable and SetTaxTable!
We'll come on to business processes later, but just look at the simple stuff first.
DP: Sure, let's see how simple this stuff really is.
DC: Again, scalability and interoperability.
In the same way as with GET, you can partition your POST handling across the URI space. Have your Tax Tables and Message folders handled by many machines, split by their URIs. This is an example of the inherent parallelisability of the Web's architecture.
DP: Absolutely, and we do. Horizontal scaling is an inherent aspect of our architecture. We partition along many vectors, not just URI or function. We also partition based on subsets of data and geographic proximity.
DC: Sure, but you had to 'hard-code' this partition, using specific heuristics and code to achieve it. URI-based partitioning is much more generic and flexible, and can be much finer-grained. You seldom get single-point-of-failure or bottleneck issues with good REST architectures.
DP: Most of our partitioning is data-driven and not hard-coded. Ultimately, though, you have to map an operation onto a pool that has the necessary algorithms and access to the data. eBay has over 1500 dynamic URIs implemented by a large body of code. The scale of the problem is larger than just deploying all operations to all machines, so our partitioning will always involve a certain degree of fixed mapping. We have considerable flexibility in managing these mappings, but they are a necessity of our scale.
And architectures with single points of failure aren't allowed at eBay. It's one of the fundamental principles that every architect strictly abides by.
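To make the two positions in this exchange concrete, here is a minimal sketch that combines them: a data-driven table maps URI prefixes onto pools (the "fixed mapping" Dan describes), and a hash of the full URI picks a machine within the pool (the finer-grained split Duncan describes). The prefixes, pool names, and the route function are all hypothetical and are not taken from eBay's actual architecture.

```python
import hashlib

# Hypothetical, data-driven routing table: URI prefix -> pool of machines.
# In a real system this would be loaded from configuration, not hard-coded.
POOLS = {
    "/tax-tables/": ["tax-1", "tax-2"],
    "/messages/": ["msg-1", "msg-2", "msg-3"],
}


def route(path: str) -> str:
    """Return the machine that should handle a request for this URI path.

    The prefix selects a pool that has the right code and data access;
    a hash of the whole path then spreads URIs across that pool, so the
    same URI always lands on the same machine whatever the HTTP method.
    """
    for prefix, pool in POOLS.items():
        if path.startswith(prefix):
            digest = hashlib.sha256(path.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(pool)
            return pool[index]
    raise LookupError(f"no pool configured for {path!r}")
```

A plain modulo hash like this reshuffles most URIs when a pool changes size; a production system would more likely use consistent hashing or an explicit lookup, which is beyond this sketch.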
DC: Interoperability again derives from the standardisation of the Content-Types and the schemas of POSTed data. When you know the meaning of the data you fetch, you also know how to try and change it, if you can.
DP: Sure, the structure of the entities should remain consistent whether you are getting or setting them. That's logical. Of course there are aspects of entities that are derived and effectively read-only (e.g. the current high bid on an item), but that doesn't necessarily violate the principle.
DC: Well, the same applies, insofar as the GET data effectively tells you what any return data should look like. However, this time it's in an implicit way.
Where, in the read context, you understood what to do with a given content type you'd fetched, now, in the write context, you further understand what you can send back to it again.
DP: There are additional constraints on setting data, though, that make it more problematic than reading. Attributes of entities will have restrictions on valid values. These restrictions are not necessarily spelled out when reading, because the service generates the entities and so controls these boundaries itself.
DC: Straightforward data updating is the simplest case - it's the same content type coming out as going back in. You see a Tax Table at some URI, then you just PUT a new Tax Table back at that URI to try and replace it.
All the API function calls beginning 'Set' should probably be implemented this way.
DP: Perhaps, but using the Tax Table as an example, you have attributes like region and tax rate. What are the acceptable regions? Is it an enumeration or an opaque token? This isn't something you were necessarily concerned about when you retrieved the data, but now it is essential. These semantics must be clearly understood to correctly set the data and many may actually vary from implementation to implementation. Standardization becomes problematic as you drill down to this level of semantics.
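The read-then-replace cycle Duncan describes, and the validation problem Dan raises, can both be seen in a small in-memory sketch. Everything here is an assumption for illustration: the URI, the region codes, the idea that regions form a closed enumeration, and the choice of 400 as the rejection status. A real service would define these semantics itself, which is exactly the point under discussion.

```python
# Hypothetical closed set of region codes. Whether regions are really an
# enumeration or opaque tokens is the open question raised in the dialogue.
ALLOWED_REGIONS = {"CA", "NY", "TX"}

# In-memory stand-in for the server's resources, keyed by URI path.
resources = {"/users/alice/tax-table": {"CA": 7.25}}


def get(uri):
    """Return (status, representation) for the resource at uri."""
    return 200, dict(resources[uri])


def put(uri, table):
    """Try to replace the whole Tax Table at uri.

    The client sends back the same shape it read, but the server still
    checks values the client could not have learned from a GET alone.
    """
    for region, rate in table.items():
        if region not in ALLOWED_REGIONS:
            return 400, f"unknown region {region!r}"
        if not 0 <= rate <= 100:
            return 400, f"rate out of range for {region!r}"
    resources[uri] = dict(table)
    return 200, "replaced"
```

A client that reads {"CA": 7.25} learns the shape of the table but not that "ZZ" would be refused; that knowledge has to come from documentation or from a shared standard.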
DC: An example of a data format that implies its edit capabilities by reference to a standard is Atom syndication.
You can GET a list of Atom entries, then POST a new one according to the Atom Publishing Protocol (APP) specification. You POST, not to a 'service' or handler, but to the URI of the actual collection you're adding to.
DP: I understand the example, but I am not following the point.
DC: The eBay Message lists are a great candidate for using this approach, with the benefit that any (recent) feed reader can be used to view them, and an APP-compliant client used to manage them.
All the API function calls beginning 'Add' should probably be implemented in the same way APP adds new entries, or you should consider using APP itself to implement them.
DP: I think this is a great example to illustrate the challenges of semantic compatibility versus syntactic compatibility. Atom and APP define the syntax and the semantics to a certain level but not as fully as you might think. Consider the author element. The name space for this element is undefined. Is it an email address, a full name, or a user name? If it is coming from eBay then it is user name but those have virtually no useful context outside of the eBay system, by design. Yes, you could leverage a good deal of your application's implementation to process message streams from eBay if they are in Atom format, but you couldn't really correlate those messages to any messages from other content providers.
Posting is even more interesting. One of the topics we've not discussed up to this point is authentication. The eBay message system requires authentication to avoid spoofing. REST supports authentication, but there doesn't appear to be an agreed-upon standard just yet. Even with a standard, each system will have its own authentication credentials, which will be an interesting challenge to manage as you attempt to traverse multiple sites. Message format will ultimately be a minor challenge relative to the credential management.
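For readers who have not seen an APP-style add, the following sketch builds an Atom entry and prepares (but does not send) a POST to a collection URI. The Atom namespace, the name and uri children of the author element, and the media type are standard Atom; the collection URI, the profile URI, the entry id, and the function names are hypothetical. Authentication is deliberately left out, since, as noted above, it is the harder part.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def build_entry(title, author_name, author_uri, body, entry_id, updated):
    """Build a minimal Atom entry and return it as UTF-8 encoded bytes."""
    def tag(name):
        return f"{{{ATOM_NS}}}{name}"

    entry = ET.Element(tag("entry"))
    ET.SubElement(entry, tag("title")).text = title
    ET.SubElement(entry, tag("id")).text = entry_id
    ET.SubElement(entry, tag("updated")).text = updated
    # The author is an Atom person construct: a name plus an optional uri.
    author = ET.SubElement(entry, tag("author"))
    ET.SubElement(author, tag("name")).text = author_name
    ET.SubElement(author, tag("uri")).text = author_uri
    content = ET.SubElement(entry, tag("content"), {"type": "text"})
    content.text = body
    return ET.tostring(entry, encoding="utf-8")


def build_post(collection_uri, entry_xml):
    """Prepare a POST of the entry to the collection itself (not sent here)."""
    return urllib.request.Request(
        collection_uri,
        data=entry_xml,
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )
```

The name field here would carry an eBay user name, which illustrates Dan's concern: the element is syntactically valid Atom, yet its value means little outside the system that issued it.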
[Note: My imaginary co-worker made a point about SOA that I didn't find as relevant. Please forgive the incongruity in the dialogue.]
DC: Schema proliferation is SOA's problem - so I'm glad you brought it up! It's baked in to the SOA mindset that it's OK to design your own interfaces and schemas from scratch each time.
Conversely, the expectation and culture in the Web and especially in the REST-aware community is to constantly look out for opportunities to conform and to standardise schemas and interactions, and to build on layers below that are already standardised. It's a side-effect of the shareability of URIs.
With REST you get interoperability at many levels above the byte-transfer of HTTP. Content type understanding and standardisation can occur all the way up from characters to Tax Tables.
DP: I'm having a bit of a problem following this point. REST encourages the standardization of content types to gain the efficiencies. I'll concede that SOA has done a poor job of encouraging similar standardization, but there is nothing that inherently prevents SOA from doing so. At the same time, REST providers can unintentionally, or even intentionally if they feel the business model supports it, drive away from standardization.
[Note: My imaginary co-worker got a little lost and needed more explanation.]
DC: OK: there is some code that understands basic HTTP GET and PUT, and UTF-8 characters - and doesn't need to know what a Tax Table is - some that understands UTF-8 plus XML, some that understands that plus Atom, or plus XHTML, then XHTML tables and Microformats like hCalendar, hCard (and hAtom!), then conference schedules using hCalendar and hCard. Tax Tables can perhaps use XHTML tables.
Clients may or may not need to know what the schema of a Message looks like internally in order to be able to do useful work with Messages at their level of understanding - from character stream up to APP.
DP: I think you've described the OSI network model and what I would call good application layer design. At least that's how I design my applications, but I'll grant you not everyone is as diligent. To the extent that you can properly encapsulate the micro-formats in other documents, I agree that your application need only worry about the portions of the response that are interesting to it. I think this becomes a little more challenging in practice, though.
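The layering idea in this exchange can be illustrated with a small sketch that reports how far up the stack a client can interpret a payload: raw bytes, UTF-8 text, XML, then Atom. The function name and the level labels are made up for illustration, and the check stops at Atom; recognising XHTML tables or microformats would follow the same pattern.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def understanding_levels(payload: bytes) -> list:
    """List the layers at which this payload can be understood.

    Each layer builds on the one below it, and a client can stop at
    whichever layer is enough for the work it needs to do.
    """
    levels = ["bytes"]
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return levels
    levels.append("utf-8 text")
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return levels
    levels.append("xml")
    if root.tag in (f"{{{ATOM_NS}}}feed", f"{{{ATOM_NS}}}entry"):
        levels.append("atom")
    return levels
```

A cache or proxy can work at the first level, a generic XML tool at the third, and a feed reader at the fourth, all without knowing what a Tax Table or an eBay Message looks like inside.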
DC: You never know, your eBay schemas might be taken into account when coming up with new standard content types for e-commerce!
DP: That would be nice.
[Note: Deleted a superfluous exchange on XHTML]
DC: This is one of the Myths of REST - that it's just for simplistic reading and writing of data.
It's actually a myth that's often propagated by the REST community itself, especially with their over-emphasis on the Four HTTP Verbs, which seem to map so conveniently onto Create, Read, Update and Delete.
DP: We call that CRUD and those operations are low level entity persistence methods. I would agree that they aren't terribly meaningful at a business interface level.
DC: It is often good, but can detract from the whole goodness!
One consequence of this mapping is that people inevitably go on to see an analogy between REST and databases, and then start to expect transactions and other database features.
Another consequence of the database analogy is that resources are seen as lifeless servants of the active client: it takes away responsibility from a resource to be master of its own destiny.
This then causes confusion (even within the REST community) about the power of the client and even the very meaning of such basics as the PUT method.
The fact is, GET and POST are more than enough to be REST compliant. This cut-down pair also help focus the mind on URIs, two-way content transfer, content type or schema and on the responsibilities of each resource as active players in an integration scenario.
DP: No argument.
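One way to picture a resource as "master of its own destiny" is a sketch in which a POST is only a suggestion, and the resource decides what to accept. The class, its method names, and the acceptance rule (a numeric rate between 0 and 100) are all hypothetical; the point is only that the outcome is decided by the resource rather than dictated by the client.

```python
class TaxTableResource:
    """Hypothetical resource that decides for itself how to react to a POST."""

    def __init__(self, rates):
        self._rates = dict(rates)

    def get(self):
        """Return a copy of the current representation."""
        return dict(self._rates)

    def post(self, suggested):
        """Treat the body as a suggestion, not a command.

        Acceptable entries are applied; the rest are reported back so the
        client can see what the resource chose not to do.
        """
        accepted, rejected = {}, {}
        for region, rate in suggested.items():
            if isinstance(rate, (int, float)) and 0 <= rate <= 100:
                accepted[region] = rate
            else:
                rejected[region] = rate
        self._rates.update(accepted)
        return {"accepted": accepted, "rejected": rejected}
```

Unlike a database row being overwritten, the resource here can partially accept a request, which is one reason the CRUD analogy can mislead.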
DC: Tim Bray has a history of suggesting that GET and POST are enough.
And recently, there has been further high-level support from Sam Ruby and Leonard Richardson in their manifesto for an upcoming book.
DC: So, to summarise: use GET to read data, then POST back to the same URI to suggest changes to the resource there.
All based on a given level of understanding and interpretation of the content type and any corresponding interaction standards.
Interoperability and scalability in a nutshell!
Now I will explain how there's more to REST than simple reading and (attempted) writing of resources. We're still only two-ninths done...
DP: I look forward to seeing where we go next.
"Atom and APP define the syntax and the semantics to a certain level but not as fully as you might think. Consider the author element. The name space for this element is undefined. Is it an email address, a full name, or a user name?"
That's somewhat inaccurate: http://www.atomenabled.org/developers/syndication/#person
The 'uri' subelement could potentially and likely point to the person's profile on eBay.
Posted by: Keith Gaughan | Friday, February 23, 2007 at 03:29 AM