
Search wave design proposal

    Alex North (anorth@google.com), Joseph Gentle (josephg@gmail.com)

    2010-09

    Note: This design is published as a suggestion for the future but we do not intend to implement it immediately. WIAB can instead use the search functionality in the Data API for a simpler initial search/inbox transport, though without the benefits of liveness.

    Objective

    This design proposes a format for embedding of query results in a wave. Clients will open a wave to receive results of a search or inbox query.

    Goals:
    • Support inbox and search queries
    • Support live results

    Background


    FedOne currently implements the inbox as a singleton wave with a wavelet per other wave in the system. This model does not easily generalise to modelling other query results and introduces scaling problems as the number of waves grows. The implementation does not make good use of established wave modelling practices.

    Google Wave implements search results as a streaming RPC, but we always imagined implementing them as a wave.

    Requirements and Scale


    Wave in a Box requires:
    • Live inbox results, which come from a query over the waves a user participates in
    • Search results from full-text search queries

    Live results means that, as query results change in response to wavelet deltas, the new results are pushed to the client immediately. Search queries other than the inbox will not initially be live, but the protocol should support live results for them in the future.
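
    As a rough illustration of liveness, the sketch below re-evaluates the inbox query for one wave each time a delta arrives and pushes only the resulting change. The class name LiveInbox, the event tuples and the push callback are hypothetical and not part of this design; a real server would write the change into the inbox wave instead of calling a callback.

```python
from typing import Callable, Set, Tuple

Event = Tuple[str, str]  # (kind, wave_id); kind is "add", "update" or "remove"


class LiveInbox:
    """Keeps one user's inbox result set current as deltas arrive (illustrative)."""

    def __init__(self, user: str, push: Callable[[Event], None]) -> None:
        self.user = user
        self.push = push  # hypothetical hook standing in for a write to the inbox wave
        self.results: Set[str] = set()

    def on_delta(self, wave_id: str, participants: Set[str]) -> None:
        """Re-evaluate one wave after a delta and push any change to the client."""
        matches = self.user in participants
        present = wave_id in self.results
        if matches and not present:
            self.results.add(wave_id)
            self.push(("add", wave_id))
        elif matches:
            # Still in the inbox; the digest (snippet, unread count) may have changed.
            self.push(("update", wave_id))
        elif present:
            self.results.remove(wave_id)
            self.push(("remove", wave_id))


events = []
inbox = LiveInbox("fred@example.com", events.append)
inbox.on_delta("w+1", {"jane@example.com"})                      # fred absent: no event
inbox.on_delta("w+1", {"jane@example.com", "fred@example.com"})  # fred added
inbox.on_delta("w+1", {"jane@example.com"})                      # fred removed
```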

    Result sets are potentially unbounded, but the server must deliver a finite subset to the client. The client should be able to access different subsets of results. The format of the result data should be easily extensible with minimal system changes required to add new information in the future.
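
    One way to picture delivering a finite subset of an unbounded result set is the windowing sketch below. The ResultWindow type and the has_more flag are assumptions made for illustration; the design itself only requires that the client can ask for different subsets.

```python
from dataclasses import dataclass
from itertools import count, islice
from typing import Iterable, List


@dataclass(frozen=True)
class ResultWindow:
    """A finite slice of a potentially unbounded, ordered result set."""
    start: int        # index of the first result delivered
    items: List[str]  # wave ids in this window (hypothetical representation)
    has_more: bool    # True if at least one result exists beyond the window


def window(results: Iterable[str], start: int, num: int) -> ResultWindow:
    """Take `num` results beginning at `start` without consuming the whole set."""
    if start < 0 or num < 0:
        raise ValueError("start and num must be non-negative")
    # Read one extra item so we can tell whether more results follow.
    chunk = list(islice(results, start, start + num + 1))
    return ResultWindow(start, chunk[:num], len(chunk) > num)


# An endless generator stands in for an unbounded result set.
unbounded = (f"w+{i}" for i in count())
page = window(unbounded, start=2, num=3)
```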

    A user can have multiple concurrent sessions, with different search queries and views of their inbox.

    Latency from performing a search (including the implicit inbox search upon client load) should be minimised.

    Design Ideas


    Search results as a wave

    This design proposes embedding search results in a wave. Such a mechanism re-uses all of Wave's optimism, liveness, reconnection, offline, etc. "for free".

    When a client session begins the server provides the client with two wavelet names (in different waves):
    • The inbox wavelet
    • The search wavelet

    The inbox is supplied in a dedicated wave as it is the most frequent query. Maintaining the wave removes the need to re-query each time the user returns to their inbox. All other search results share a single search wave. For now this limits a client to a single search query at a time but this can easily be lifted in the future.

    The client opens the inbox wave and can expect to see data representing a view of its inbox.

    To search, the client performs an RPC to specify a new search query and opens the search wave if necessary. The server updates the content of the search wave with the query results.

    Query RPC

    The search query request is a simple RPC specifying the new query. The RPC is performed over the client's websocket (or emulated) connection, which provides message ordering, etc.

    Request:
    • Query string, e.g. "with:fred@example.com tag:designdoc query wave"
    • Params
      • from: index of the first result to fetch
      • num: max number of results to fetch

    Response:
    • Success or failure with reason

    The client can update the search query (for example, changing the limit/offset) by making a new request.
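
    The request and response described above could be modelled roughly as follows. The JSON encoding, the field names other than "from" and "num", and the handle_query function are assumptions for illustration; this design does not specify a wire format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueryRequest:
    query: str
    from_index: int = 0  # "from" in the design; renamed because `from` is a Python keyword
    num: int = 10

    def to_json(self) -> str:
        # Hypothetical encoding: the design does not define one.
        return json.dumps(
            {"query": self.query, "params": {"from": self.from_index, "num": self.num}}
        )


@dataclass(frozen=True)
class QueryResponse:
    success: bool
    reason: Optional[str] = None  # set only on failure


def handle_query(raw: str) -> QueryResponse:
    """Validate a request; a real server would then rewrite the search wave."""
    try:
        msg = json.loads(raw)
        query = msg["query"]
        start = int(msg["params"]["from"])
        num = int(msg["params"]["num"])
    except (ValueError, KeyError, TypeError) as err:
        return QueryResponse(False, f"malformed request: {err!r}")
    if not isinstance(query, str) or not query.strip():
        return QueryResponse(False, "empty query")
    if start < 0 or num <= 0:
        return QueryResponse(False, "from must be >= 0 and num must be > 0")
    return QueryResponse(True)


first_page = QueryRequest("with:fred@example.com tag:designdoc query wave")
# Changing the offset is just a new request with the same query string.
second_page = QueryRequest(first_page.query, from_index=10, num=10)
```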

    The inbox wave requires no query string and is implicitly requested when a client session begins.

    Query results wave

    Search and inbox results are each presented in a single wavelet in a similar format.

    Open question: One document with all results (server does ordering) or one document per result (client does ordering)?

    <query>
    <search v="with:fred@example.com tag:designdoc query wave" />
    <range from="0" num="10" />
    </query>

    <result lmt="123456890" msg="4" unread="3">
    <title>Query wave - design proposal</title>
    <snippet>This design proposes a format for embedding of query results in a wave.</snippet>
    <participants>
    <participant id="fred@example.com" />
    <participant id="jane@example.com" />
    </participants>
    <... extensible ...>
    </result>

    <result>...</result>

    Query waves are read-only to the client.
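
    A client might read one result element along the lines of the sketch below. The Digest type is hypothetical, and the readings of lmt as a last-modified time, msg as a message count and unread as an unread count are assumptions that the format above does not spell out. Child elements the client does not recognise are ignored, which is one way to keep the format extensible.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Digest:
    title: str
    snippet: str
    participants: List[str]
    last_modified: int  # assumed meaning of the "lmt" attribute
    message_count: int  # assumed meaning of the "msg" attribute
    unread_count: int   # assumed meaning of the "unread" attribute


def parse_result(xml_text: str) -> Digest:
    """Parse a single <result> element, skipping any unknown children."""
    el = ET.fromstring(xml_text.strip())
    if el.tag != "result":
        raise ValueError(f"expected <result>, got <{el.tag}>")
    return Digest(
        title=el.findtext("title", default=""),
        snippet=el.findtext("snippet", default=""),
        participants=[p.get("id", "") for p in el.findall("participants/participant")],
        last_modified=int(el.get("lmt", "0")),
        message_count=int(el.get("msg", "0")),
        unread_count=int(el.get("unread", "0")),
    )


SAMPLE = """
<result lmt="123456890" msg="4" unread="3">
  <title>Query wave - design proposal</title>
  <snippet>This design proposes a format for embedding of query results in a wave.</snippet>
  <participants>
    <participant id="fred@example.com" />
    <participant id="jane@example.com" />
  </participants>
  <labels><label v="designdoc" /></labels>
</result>
"""

digest = parse_result(SAMPLE)
```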

    Server implementation notes

    Query waves are in use only for the lifetime of a client session. New waves are used when a new client session begins. The histories of query waves are not valuable beyond the life of a client session (during which they may be useful for reconnection) and the server should delete them after a client session ends.

    Query waves should never appear as results of other queries. Query result waves are not federated.

    Client implementation notes

    The digest panel may be implemented as a collection of doodads each representing one result. The doodads may be backed by the wave data and change in response to updates to the data (e.g. a changing snippet or participant list).


    Alternatives Considered

    We considered embedding the query request in a wave id, something like "q+with:fred". This was mainly in an attempt to reduce round-trips, but the current scheme has just one round trip for both inbox and search result waves in any case. It would also remove the need for a separate query RPC. However, the identifiers would be long and embedding of queries somewhat complex, requiring escaping and tokens to guarantee session uniqueness, etc.
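
    To give a feel for the escaping this alternative would need, the sketch below packs a session token and a query into an identifier. The "q+token+query" layout and both function names are hypothetical; only the "q+" prefix comes from the example above.

```python
from typing import Tuple
from urllib.parse import quote, unquote


def encode_query_wave_id(query: str, session_token: str) -> str:
    """Build an id of the hypothetical form q+<token>+<escaped query>."""
    # safe="" percent-escapes every reserved character, including "+", ":" and "@",
    # so "+" can serve as an unambiguous separator.
    return f"q+{quote(session_token, safe='')}+{quote(query, safe='')}"


def decode_query_wave_id(wave_id: str) -> Tuple[str, str]:
    """Return (session_token, query) from an id built by encode_query_wave_id."""
    prefix, token, escaped = wave_id.split("+", 2)
    if prefix != "q":
        raise ValueError("not a query wave id")
    return unquote(token), unquote(escaped)


wave_id = encode_query_wave_id("with:fred@example.com tag:designdoc query wave", "s1")
```

    Even this short query grows noticeably once escaped, which is the identifier-length concern noted above.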

    We considered making the query request in the query wave, removing the query RPC. The request could be modified by writing directly into the results wave. E.g. the client appends <change> requests into a <modify> section, like:
    <modify>
    <change>
    <query value="bar"/>
    <results from="50" to="100"/>
    </change>
    </modify>
    and some server-side component processes the change and deletes it when done.
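
    A server-side component for this alternative might look like the sketch below, which applies each complete <change> and then deletes it. The function name drain_changes and the rule that a change without both children is left in place are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def drain_changes(modify: ET.Element) -> List[Tuple[str, int, int]]:
    """Return (query, from, to) for each complete <change>, removing it when done."""
    applied = []
    for change in list(modify.findall("change")):  # copy: we remove while iterating
        query = change.find("query")
        results = change.find("results")
        if query is None or results is None:
            # The client may still be writing this change; leave it for a later pass.
            continue
        applied.append(
            (query.get("value", ""), int(results.get("from", "0")), int(results.get("to", "0")))
        )
        modify.remove(change)
    return applied


modify = ET.fromstring(
    """
    <modify>
      <change>
        <query value="bar"/>
        <results from="50" to="100"/>
      </change>
      <change>
        <query value="half-written"/>
      </change>
    </modify>
    """.strip()
)
applied = drain_changes(modify)
```

    The second, half-written change shows why deciding when a request is ready to process is awkward in this scheme.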

    Pros of an RPC:
    • Simple, well understood mechanism
    • Less code interpreting the query
    • No problems with atomic requests (when is the <modify> section ready for processing?)
    Cons of an RPC:
    • Another RPC/servlet
    • Conceptual overhead of request/response on different "bands"

    We also considered implementing search as a simple streaming RPC. Implementing a separate response stream requires implementation of liveness, diffs, reconnection handling etc. Embedding the results in a wave builds on those capabilities already provided and demonstrates the flexibility of the Wave protocols.
     