Maps in exslt2?

Discussion:

Maps in exslt2?

Vladimir Nesterovsky

15 years ago

Hello!

A while ago I have proposed to introduce maps as built-in types in
xpath/xquery type system: Tuples and maps
(http://www.w3.org/Bugs/Public/show_bug.cgi?id=5630).

The suggestion has been declined (probably my arguments were not
convincing). I, however, still think that maps along with sequences are
primitives required to build sensible (not obscure one) algorithms. To
see that map is at times is the best way to resolve the problems we shall
refer to an utility function to allocate names in scope. Its signature
looks
like this:


<xsl:function name="t:allocate-names" as="xs:string*">
<xsl:param name="prefixes" as="xs:string*"/>
<xsl:param name="names" as="xs:string*"/>
<xsl:param name="name-max-length" as="xs:integer?"/>

Just try to program it and you will find yourselves coding something like
one defined at cobolxom
(http://www.nesterovsky-bros.com/download/cobol-names.xslt).

To be efficient such maps should provide, at worst, a logarithmic
operation
complexity:
Access to the map through a key (and/or by index) - complexity is LnN;
Creation of a new map with added or removed item - complexity is LnN;
Construction of the map from ordered items - complexity is O(N);
Construction of the map from unordered items - complexity is O(N*LnN);
Total enumeration of all items - complexity is O(N*LnN);

These performance metrics are found in many functional and procedural
implementations of the maps. Typical RB and AVL tree based maps satisfy
these restrictions.

What I suggest is to introduce map implementation into the exslt (assuming
inline functions are present). As a sketch I have implemented pure AVL
Tree
in Xslt 2.0
(http://www.nesterovsky-bros.com/weblog/2010/02/28/AVLTreeInXslt.aspx):

test.xslt (http://www.nesterovsky-bros.com/download/tree/test.xslt) - a
little test;
tree.xslt (http://www.nesterovsky-bros.com/download/tree/tree.xslt) - an
AVL
tree implementation;
tuple.xslt (http://www.nesterovsky-bros.com/download/tree/tuple.xslt) - a
tuple interface and implementation.

I do not assume that implementation like this should be used, but rather
think of some extension function(s) that provide(s) a better performance.

What do you think?
Are maps with required operation complexity already implemented and I've
just missed them?

Thanks.
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com/

Michael Kay

15 years ago

Permalink

Post by Vladimir Nesterovsky
The suggestion has been declined (probably my arguments were
not convincing).

You can probably assume that different people voted against it for different
reasons. Some may have felt the arguments unconvincing; some may have felt
that it was outside the scope of the 1.1/2.1 release (which some people
think is almost finished); some may have felt the priority was to get
higher-order functions into this release and richer data structures would be
easier to define once that was done. It may have failed because those who
thought the idea was good weren't prepared to do the extensive leg-work to
define the facility in detail. And of course there will be a lot of debate
about the detail, such as what kinds of values are permitted as map keys (I
would say anything with an "eq" operator), whether there is a difference
between a key being absent versus mapping to an empty sequence,

While there are certainly (in my view) many use cases for maps and other
richer data structures, I'd be more confortable if there was some kind of
unifying strategy rather than adding things one-by-one.

One design approach is that a map is nothing more than an extensional
function from keys to values. But that still leaves the question as to how
the mapping is initialized. And it gives no ability to iterate over the
keys, which is often important.

As it happens we were struggling with a piece of design in XSLT 2.1 last
week where the ability to have a function that returned a map would have
made life much easier. Perhaps I'll try to bring it back in as a solution to
that specific problem.

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay

Vladimir Nesterovsky

15 years ago

Permalink

Post by Michael Kay

Post by Vladimir Nesterovsky
The suggestion has been declined (probably my arguments were
not convincing).

You can probably assume that different people voted against it for

different

Post by Michael Kay
reasons. Some may have felt the arguments unconvincing; some may have

felt

Post by Michael Kay
that it was outside the scope of the 1.1/2.1 release (which some people
think is almost finished); ...

I can understand arguments of the WG. That is why I'm talking now of exslt,

and is aiming to binary trees. As items in binary tree algorithms do not
play a role (they are just attached to the tree), implementation is
modular:

Interface for a tree itself includes functions to:
create a tree (create, insert, remove);
navigate subtree tree (like left, right, size of subtree);

Map is a tree, which assumes that items are key, value pairs. Thus
interface for the map extends tree with search, and create (create, insert,

delete) according to a key.

Tree items should probably be expressed in terms of function items.
The sample implements items as xml elements, which makes it inefficient (in

xslt 2.1 implementation is efficient if one counts operation complexity).

But my point was to open a discussion and to show that things are not so
complicated and are very modular.

P.S. Sorry, if you got this message twice.
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com/

John Snelson

15 years ago

Permalink

Hi Vladimir,

I think it's really important to get some good data structures into
XQuery and XSLT 2.0. I've already written a read/black tree
implementation in pure XQuery 1.1 (using 1st class function closures)
which I was planning to release via EXPath when appropriate.

I designed my map interface like this:

declare function map:put(
$map as function() as item()*?,
$key as item(),
$value as item()*
) as (function() as item()+)+

declare function map:get(
$map as function() as item()*?,
$key as item()
) as item()*

declare function map:contains(
$map as function() as item()*?,
$key as item()
) as xs:boolean

declare function map:fold(
$f as function(item()*, item(), item()*) as item()*,
$z as item()*,
$map as function() as item()*?
) as item()*

It's a declarative, functional API, so putting an entry into a map
creates a new map. The keys are singleton items, and the values are
sequences. The map:fold() function visits each map entry executing $f
and accumulating a result from it.

Looking at the API now, I'm thinking it probably needs map:remove() to
remove an entry, and maybe map:keys() to return a sequence of all the
keys in the map.

I'm also thinking about other useful functional data structures to
implement using XQuery 1.1. I'll probably implement a finger tree at
some point to use as an array/queue etc.

John

...

--
John Snelson, Oracle Corporation http://twitter.com/jpcs
Berkeley DB XML: http://oracle.com/database/berkeley-db/xml
XQilla: http://xqilla.sourceforge.net

Vladimir Nesterovsky

15 years ago

Permalink

Post by John Snelson
I think it's really important to get some good data structures into
XQuery and XSLT 2.0. I've already written a read/black tree
implementation in pure XQuery 1.1 (using 1st class function closures)
which I was planning to release via EXPath when appropriate.

That's good! Can I look at implementation?

Technically, neither RB nor AVL trees need to know of key/value separation
(they work with items). Key appears during search operation only.

I've prefered to implement AVL tree as it allows to build a map with access
by index, and by key at the same time.
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com/

John Snelson

15 years ago

Permalink

Post by Vladimir Nesterovsky

That's good! Can I look at implementation?

Sure - attached.

Post by Vladimir Nesterovsky
Technically, neither RB nor AVL trees need to know of key/value separation
(they work with items). Key appears during search operation only.
I've prefered to implement AVL tree as it allows to build a map with access
by index, and by key at the same time.

I've never needed to access a sorted map by index. I don't see what
you'd need that for except iterating the members of the map - and that's
definitely better done in a different way.

John

--
John Snelson, Oracle Corporation http://twitter.com/jpcs
Berkeley DB XML: http://oracle.com/database/berkeley-db/xml
XQilla: http://xqilla.sourceforge.net

Vladimir Nesterovsky

15 years ago

Permalink

Post by John Snelson

Post by Vladimir Nesterovsky

That's good! Can I look at implementation?

Sure - attached.

Thanks.

Post by John Snelson

Post by Vladimir Nesterovsky
Technically, neither RB nor AVL trees need to know of key/value

separation

Post by John Snelson

Post by Vladimir Nesterovsky
(they work with items). Key appears during search operation only.
I've prefered to implement AVL tree as it allows to build a map with

access

Post by John Snelson

Post by Vladimir Nesterovsky
by index, and by key at the same time.

I've never needed to access a sorted map by index. I don't see what
you'd need that for except iterating the members of the map - and that's
definitely better done in a different way.

Don't say never.
There are tasks where you look at your data from two perspectives:
as a map;
as a random access (and update) list.

Another thing is that AVL tree is slightly better balanced.
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com/