Discussion:
Subsets - projects
Ralf Hemmecke
2007-03-14 20:54:24 UTC
Permalink
Hi Martin,

with the right implementation of cis in Subset, the test now runs in
reasonable time.

You see, the problem is the multiplication of species.

For 274668 the computation goes up to degree about 36 of Subset.
The Subset CIS is about as expensive as the CIS for SetSpecies.

Ralf

time TestSuited
summary of the test suite:
============================

test cases : 11
tests : 166
tests succeeded : 162
tests failed : 4

real 0m44.297s
user 0m34.176s
sys 0m0.109s




testFunctorialCompose(): () == {
macro {
CS == CombinatorialSpecies;
E == SetSpecies;
E2 == RestrictedSpecies(E, 2);
WP == Subset; -- Times(E, E);
P2 == Times(E2, E);
}
SimpleGraph(L: LabelType): CS L == FunctorialCompose(WP, P2)(L)
add;
import from Integer, Array Integer;
check(
SimpleGraph,
[1, 1, 2, 8, 64, 1024, 32768, 2097152, 268435456, 68719476736],
[1,1,2,4,11,34,156, 1044, 12346, 274668],
[...]



<<implementation: Subset>>=
local cis(): CycleIndexSeries == BugWorkaround(
CycleIndexSeries has with {exponentiate: % -> %}
){
g: Generator P := generate {
import from I, Z, Q, V, T, P;
yield 0$P; -- constant term
for n:I in 1.. repeat yield [2/(n::Z) , power(n::V, 1)]$P;
}
import from DataStream P;
exponentiate(g :: CycleIndexSeries);
}
cycleIndexSeries: CycleIndexSeries == cis();
@

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Martin Rubey
2007-03-14 21:18:50 UTC
Permalink
Dear Ralf,
Post by Ralf Hemmecke
with the right implementation of cis in Subset, the test now runs in
reasonable time.
Yes, that's not so surprising. However, as I said before, I do not think that
this is the right cure.

In fact, it also implies that I should pursue my idea of "bundling" power
series with "expressions". The reason that your implementation of the CIS of
Subset is fast is, that you do the multiplication "symbolic", albeit not using
the capabilities of a CAS...
Post by Ralf Hemmecke
You see, the problem is the multiplication of species.
I think that this is only one part of the problem. It would be really nice
(and probably something one could "sell") to have a power series package where
the timings of the two variants would be nearly the same...



Martin


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Ralf Hemmecke
2007-03-15 05:46:18 UTC
Permalink
Post by Martin Rubey
Post by Ralf Hemmecke
with the right implementation of cis in Subset, the test now runs in
reasonable time.
Yes, that's not so surprising. However, as I said before, I do not think that
this is the right cure.
Oh, but Subset is somehow as reasonable as Partition (which is
E\circ(E_+)) because Subset (which is E*E) is actually needed in the
definition of Times.
Post by Martin Rubey
In fact, it also implies that I should pursue my idea of "bundling" power
series with "expressions".
Wasn't that our plan from the beginning? Of course, we should analyse
the grammar that builds the species in order to determine a better
formula for the series than the generic one. Another big and interesting
project, because that involves writing symbolic solvers, ie, turning the
grammar into good recursion formulas for the series.

But I somehow don't like your SpeciesExpression implementation. The
reason is simple. If a couple of species is defined by a set of
definitions like in

A(L: LabelType): CombinatorialSpecies L == (E + X*B*B*B)(L) add;
B(L: LabelType): CombinatorialSpecies L == (E + X*A*A)(L) add;

then the data structures of A and B already contain the knowledge about
the equation. One "simply" has to extract it. What I want to say is that
the domain "SpeciesExpression" is completely superfluous.
Post by Martin Rubey
The reason that your implementation of the CIS of
Subset is fast is, that you do the multiplication "symbolic", albeit not using
the capabilities of a CAS...
Hmm, unfortunately, we cannot do better at the moment, since SetSpecies
doesn't actually "know" that its series is exp(something). That's a pity
and should and must be changed in the future. I'm sure you remember that
there was a reason to make the series stuff external. From the beginning
we wanted a series to be the stream of coefficients *and* a "closed"
formula. We just haven't implemented it. But using the power of Aldor we
would just have to write a new coupled series domain that knows about
its closed form (if that is know or can be computed).
Post by Martin Rubey
Post by Ralf Hemmecke
You see, the problem is the multiplication of species.
I think that this is only one part of the problem. It would be really nice
(and probably something one could "sell") to have a power series package where
the timings of the two variants would be nearly the same...
As you said above that involves some "symbolic" stuff. So let the fun
begin. What we have done up to now was more or less setting the ground
for the hard stuff. So there is a good reason to make a first release
before we go on.

Ralf

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Martin Rubey
2007-03-15 10:41:46 UTC
Permalink
Post by Martin Rubey
Post by Ralf Hemmecke
with the right implementation of cis in Subset, the test now runs in
reasonable time.
Yes, that's not so surprising. However, as I said before, I do not think that
this is the right cure.
Oh, but Subset is somehow as reasonable as Partition (which is E\circ(E_+))
because Subset (which is E*E) is actually needed in the definition of Times.
Quite true. Thus, also the CIS of partitions should be implemented using this
formula.

I am exaggerating, I know. It does make sense to implement certain special
cases more efficiently. But, I think the *main* goal in this species project
(remember: species can be seen as a *combinatorial model* for formal power
series, and that was the initial motivation for Joyal et al.) should be to make
"decompositions" efficient.
But I somehow don't like your SpeciesExpression implementation.
Me neither, since it doesn't work. ALthough, that should be fixable.
The reason is simple. If a couple of species is defined by a set of
definitions like in
A(L: LabelType): CombinatorialSpecies L == (E + X*B*B*B)(L) add;
B(L: LabelType): CombinatorialSpecies L == (E + X*A*A)(L) add;
then the data structures of A and B already contain the knowledge about the
equation. One "simply" has to extract it. What I want to say is that the domain
"SpeciesExpression" is completely superfluous.
I don't see how this could be done.
From the beginning we wanted a series to be the stream of coefficients *and*
a "closed" formula. We just haven't implemented it.
Yes, at least I wanted that. Currently, there is no CAS that would be able to
do such a thing.

By the way, this idea of "bundling" power series with "expressions" is quite
connected to the idea of caching individual coefficients, as "needed" for
functorial composition: as you know, there are generating series that have a
"nice" closed form expression, although their coefficients are not
"nice". Conversely, there are also generating series that do not have a "nice"
closed form, but this time the coefficients are "nice". A very good example are
the Bell numbers, I think.
What we have done up to now was more or less setting the ground for the hard
stuff. So there is a good reason to make a first release before we go on.
If you want to announce a "release", please go ahead. I will certainly do some
advertising next Tuesday, although I will probably use the iso-experiment
branch, since people are interested in isomorphism types mainly.

(Do not worry, I will add that we meanwhile have a better design...)

Martin


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Ralf Hemmecke
2007-03-15 11:34:47 UTC
Permalink
Post by Martin Rubey
Post by Martin Rubey
Post by Ralf Hemmecke
with the right implementation of cis in Subset, the test now runs in
reasonable time.
Yes, that's not so surprising. However, as I said before, I do not think that
this is the right cure.
Oh, but Subset is somehow as reasonable as Partition (which is E\circ(E_+))
because Subset (which is E*E) is actually needed in the definition of Times.
Quite true. Thus, also the CIS of partitions should be implemented using this
formula.
What? If you look at cis$Partition then you see that I *don't* implement
it as compose(cisSet, cisNonemptySet), but rather use the direct formula
from BLL.
Post by Martin Rubey
I am exaggerating, I know. It does make sense to implement certain special
cases more efficiently. But, I think the *main* goal in this species project
(remember: species can be seen as a *combinatorial model* for formal power
series, and that was the initial motivation for Joyal et al.) should be to make
"decompositions" efficient.
We will certainly do in the future, but I think a more interesting
project is the "symbolic" approach even if that doesn't help much for
the general CIS since there are not so many closed forms.
Post by Martin Rubey
But I somehow don't like your SpeciesExpression implementation.
Me neither, since it doesn't work. ALthough, that should be fixable.
The reason is simple. If a couple of species is defined by a set of
definitions like in
A(L: LabelType): CombinatorialSpecies L == (E + X*B*B*B)(L) add;
B(L: LabelType): CombinatorialSpecies L == (E + X*A*A)(L) add;
then the data structures of A and B already contain the knowledge about the
equation. One "simply" has to extract it. What I want to say is that the domain
"SpeciesExpression" is completely superfluous.
I don't see how this could be done.
How do you think I compute an approximate order for a recursively
defined power series? Right, the data structure contains information
about the way the series is defined. For species it is approximately the
same.

Building a species like A above, it knows that it is a Plus of E and
C=X*B*B*B. E should know that it is a leaf, same for X.
C knows that it is a product etc. etc. One just has to walk down the
structure until one arrives at a leaf or at A again (then it's recursive).

So to exaggerate a bit: a species *is* an expression. But perhaps we
must be able to convert it to ExpressionTree. Maybe we should first
think about what we want to do with that information.

1) It could be used to implement special series/symbolic expressions for
series if we know the symbolic expressions of the input series.

2) It could and should be used for printing.

3) It should be used for something like the attributed grammar (see
Maple combstruct).
Post by Martin Rubey
From the beginning we wanted a series to be the stream of coefficients *and*
a "closed" formula. We just haven't implemented it.
Yes, at least I wanted that. Currently, there is no CAS that would be able to
do such a thing.
It is amazing... one starts a little project and after a while there are
too many branches one could work one.
Post by Martin Rubey
By the way, this idea of "bundling" power series with "expressions" is quite
connected to the idea of caching individual coefficients, as "needed" for
functorial composition: as you know, there are generating series that have a
"nice" closed form expression, although their coefficients are not
"nice". Conversely, there are also generating series that do not have a "nice"
closed form, but this time the coefficients are "nice". A very good example are
the Bell numbers, I think.
More thinking needed (at least on my side) ...
Post by Martin Rubey
What we have done up to now was more or less setting the ground for the hard
stuff. So there is a good reason to make a first release before we go on.
If you want to announce a "release", please go ahead.
OK.

Ralf

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Continue reading on narkive:
Loading...