Convention for output of tuples #708

tillahoffmann · 2023-09-26T18:56:55Z

In the develop branch, tuples generated by Stan programs are returned as one-dimensional arrays of tuples. This can make it difficult to access individual tuple elements across draws. For example, consider the following test.

import os

import numpy as np
from cmdstanpy import CmdStanModel

# DATAFILES_PATH is assumed to be defined by the surrounding test module.
def test_tuple_out() -> None:
    stan = os.path.join(DATAFILES_PATH, 'tuple_out.stan')
    model = CmdStanModel(stan_file=stan)
    a = np.random.normal(0, 1, (5, 5))
    b = np.random.normal(0, 1, 3)
    fit = model.sample({"a": a, "b": b}, fixed_param=True, chains=1,
                       iter_sampling=20, iter_warmup=1, sig_figs=18)
    # First index is the draw, second index is the tuple element.
    np.testing.assert_allclose(a, fit.stan_variable("c")[0][0])
    np.testing.assert_allclose(b, fit.stan_variable("c")[0][1])

// tuple_out.stan
data {
    matrix [5, 5] a;
    vector [3] b;
}

generated quantities {
    tuple(matrix[5, 5], vector[3]) c;
    c.1 = a;
    c.2 = b;
}

Accessing the draws of c.1 is then only possible through a list comprehension (though there may be some fancy indexing I'm not familiar with).

>>> fit.c.shape
(20,)
>>> fit.c[:, 0]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
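
For reference, the list-comprehension workaround could look like the sketch below. It does not run Stan: the array `c` is a stand-in built by hand, assuming that the output is a one-dimensional object array holding one (matrix, vector) tuple per draw, as the shape (20,) above suggests.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 20

# Stand-in for fit.c (an assumption, not real cmdstanpy output):
# a 1-D object array with one (5x5 matrix, length-3 vector) tuple per draw.
c = np.empty(n_draws, dtype=object)
for i in range(n_draws):
    c[i] = (rng.normal(size=(5, 5)), rng.normal(size=3))

# c[:, 0] fails because c is 1-D, so collect element 0 draw by draw.
c1 = np.stack([draw[0] for draw in c])
print(c1.shape)  # (20, 5, 5)
```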

Would it make sense to return a tuple of arrays rather than an array of tuples? This would allow accessing samples more easily, e.g., in the above example we'd get

>>> fit.c[0].shape
(20, 5, 5)

This would, however, go against the convention that the first index refers to samples.
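
As a rough illustration of the two layouts, the conversion from an array of tuples to a tuple of arrays can be sketched on the user side. `tuple_of_arrays` is a hypothetical helper, not part of cmdstanpy, and `draws` is a hand-built stand-in for the current output.

```python
import numpy as np


def tuple_of_arrays(draws):
    """Hypothetical helper: turn a sequence of per-draw tuples into a
    tuple of arrays, each with the draw index as its first axis."""
    return tuple(np.stack(parts) for parts in zip(*draws))


rng = np.random.default_rng(0)

# Stand-in for the current output: one (matrix, vector) tuple per draw.
draws = np.empty(20, dtype=object)
for i in range(20):
    draws[i] = (rng.normal(size=(5, 5)), rng.normal(size=3))

c = tuple_of_arrays(draws)
print(c[0].shape)  # (20, 5, 5)
print(c[1].shape)  # (20, 3)
```

Within each element the first axis still indexes draws; only the outermost index changes meaning, from draw to tuple position.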


WardBrian · 2023-09-26T19:00:37Z

> Would it make sense to return a tuple of arrays rather than an array of tuples?

I think the issue is that both of these things (tuples of arrays and arrays of tuples) can themselves appear in a Stan program. Keeping the convention that the first index is samples and that everything after it is as written in Stan makes these cases clearer, but this is indeed at the expense of the ease of slicing "down" a single tuple.

I'd personally recommend that if a user knows they care about looking at all of the c.1 values independently of the rest of c, the easiest thing to do is to put a line in generated quantities that extracts them into their own named variable.

ahartikainen · 2023-09-26T19:04:47Z

I would say that numpy structured dtype would be our best option?

WardBrian · 2023-09-26T19:10:34Z

Structured dtypes don't really help with this as far as I can tell:

>>> import numpy as np
>>> x = np.array([(1.2, 3.4), (4.5, 6.5)], dtype='f,f')
>>> x[:,1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I've also found that they are not as easy to use in other ways (isinstance(x[1], tuple) is False, for one, but they can also be annoying in general).
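
The isinstance point can be checked with a short sketch using the same array as above: an element of a structured array is a numpy.void record rather than a Python tuple, and `.item()` converts it to a plain tuple when one is needed.

```python
import numpy as np

x = np.array([(1.2, 3.4), (4.5, 6.5)], dtype='f,f')

row = x[1]
print(isinstance(row, tuple))    # False
print(isinstance(row, np.void))  # True

# .item() returns a plain Python tuple of Python floats.
as_tuple = row.item()
print(isinstance(as_tuple, tuple))  # True
```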

stanio can now optionally give you a structured dtype'd array when you're dealing with tuples, so we could pipe that option through to the user of cmdstanpy, but I think it addresses a different set of issues

WardBrian · 2023-09-26T19:13:27Z

Ah, there is an option using structured dtypes:

>>> import numpy as np
>>> x = np.array([(1.2, 3.4), (4.5, 6.5)], dtype='f,f')
>>> x.dtype
dtype([('f0', '<f4'), ('f1', '<f4')]) # note the names f0 and f1
>>> x['f0']
array([1.2, 4.5], dtype=float32)

This seems like it would get arbitrarily bad as you nest tuples, but I guess it works in theory.
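
To make the nesting concern concrete, here is a sketch with sub-array fields and one nested tuple. The dtype layouts and the default-style field names f0 and f1 are assumptions chosen for illustration; this is not necessarily what stanio produces.

```python
import numpy as np

n_draws = 20

# Assumed layout for tuple(matrix[5, 5], vector[3]): one sub-array field each.
flat = np.dtype([("f0", np.float64, (5, 5)), ("f1", np.float64, (3,))])
x = np.zeros(n_draws, dtype=flat)
print(x["f0"].shape)  # (20, 5, 5)
print(x["f1"].shape)  # (20, 3)

# Assumed layout for a nested tuple(matrix[5, 5], tuple(real, vector[3])).
inner = np.dtype([("f0", np.float64), ("f1", np.float64, (3,))])
outer = np.dtype([("f0", np.float64, (5, 5)), ("f1", inner)])
y = np.zeros(n_draws, dtype=outer)

# Each level of nesting needs one more field lookup.
print(y["f1"]["f0"].shape)  # (20,)
print(y["f1"]["f1"].shape)  # (20, 3)
```

Field access keeps the draw index as the first axis, so x["f0"] has the (20, 5, 5) shape asked for at the top of the thread, at the cost of string field names instead of positional indices.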
