Multi-indexing

mloc

MulDataFrame uses .mloc for multi-indexing. Its indexing pattern differs from that of pandas: pandas implements hierarchical indexing, while MulDataFrame implements what we call successive indexing. The difference is discussed later on this page. The input to .mloc can be a list or a dict. A list has a syntax similar to that of pandas, except that you don’t need to create a pandas.IndexSlice object. Just pass a plain list, with ... as a placeholder to select all values in a column.

>>> mf
(3, 2)    g  7  6
          f  5  3
             c  d
--------  ---------
   x  y      c  d
a  1  2   a  1  2
b  3  6   b  8  9
b  5  6   b  8  7
>>> mf.mloc[[..., 6],[3]]
(2,)     g  6
         f  3
            d
-------  ------
   x  y     d
b  3  6  b  9
b  5  6  b  7

The above example uses the “y” column of the index dataframe to select the 2nd and 3rd rows, and the “f” column of the columns dataframe to select the 2nd column.
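
One way to read a list indexer is as a mapping from column position to selector, with every ... entry skipped. The helper below is a hypothetical illustration of that reading (the name list_to_positions is not part of MulDataFrame), not the library's own code:

```python
def list_to_positions(indexer):
    """Map a list indexer to {column position: selector}, skipping `...`.

    Hypothetical helper for illustration only.
    """
    return {pos: sel for pos, sel in enumerate(indexer) if sel is not Ellipsis}


row_indexer = list_to_positions([..., 6])   # only the 2nd column ("y") is constrained
col_indexer = list_to_positions([3])        # only the 1st column ("f") is constrained
print(row_indexer)  # {1: 6}
print(col_indexer)  # {0: 3}
```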

Because the column selection is a scalar selection, the return value is a mulseries rather than a muldataframe. Hierarchical indexing in pandas behaves differently: a scalar selection on a single level of a data frame still returns a data frame, and only when the selections on all levels are scalar does it return a series. In successive indexing, a scalar selection on any level results in a mulseries. We believe this is the more intuitive choice, as it matches what is expected from a single index. Of note, a scalar label does not always lead to a scalar selection, as the label might be duplicated.
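
The single-index behavior referred to here can be checked in plain pandas. The sketch below uses a small frame with a duplicated row label (the data mirror the values shown above, and the variable names are only illustrative):

```python
import pandas as pd

# Plain pandas frame whose row label "b" is duplicated.
df = pd.DataFrame({"c": [1, 8, 8], "d": [2, 9, 7]}, index=["a", "b", "b"])

unique_scalar = df.loc["a"]       # unique scalar label: a Series
list_selection = df.loc[["a"]]    # list of labels: still a DataFrame
duplicated_scalar = df.loc["b"]   # duplicated scalar label: a DataFrame with two rows

print(type(unique_scalar).__name__)
print(type(list_selection).__name__)
print(duplicated_scalar.shape)
```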

The reason successive indexing is named as such is illustrated by the following example:

>>> mf.mloc[[[1,3], 6]]
(2,)     y  6
         x  3
            b
-------  ------
   f  g     b
c  5  7  c  8
d  3  6  d  9
>>> mf2 = mf.mloc[[[1,3]]]
>>> mf2.mloc[[..., 6]]
(2,)     y  6
         x  3
            b
-------  ------
   f  g     b
c  5  7  c  8
d  3  6  d  9

So a multi-indexing operation on a muldataframe is the same as successively applying the indexer for each column of the index dataframe to the muldataframe. It adheres to the idea that a multi-index is just a collection of single indexes. Hierarchical indexing in pandas, on the other hand, is more like selecting the intersection of the different levels’ indexers. The difference is illustrated by the following example:

>>> mf.mloc[[[3], [2,6]]]
Error
>>> df = mf.df
>>> df.index = pd.MultiIndex.from_frame(mf.index)
>>> df
     c  d
x y
1 2  1  2
3 6  8  9
5 6  8  7
>>> ix = pd.IndexSlice
>>> df.loc[ix[[3],[2,6]],:]
     c  d
x y
3 6  8  9

Of note, mf.mloc[[3,[2,6]]] will report a KeyError, as successive indexing does not allow multi-index selection after a scalar selection. On the other hand, hierarchical indexing in pandas allows this, with the caveat that it does not support true scalar selection on a single level: df.loc[ix[3,[2,6]],:] yields a dataframe instead of a series. The example also shows that you must supply : as the column indexer in hierarchical indexing even if you only need to index rows.
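
The pandas side of this comparison can be reproduced on its own. The sketch below rebuilds the frame from the example above directly with pandas (no MulDataFrame involved) and contrasts a scalar on one level with scalars on all levels:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (5, 6)], names=["x", "y"])
df = pd.DataFrame({"c": [1, 8, 8], "d": [2, 9, 7]}, index=index)
ix = pd.IndexSlice

# Scalar on level "x", list on level "y": the result is still a DataFrame.
partial = df.loc[ix[3, [2, 6]], :]

# Scalars on every level: only now does pandas return a Series.
full = df.loc[(3, 6), :]

print(type(partial).__name__, partial.shape)
print(type(full).__name__, full.tolist())
```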

With a dict as input, you can change the order of successive indexing and fix the error in the above code block:

>>> mf.mloc[{'y':[2,6],'x':[3]}]
(1, 2)    g  7  6
          f  5  3
             c  d
--------  ---------
   x  y      c  d
b  3  6   b  8  9

In this example, the muldataframe is first indexed by the “y” column and then the “x” column of the index data frame.
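
To see why the order matters, successive indexing can be imitated in plain pandas by filtering the rows one column at a time. This is a minimal sketch under the assumption that every selector is a list of labels; successive_select is a hypothetical helper, not MulDataFrame's implementation:

```python
import pandas as pd

# Index dataframe and values, copied from the example above.
index_df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 6, 6]}, index=["a", "b", "b"])
values = pd.DataFrame({"c": [1, 8, 8], "d": [2, 9, 7]}, index=["a", "b", "b"])


def successive_select(index_df, values, indexers):
    """Apply each column's label list in turn (hypothetical helper)."""
    for column, labels in indexers.items():
        remaining = index_df[column]
        missing = [label for label in labels if label not in remaining.values]
        if missing:
            # Like a label lookup: a label absent from the remaining rows is an error.
            raise KeyError(f"{missing} not found in column {column!r}")
        keep = remaining.isin(labels).to_numpy()
        index_df, values = index_df[keep], values[keep]
    return index_df, values


# "y" first, then "x": every label is still present at each step.
_, picked = successive_select(index_df, values, {"y": [2, 6], "x": [3]})
print(picked)

# "x" first, then "y": after x == 3 only y == 6 remains, so the label 2 is missing.
try:
    successive_select(index_df, values, {"x": [3], "y": [2, 6]})
    order_matters = False
except KeyError:
    order_matters = True
```

Since Python dicts preserve insertion order, the order of the keys is enough to express the order of the filtering steps.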

When the index or columns dataframe has duplicate column names, a dict key refers to the last column with that name.

>>> mf2
(3, 2)      g  7  6
            f  5  3
               c  d
----------  ---------
   x  y  y     c  d
a  1  2  8  a  1  2
b  3  6  5  b  8  9
b  5  6  2  b  8  7
>>> mf2.mloc[{'y':[2]}]
(1, 2)      g  7  6
            f  5  3
               c  d
----------  ---------
   x  y  y     c  d
b  5  6  2  b  8  7
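
The same "last column wins" rule can be imitated in plain pandas by resolving a duplicated name to the position of its last occurrence. This is only a sketch of the rule described above, not the library's code, and the variable names are illustrative:

```python
import pandas as pd

# Index dataframe with two columns named "y", as in mf2 above.
index_df = pd.DataFrame(
    [[1, 2, 8], [3, 6, 5], [5, 6, 2]],
    index=["a", "b", "b"],
    columns=["x", "y", "y"],
)

# index_df["y"] would return both "y" columns, so resolve the name to the
# position of its last occurrence first.
last_y_pos = max(i for i, name in enumerate(index_df.columns) if name == "y")
last_y = index_df.iloc[:, last_y_pos]

# Keep the rows whose last "y" value equals 2.
selected = index_df[(last_y == 2).to_numpy()]
print(selected)
```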

You can mix the two types of indexers in a muldataframe, for example a dict indexer for the rows and a list indexer for the columns.

>>> mf.mloc[{'y':[2,6],'x':[3]},[..., 7]]
(1,)      g  7
          f  5
             c
--------  ---------
   x  y      c
b  3  6   b  8

.mloc is also implemented for MulSeries:

>>> ms = mf['c']
>>> ms.mloc[[..., 6]]
(2,)     g  7
         f  5
            c
-------  ------
   x  y     c
b  3  6  b  8
b  5  6  b  8

You can also use .mloc to set values:

>>> mf3 = mf.copy()
>>> mf3.mloc[{'x':3},{'f':5}] = 0
>>> mf3
(3, 2)    g  7  6
          f  5  3
             c  d
--------  ---------
   x  y      c  d
a  1  2   a  1  2
b  3  6   b  0  9
b  5  6   b  8  7
>>> mf3.mloc[[..., 2]] = [3,5]
>>> mf3
(3, 2)    g  7  6
          f  5  3
             c  d
--------  ---------
   x  y      c  d
a  1  2   a  3  5
b  3  6   b  0  9
b  5  6   b  8  7

nloc

MulDataFrame and MulSeries also implement .nloc to enable position-based multi-indexing. With a list as input, it behaves exactly the same as .mloc. With a dict as input, it behaves similarly to .mloc, except that the keys are the numeric positions of the columns rather than their names.

>>> mf2.nloc[{1:6}]
(2, 2)      g  7  6
            f  5  3
               c  d
----------  ---------
   x  y  y     c  d
b  3  6  5  b  8  9
b  5  6  2  b  8  7

Note that with a dict as input to .mloc, you can only select the last “y” column in the index dataframe. Using .nloc you can select the first “y” column.
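
In plain pandas terms, a position key plays the role of a positional column lookup, which stays unambiguous even when names repeat. The sketch below imitates the selection on the first “y” column with iloc; it illustrates the idea and is not the library's implementation:

```python
import pandas as pd

# Index dataframe with two columns named "y", as in mf2 above.
index_df = pd.DataFrame(
    [[1, 2, 8], [3, 6, 5], [5, 6, 2]],
    index=["a", "b", "b"],
    columns=["x", "y", "y"],
)

# Position 1 is the first "y" column, regardless of the duplicated name.
first_y = index_df.iloc[:, 1]

# Keep the rows whose first "y" value equals 6.
selected = index_df[(first_y == 6).to_numpy()]
print(selected)
```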

.nloc can also be used to set values.