.. _programmingGuide:

======================
Programming guide
======================

Because *Idefix* is designed to run on hybrid architectures, there are some subtleties which are not found in other classical CPU-only codes. These subtleties are described in this section, along with some basic coding rules which can come in handy.

Data types
===========

Because *Idefix* can run on GPUs, and since GPUs experience a significant speedup when working with single precision arithmetic, a specific ``real`` datatype is used for all floating point operations in *Idefix*. This is by default aliased to ``double`` in `idefix.hpp`, but it can easily be modified to ``float`` for specific problems. Note however that this loss of precision might have a strong impact on the quality of the solution and is therefore not recommended.

Host and device
===============

*Idefix* relies on the `Kokkos `_ framework, and therefore assumes that the system it is running on is made of two sub-systems: a host and a device. The host is traditionally the CPU; it takes care of inputs and outputs, initialisation and allocation, and MPI data exchanges. The device is usually an accelerator (e.g. a GPU) which actually performs the computation (or most of it).

Note that while *Idefix* assumes there is a host and a device, they can be the same processing unit (think of the code running only on your laptop CPU). In this case, Kokkos performs several optimisations, so that everything effectively runs on the host smoothly.

By construction, the host doesn't have direct access to the device memory and vice-versa. This means that accessing an array on the device from the host will inevitably lead to a segmentation fault. This is a very common mistake, so keep this in mind when coding in *Idefix*.

Arrays
======

The fact that most of the computations are performed on the device implies that specific allocations on the device need to be performed. To make things easier on the programmer's side, array allocations should always use one of the ``IdefixArraynD``, where *n* is between 1 and 4. These arrays are aliases for ``Kokkos::View`` which are fine-tuned for *Idefix*, and are templated by the datatype, which for most applications should be ``real``. By definition, an ``IdefixArray`` is always allocated on the device, and is therefore not accessible from the host. To define an array on the host, one uses instead ``IdefixHostArraynD``. Note that an ``IdefixHostArraynD`` can be defined as a mirror of an ``IdefixArraynD``, to simplify data transfer.

It should be noted that these arrays are not just a contiguous memory zone as one would expect from a C array. Instead, they are C++ objects with several properties. In particular, pointer arithmetic, which is a common (bad) practice on C arrays, is not allowed on ``IdefixArray``. Allocation is performed as an instantiation of the ``IdefixArraynD`` class with the array name and size as parameters. For instance, the following code block allocates ``myOldArray`` and then performs a shallow copy of it into ``myNewArray``.

.. code-block:: c++

  int arraySizeX1 = 10;   // 1st dimension of the arrays
  int arraySizeX2 = 10;   // 2nd dimension of the arrays

  // allocation
  IdefixArray2D<real> myOldArray("ArrayName", arraySizeX1, arraySizeX2);

  // Shallow copy of myOldArray to myNewArray
  IdefixArray2D<real> myNewArray = myOldArray;

No data is copied: ``myNewArray`` is merely a new reference to the same memory block on the device. Similarly, accessing an element of an ``IdefixArraynD`` should always be done with the accessor ``(...)``. In the example above, one would use for instance ``myNewArray(1,2)`` (note the round brackets).

It is possible to copy data to/from the host/device manually using ``Kokkos::deep_copy`` (see examples in ``DataBlockHost::syncToDevice()``). However, *Idefix* provides higher level functions which should be sufficient for most uses through the classes ``DataBlockHost`` and ``GridHost``.

.. tip::
  ``IdefixArray`` internally contains a reference counter, so that when an array is not referenced anymore, the memory is automatically freed. Hence there is no equivalent of C ``free`` for ``IdefixArray``.
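As an illustration of such manual transfers, the following sketch (the array names and sizes are purely illustrative) creates a host mirror of a device array with the Kokkos facility ``Kokkos::create_mirror_view`` and moves data back and forth with ``Kokkos::deep_copy``. In practice, most setups should rely on ``DataBlockHost``/``GridHost`` instead.

.. code-block:: c++

  int nx1 = 16, nx2 = 16, nx3 = 16;

  // Device array (allocated on the device)
  IdefixArray3D<real> myArray("MyArray", nx1, nx2, nx3);

  // Host mirror of the device array (allocated on the host)
  IdefixArray3D<real>::HostMirror myArrayHost = Kokkos::create_mirror_view(myArray);

  // Fill the host mirror (plain C++ loops are fine here, since this runs on the host)
  for(int k = 0; k < nx3; k++) {
    for(int j = 0; j < nx2; j++) {
      for(int i = 0; i < nx1; i++) {
        myArrayHost(k,j,i) = 0.0;
      }
    }
  }

  // Send the data to the device array...
  Kokkos::deep_copy(myArray, myArrayHost);

  // ...and fetch it back from the device after some computation
  Kokkos::deep_copy(myArrayHost, myArray);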
Execution space and loops
=========================

Just like arrays, code can be executed on the host or on the device. Unless otherwise mentioned, code is by default executed on the host. Since the device is supposed to perform the computation itself, and because that computation is usually coded using loops, *Idefix* provides a special way to handle loops which are to be executed on the device, with the function ``idefix_for``.

``idefix_for`` is just a way to write a for loop, with some caveats. Depending on the kind of device *Idefix* is running on, the loop can be unrolled in an arbitrary order, and some iterations might be executed simultaneously. So care should be taken when writing loops. A typical loop on three indices looks like

.. code-block:: c++

  // Allocate an Idefix Array
  IdefixArray3D<real> myArray("MyArray", nx1, nx2, nx3);

  idefix_for("LoopName",
             kbeg, kend,
             jbeg, jend,
             ibeg, iend,
             KOKKOS_LAMBDA (int k, int j, int i) {
               myArray(k,j,i) = 0.0;
  });

This loop will be executed on the device, and will loop over the indices i, j, k, with i ranging from ibeg to iend, etc. Note that, as expected, we access the data stored in an ``IdefixArray`` inside an ``idefix_for``, i.e. inside code which is executed on the device. The string ``"LoopName"`` should be descriptive of the loop (i.e. avoid "loop1", "loop2"...). It is used when profiling or debugging the code, and it names the execution kernels.

Note finally that the last argument of ``idefix_for`` relies on the ``KOKKOS_LAMBDA`` construct, which implies that *Idefix* is actually making a C++ lambda when a loop is called. While this should be transparent to most users, it should be kept in mind that these lambdas capture their variables by value ``[=]``. To avoid too much overhead, one should therefore avoid capturing complex structures. Moreover, a bug in the Nvidia Cuda compiler ``nvcc`` prevents Cuda lambdas from capturing class members (`see this post `_). While this bug is tightly linked to the C++14 standard and will be addressed in C++17, one should always make local copies of the class members before using them in loops, to keep compatibility with Cuda in C++14.

.. warning::
  As stated above, to avoid compatibility issues with nvcc, *always* make local copies (references) of the arrays and variables you intend to use before calling ``idefix_loop``. This ensures that these variables will be properly captured by device lambdas. It is the most common reason for GPU-specific segmentation faults. A sketch illustrating this rule is given below.

.. warning::
  Generally, methods that contain calls to ``idefix_loop()`` should always be declared as ``public``. This is due to a limitation of the ``nvcc`` compiler, which cannot perform lambda captures from private methods.
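To make the rule above concrete, here is a minimal sketch (the class, member and size names are purely illustrative, and ``state`` is assumed to have been allocated elsewhere) of a public method that makes local copies of its class members before using them in an ``idefix_for``:

.. code-block:: c++

  class MyModule {
   public:
    IdefixArray3D<real> state;   // assumed to be allocated before ResetState is called
    real coefficient;
    int nx1, nx2, nx3;

    // Must be public so that nvcc can perform the lambda capture
    void ResetState() {
      // Local (shallow) copies of the class members: the lambda below
      // captures these local variables by value instead of capturing "this"
      IdefixArray3D<real> localState = this->state;
      real localCoeff = this->coefficient;

      idefix_for("ResetState",
                 0, nx3,
                 0, nx2,
                 0, nx1,
                 KOKKOS_LAMBDA (int k, int j, int i) {
                   localState(k,j,i) = localCoeff;
      });
    }
  };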
Reductions
==========

``idefix_for`` allows one to perform loops on arrays stored on the device. However, reductions (that is, for instance, summing over all of the elements, or finding a maximum) are not possible with ``idefix_for``. When a reduction is needed, one should instead use ``idefix_reduce``. The syntax of ``idefix_reduce`` is very similar to that of ``idefix_for``, with the addition of the variable over which the reduction is performed, and the type of reduction which is expected. For instance, a sum over all of the elements would be done through:

.. code-block:: c++

  // Allocate an Idefix Array
  IdefixArray3D<real> myArray("MyArray", nx1, nx2, nx3);
  // Initialize the array somehow
  ...

  real mySum = 0.;    // Note that the result will be stored on the host!

  idefix_reduce("Sum",
                kbeg, kend,
                jbeg, jend,
                ibeg, iend,
                KOKKOS_LAMBDA (int k, int j, int i, real &localSum) {
                  localSum += myArray(k,j,i);
                },
                Kokkos::Sum<real>(mySum));

In the above example, ``localSum`` is the temporary variable *on the device* over which portions of the reduction are performed, while ``mySum`` is the final variable, *on the host*, where the result is stored.

It is possible to do other reduction operations, like finding the minimum or maximum (see `Kokkos custom reductions `_ for a list). For instance, the minimum value is obtained with the following code snippet:

.. code-block:: c++

  // Allocate an Idefix Array
  IdefixArray3D<real> myArray("MyArray", nx1, nx2, nx3);
  // Initialize the array somehow
  ...

  real myMin = 0.;    // Note that the result will be stored on the host!

  idefix_reduce("Minimum",
                kbeg, kend,
                jbeg, jend,
                ibeg, iend,
                KOKKOS_LAMBDA (int k, int j, int i, real &localMin) {
                  localMin = std::fmin(localMin, myArray(k,j,i));
                },
                Kokkos::Min<real>(myMin));

Note that when running on GPU architectures, reductions are particularly inefficient operations. It is therefore recommended to avoid them as much as possible, or to group them.

.. _grid:

Grid
========

Idefix uses a grid which is automatically built during startup using the information provided in the input file (see :ref:`gridSection`). While programming your own physical problem, it can be useful to have access to information about the grid, such as the grid coordinates, cell volume, etc. Because Idefix uses MPI domain decomposition, the grid is accessible through two classes. The *full* grid (i.e. the full physical problem, omitting any domain decomposition) is stored in the :ref:`gridClass`, while the local MPI subdomain that belongs to each processor is stored in the :ref:`datablockClass` class. The indices which are defined and used throughout the code in each of these classes are presented in the scheme below (the numbers in parentheses are the values of each variable in this particular example).

.. image:: images/grid.png
  :alt: Idefix grid schematics

.. _classes:

Useful classes
==============

.. _inputClass:

The ``Input`` class
-------------------

``Input`` is a class which holds all of the information regarding command line and input file data. It provides accessors such as

.. code-block:: c++

  // Accessor to input parameters
  // the arguments are always: BlockName, EntryName, ParameterNumber (starting from 0)

  // Read a parameter of type T from the input file and throw an error if it cannot be found
  // T can be a std::string, a number (real, int, double, float, int64_t) or a boolean
  template <typename T> T Get(std::string blockName, std::string paramName, int num);

  // Read a parameter of type T from the input file. Set it to default if it cannot be found.
  // T can be a std::string or a number (real, int, double, float, int64_t) or a boolean
  template <typename T> T GetOrSet(std::string blockName, std::string paramName, int num, T defaultValue);

  // Check that a block/entry is present
  int CheckEntry(std::string, std::string);

Note that ``Input`` doesn't actually read the input file each time an accessor is called. Internally, ``Input`` reads everything when it is constructed and stores all the data coming from the command line and the input file in a C++ container. Hence there is no read overhead when one calls one of these accessors.

For instance, considering a ``.ini`` file:

.. code-block::

  [MyBlock]
  myentry   1.0   0.0

It is possible to fetch the entry ``myentry`` using the ``Input`` accessors. Assuming an instance of ``Input`` is allocated in ``myInput``:

.. code-block:: c++

  real firstParameter = myInput.Get<real>("MyBlock","myentry",0);             // firstParameter=1.0
  real secondParameter = myInput.Get<real>("MyBlock","myentry",1);            // secondParameter=0.0
  real thirdParameter = myInput.GetOrSet<real>("MyBlock","myentry",2, 0.0);   // thirdParameter defaults to 0.0

If a parameter is not found, ``Get`` will log an error and exit. One can use the ``CheckEntry`` method to check whether a parameter is set in the ini file before trying to access it, or use ``GetOrSet`` with a default value, as in the example above.

.. tip::
  Command line options are also parsed by the ``Input`` class. These options are stored in a specific block named ``CommandLine``.

.. _gridClass:

``Grid`` class
------------------

``Grid`` is essentially a data structure which represents the full computational domain (i.e. without domain decomposition, if MPI has been enabled). Here is the full API for this class (one may refer to :ref:`grid` for a graphical representation of the grid):

.. doxygenclass:: Grid
  :members:

.. _datablockClass:

``DataBlock`` class
-----------------------

``DataBlock`` contains all of the data structures that belong to that particular process (i.e. if MPI is enabled, it contains data specific to this subprocess, in contrast to ``Grid``). Here is the full API for the dataBlock class (one may refer to :ref:`grid` for a graphical representation of the grid):

.. doxygenclass:: DataBlock
  :members:

.. _hydroClass:

``Hydro`` class
---------------------

The ``Hydro`` class (and its sub-classes) contains all of the fields and methods specific to (magneto) hydrodynamics. While interested users may want to read the implementation of this class in detail, we provide below a list of the most important members:

.. code-block:: c++

  IdefixArray4D<real> Vc;      // Main cell-centered primitive variables index
  IdefixArray4D<real> Vs;      // Main face-centered variables
  IdefixArray4D<real> Uc;      // Main cell-centered conservative variables

  // Enroll user-defined boundary conditions
  void EnrollUserDefBoundary(UserDefBoundaryFunc);

  // Enroll user-defined ohmic, ambipolar and Hall diffusivities
  void EnrollOhmicDiffusivity(DiffusivityFunc);
  void EnrollAmbipolarDiffusivity(DiffusivityFunc);
  void EnrollHallDiffusivity(DiffusivityFunc);

  // Enroll user-defined isothermal sound speed
  void EnrollIsoSoundSpeed(IsoSoundSpeedFunc);

The first two IdefixArrays are the ones storing the primitive variable fields. These arrays are 4D, the first dimension being the field number. *Idefix* defines aliases for these numbers, so that one can call ``Vc(VX1,k,j,i)`` in place of ``Vc(1,k,j,i)`` to get the first velocity component. These aliases are defined in ``idefix.hpp``. Because the code uses constrained transport, the field defined on cell faces is stored in the ``Vs`` array. Just like for ``Vc``, there are aliases with an "s" suffix defined to simplify the addressing of the magnetic field components, as in ``Vs(BX2s,k,j,i)``.
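As an illustration, here is a minimal sketch of how these arrays and aliases can be accessed inside an ``idefix_for`` (the instance name ``hydro`` and the loop bounds are purely illustrative; as explained above, local copies of the arrays are made before the loop):

.. code-block:: c++

  // Local (shallow) copies of the Hydro arrays, captured by value by the lambda below
  IdefixArray4D<real> Vc = hydro.Vc;
  IdefixArray4D<real> Vs = hydro.Vs;

  idefix_for("InitFields",
             kbeg, kend,
             jbeg, jend,
             ibeg, iend,
             KOKKOS_LAMBDA (int k, int j, int i) {
               Vc(VX1,k,j,i) = 0.0;    // first velocity component (cell-centered)
               Vs(BX2s,k,j,i) = 0.0;   // second magnetic field component (face-centered)
  });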
.. _datablockhostClass:

``DataBlockHost`` class
---------------------------

This class is a *mirror* class, designed to be a (partial) copy of the ``DataBlock`` class, but in which all of the arrays are stored on the *host*. Obviously, ``DataBlockHost`` comes in handy when one has to deal with inputs/outputs, debugging and initialisation. A ``DataBlockHost`` should always be constructed with a ``DataBlock`` as argument. This ensures that the ``DataBlockHost`` knows where its parent ``DataBlock`` is located. When created, a ``DataBlockHost`` fills the following arrays (essentially grid information) with data from its parent ``DataBlock``:

.. code-block:: c++

  IdefixArray1D<real>::HostMirror x[3];    // geometrical central points
  IdefixArray1D<real>::HostMirror xr[3];   // cell right interface
  IdefixArray1D<real>::HostMirror xl[3];   // cell left interface
  IdefixArray1D<real>::HostMirror dx[3];   // cell width
  IdefixArray3D<real>::HostMirror dV;      // cell volume
  IdefixArray3D<real>::HostMirror A[3];    // cell right interface area

Note however that the physics arrays are not automatically synchronized when ``DataBlockHost`` is created, that is:

.. code-block:: c++

  IdefixArray4D<real>::HostMirror Vc;     // Main cell-centered primitive variables index
  IdefixArray4D<real>::HostMirror Vs;     // Main face-centered primitive variables index
  IdefixArray4D<real>::HostMirror J;      // Current (only when haveCurrent is enabled)
  IdefixArray4D<real>::HostMirror Uc;     // Main cell-centered conservative variables
  IdefixArray3D<real>::HostMirror InvDt;
  IdefixArray3D<real>::HostMirror Ex1;    // x1 electric field
  IdefixArray3D<real>::HostMirror Ex2;    // x2 electric field
  IdefixArray3D<real>::HostMirror Ex3;    // x3 electric field

need to be synchronized *manually*. These IdefixArrays are all defined as ``HostMirror``, implying that they are accessible from the host only. If modifications are performed on the arrays of the parent ``DataBlock``, one can call ``DataBlockHost::SyncFromDevice()`` to refresh the host arrays, and conversely one can call ``DataBlockHost::SyncToDevice()`` to send data from the ``DataBlockHost`` to its parent ``DataBlock``.

Finally, ``DataBlockHost`` provides a useful method ``DataBlockHost::MakeVsFromAmag(IdefixHostArray4D<real> &)`` which can be used to initialise the face-centered magnetic field stored in ``DataBlockHost::Vs`` from a user-defined magnetic potential. See :ref:`setupInitflow`.
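As an illustration, the sketch below (the names ``data``, ``nx1``, ``nx2`` and ``nx3`` are illustrative; ``data`` stands for the ``DataBlock`` of the current process and the ``nx`` variables for the local grid size) modifies ``Vc`` on the host and sends it back to the device:

.. code-block:: c++

  DataBlockHost dataHost(data);    // construct the host mirror of the dataBlock
  dataHost.SyncFromDevice();       // fetch the current content of the device arrays

  // Modify the cell-centered primitive variables on the host with plain C++ loops
  for(int k = 0; k < nx3; k++) {
    for(int j = 0; j < nx2; j++) {
      for(int i = 0; i < nx1; i++) {
        dataHost.Vc(VX1,k,j,i) = 0.0;
      }
    }
  }

  dataHost.SyncToDevice();         // send the modified arrays back to the parent DataBlock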
.. _dumpImageClass:

``DumpImage`` class
-------------------

This class loads a restart dump in host memory and makes it available to the user. It is particularly useful when one wants to initialise the flow from a previous simulation using a different resolution/dimension/physics, as in such cases *Idefix* is unable to automatically restart with the simple ``-restart`` command line option. The ``DumpImage`` class definition is

.. code-block:: c++

  class DumpImage {
   public:
    DumpImage(std::string, Output &);   // constructor with dump filename and output object as parameters

    int np_int[3];              // number of points in each direction
    int geometry;               // geometry of the dump
    real time;                  // time at which the dump was created
    IdefixArray1D<real> x[3];   // geometrical central points
    IdefixArray1D<real> xr[3];  // cell right interface
    IdefixArray1D<real> xl[3];  // cell left interface
    std::map<std::string,IdefixHostArray3D<real>> arrays;  // 3D arrays stored in the dump
  };

Typically, a ``DumpImage`` object is constructed by invoking the ``DumpImage(filename, output)`` constructor, which essentially opens, allocates and loads the dump file in memory (when running with MPI, each processor has access to the full domain covered by the dump, so try to avoid loading very large dumps!). The user can then access the dump content using the member variables of the object (e.g. ``DumpImage::arrays['variable'](k,j,i)``). Do not forget to delete the object once you have finished working with it. An example is provided in :ref:`setupInitDump`.

.. warning::
  ``DumpImage`` should *not* be used to restart *idefix* in the same run (use the ``-restart`` option in :ref:`commandLine` for this).

.. note::
  The naming convention used in ``DumpImage::arrays`` combines the original array and variable names. It is generically written ``XX-YYY`` where ``XX`` is the array name in the ``dataBlock`` (e.g. ``Vc`` or ``Vs``) and ``YYY`` is the variable name (e.g. ``VX2`` or ``BX3s``).

.. _LookupTableClass:

``LookupTable`` class
---------------------

The ``LookupTable`` class allows one to read and interpolate elements from a comma-separated value (CSV) file or a numpy file (generated from ``numpy.save`` in python).

CSV constructor
+++++++++++++++

``LookupTable`` can be initialised with any CSV file containing a 1D or 2D lookup table whose content has the following shape:

.. list-table:: example1D.csv
  :widths: 25 25 25
  :header-rows: 0

  * - x\ :sub:`1`
    - x\ :sub:`2`
    - x\ :sub:`3`
  * - data\ :sub:`1`
    - data\ :sub:`2`
    - data\ :sub:`3`

.. list-table:: example2D.csv
  :widths: 25 25 25 25
  :header-rows: 0

  * -
    - x\ :sub:`1`
    - x\ :sub:`2`
    - x\ :sub:`3`
  * - y\ :sub:`1`
    - data\ :sub:`1,1`
    - data\ :sub:`2,1`
    - data\ :sub:`3,1`
  * - y\ :sub:`2`
    - data\ :sub:`1,2`
    - data\ :sub:`2,2`
    - data\ :sub:`3,2`

Each element of the CSV file can be separated by an arbitrarily chosen delimiter (which can be ",", ";", etc.). Such a file can be loaded using the constructor

.. code-block:: c++

  template <int nDim> LookupTable<nDim>::LookupTable(std::string filename, char delimiter);  // Load a CSV file

Note that the number of dimensions the lookup table should expect is given as a template parameter ``nDim`` to the class ``LookupTable``. For the CSV constructor, ``nDim`` can only take the values 1 or 2.

.. note::
  The input CSV file is allowed to contain comments, starting with "#". Any character following "#" is ignored by the ``LookupTable`` class.

Numpy constructor
+++++++++++++++++

An instance of ``LookupTable`` can also be initialised from numpy arrays with an arbitrary number of dimensions ``nDim``. In this case, the constructor expects a vector of ``nDim`` .npy files for the 1D coordinates of the lookup table, and a single file containing the ``nDim``-dimensional data of the lookup table. The constructor is defined as

.. code-block:: c++

  template <int nDim> LookupTable<nDim>::LookupTable(std::vector<std::string> coordinates, std::string dataSet);

Note that the template parameter ``nDim`` should match the number of dimensions of the numpy array stored in the file ``dataSet``.
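For instance, a 3D table could be loaded as follows (the file names are purely illustrative; each coordinate file is expected to contain a 1D array, and ``data.npy`` a 3D array saved with ``numpy.save``):

.. code-block:: c++

  #include <string>
  #include <vector>
  #include "lookupTable.hpp"

  // Coordinates of the 3D lookup table, one .npy file per dimension
  std::vector<std::string> coordinates = {"x.npy", "y.npy", "z.npy"};

  // Construct the lookup table from the coordinate files and the 3D dataset
  LookupTable<3> table(coordinates, "data.npy");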
Using the lookup table
++++++++++++++++++++++

Once an instance of ``LookupTable`` has been created from a CSV or a numpy file, it can be used with the ``Get`` method inside an ``idefix_for`` loop (i.e. from the device), or with the ``GetHost`` method outside of an ``idefix_for`` (i.e. from the host). The ``Get`` and ``GetHost`` methods expect a C array of size ``nDim`` and return the multi-linear interpolation from the lookup table. For instance:

.. code-block:: c++

  #include "lookupTable.hpp"

  // Load a 2D CSV lookup table
  LookupTable<2> csv("example2D.csv",',');

  // Array receiving the interpolated values
  IdefixArray1D<real> arr("arr", 10);

  // Use the lookup table in an idefix_for loop (Device)
  idefix_for("loop", 0, 10,
    KOKKOS_LAMBDA (int i) {
      real x[2];
      x[0] = 2.1;
      x[1] = 3.5;
      arr(i) = csv.Get(x);
  });

  // Use the lookup table outside of an idefix_loop (Host)
  real y[2];
  y[0] = 3.0;
  y[1] = -1.0;
  real result = csv.GetHost(y);

.. note::
  Usage examples are provided in `test/utils/lookupTable`.

Debugging and profiling
=======================

The easiest way to trigger debugging in ``Idefix`` is to switch on ``Idefix_DEBUG`` in cmake (for instance by adding ``-DIdefix_DEBUG=ON`` when calling cmake). This forces the code to log each time a function is called or returned (this is achieved thanks to the ``idfx::pushRegion(std::string)`` and ``idfx::popRegion()`` functions which are found at the beginning and end of each function). In addition, ``Idefix_DEBUG`` enables Kokkos array bound checks, which throw an error each time one tries to access an array outside of its allocated memory space. Note that all of these debugging features induce a large overhead, and should therefore not be activated in production runs.

It is also possible to use `Kokkos-tools `_ to debug and profile the code. For instance, on-the-fly profiling can be enabled with the Kokkos ``space-time-stack`` module. To use it, simply clone ``Kokkos-tools`` to the directory of your choice (say ``$KOKKOS_TOOLS``), then ``cd`` to ``$KOKKOS_TOOLS/profiling/space-time-stack`` and compile the module with ``make``. Once the profiling module is compiled, you can use it by setting the environment variable ``KOKKOS_PROFILE_LIBRARY``. For instance, in bash:

.. code-block:: bash

  export KOKKOS_PROFILE_LIBRARY=$KOKKOS_TOOLS/profiling/space-time-stack/kp_space_time_stack.so

Once this environment variable is set, *Idefix* automatically logs profiling information when it ends (recompilation of *Idefix* is not needed).

.. tip::
  By default, ``Kokkos-tools`` assumes the user code is using MPI. If one wants to perform profiling in serial, one should disable MPI before compiling the ``space-time-stack`` module. This is done by editing the makefile in ``$KOKKOS_TOOLS/profiling/space-time-stack``, changing the compiler ``CXX`` to a valid serial compiler, and adding ``-DUSE_MPI=0`` to ``CFLAGS``.

Minimal skeleton
================

Because it is sometimes useful to experiment with a very simple code without the burden of the full initialisation, experiments with the basic *Idefix* structures (idefix_for, idefix arrays, idefix objects) can be done using a minimal skeleton, located in ``$IDEFIX_DIR/test/skeleton``. The file ``main.cpp`` can be filled with any experimental code and replaces *Idefix*'s standard main file. It should then be configured with cmake like any other *Idefix* problem (``cmake $IDEFIX_DIR``) and compiled with ``make``. In the example provided, the skeleton performs a simple sum on an idefix array and compares it to the same reduction performed on the host.
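As a sketch of the kind of experimental code that can be placed in the skeleton's ``main.cpp`` (assuming, as in the provided example, that the skeleton already takes care of initialising Kokkos and the *Idefix* environment), one could write something along these lines:

.. code-block:: c++

  // Illustrative experiment: fill an array on the device, then sum it with idefix_reduce
  IdefixArray3D<real> arr("arr", 16, 16, 16);

  idefix_for("Fill", 0, 16, 0, 16, 0, 16,
    KOKKOS_LAMBDA (int k, int j, int i) {
      arr(k,j,i) = 1.0;
  });

  real deviceSum = 0.0;
  idefix_reduce("Sum", 0, 16, 0, 16, 0, 16,
    KOKKOS_LAMBDA (int k, int j, int i, real &localSum) {
      localSum += arr(k,j,i);
    },
    Kokkos::Sum<real>(deviceSum));

  // Compare with the same sum performed on the host
  IdefixArray3D<real>::HostMirror arrHost = Kokkos::create_mirror_view(arr);
  Kokkos::deep_copy(arrHost, arr);

  real hostSum = 0.0;
  for(int k = 0; k < 16; k++) {
    for(int j = 0; j < 16; j++) {
      for(int i = 0; i < 16; i++) {
        hostSum += arrHost(k,j,i);
      }
    }
  }

  // deviceSum and hostSum should both be equal to 16*16*16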