.. spelling:word-list::

   Avgpool
   Conv
   dequant
   quant
   rsqrt
   th

.. include:: /keyword.rst

============
Neuron Tools
============

.. contents:: Sections
   :local:
   :depth: 2

In this section, each Neuron tool is described with its command-line options. Neuron tools can be invoked directly from the command line, or from inside a C/C++ program using the Neuron API. For details on the Neuron API, see :ref:`Neuron API Reference `.

.. _ml_neuron-compiler:

Neuron Compiler (``ncc-tflite``)
--------------------------------

``ncc-tflite`` is a compiler tool used to generate a statically compiled network (`.dla` file) from a TFLite model. ``ncc-tflite`` supports the following two modes:

* **Compilation mode**: ``ncc-tflite`` generates a compiled binary (`.dla`) file from a TFLite model. Users can use the runtime tool (:command:`neuronrt`) to execute the `.dla` file on a device.
* **Execution mode**: ``ncc-tflite`` compiles the TFLite model into a binary and then executes it directly on the device. Use ``-e`` to enable execution mode, and ``-i`` and ``-o`` to specify the input and output files.

Usage
^^^^^

Basic command for using ``ncc-tflite`` to convert a TFLite model to a DLA file that can be used for inference on the APU:

.. prompt:: bash # auto

   # ncc-tflite --arch mdla2.0,vpu /usr/share/benchmark_dla/ssd_mobilenet_v1_coco_quantized.tflite -o /usr/share/benchmark_dla/ssd_mobilenet_v1_coco_quantized.dla

All options of ``ncc-tflite``:

.. code-block:: none

   Usage: ncc-tflite [OPTION...] filename

         --verify               Force tflite model verification
         --no-verify            Bypass tflite model verification
     -d, --dla-file             Specify a filename for the output DLA file
         --check-target-only    Check target support and exit
         --resize               Specify a list of input dimensions for resizing
                                (e.g., 1x3x5,2x4x6)
     -s, --show-tflite          Show tensors and nodes in the tflite model
         --show-io-info         Show input and output tensors of the tflite model
         --show-builtin-ops     Show available builtin operations and exit
         --show-mtkext-ops      Show available MTKEXT operations and exit
         --verbose              Enable verbose mode
         --version              Output version information and exit
         --help                 Display this help and exit
     -e, --exec                 Enable execution (inference) mode
     -i, --input                Specify a list of input files for inference
     -o, --output               Specify a list of output files for inference
         --arch                 Specify a list of target architecture names
         --platform             Platform preference as hint for compilation
     -O, --opt                  Specify which optimization level to use:
                                [0]: no optimization
                                [1]: enable basic optimization for fast codegen
                                [2]: enable most optimizations
                                [3]: enable -O2 with other optimizations that
                                     take longer compilation time (default: 2)
         --opt-accuracy         Optimize for accuracy
         --opt-aggressive       Enable optimizations that may lose accuracy
         --opt-bw               Optimize for memory bandwidth
         --opt-footprint        Optimize for memory footprint
         --opt-size             Optimize for size, including code and static data
         --relax-fp32           Run fp32 models using fp16
         --l1-size-kb           Hint the size of L1 memory (default: 0)
         --l2-size-kb           Hint the size of L2 memory (default: 0)
         --suppress-input       Suppress input data conversion
         --suppress-output      Suppress output data conversion
         --gen-debug-info       Produce debugging information in the DLA file.
                                Runtime can work with this info for profiling
         --show-exec-plan       Show execution plan
         --show-memory-summary  Show memory allocation summary
         --dla-metadata         Specify a list of key:file pairs for DLA metadata
         --disallow-bridge      Report error if bridging is needed
         --avoid-reorder        Keep execution order during graph optimization if
                                possible
         --extract-static-data  Extract static parameters into file and make them
                                as input tensors
         --intval-color-fast    Disable exhaustive search in interval coloring
         --show-l1-req          Show the requirement for L1 without dropping. Only
                                effective when global buffer allocation is in
                                effect
         --int8-to-uint8        Convert data types from INT8 to UINT8
         --fc-to-conv           Convert Fully Connected to Conv2D
         --decompose-qlstmv2    Decompose QLSTM V2 to sub-OPs
         --stable-linearize     Stable linearize NIR (respect the input NIR
                                order), making layer order predictable
         --rewrite-pattern      Specify a list of patterns to be rewritten if
                                matched in a graph. Use --rewrite-pattern=? to
                                show available patterns
         --sink-concat          Sink concat operations if possible
         --reshape-to-4d        Reshape tensor to 4D if possible

    aps options:
         --aps-cbfc-vids        Provide idle CBFC vids for APS internal use.
                                (e.g., 0,1)
         --aps-ext-datatype     Enable more datatype support for extension.

    gno options:
         --gno                  Specify a list of graphite neuron optimizations.
                                Available options: NDF, SMP, BMP
         --basic-tiling         Enable basic tiling

    gpu options:
         --cltuner-file         An output file path for CL tuner that generates
                                optimization settings (default:
                                /vendor/etc/armnn_app.config)
         --cltuning-mode        Set the tuning level of CL tuner (default: -1)
         --cmdl-dir             An output directory for CmdL that dumps infos
         --clprofile            Enable CmdL clprofile
         --clfinish             Enable CmdL clfinish

    mdla options:
         --num-mdla             Use numbers of MDLA cores (default: 1)
         --mdla-bw              Hint MDLA bandwidth (MB/s) (default: 10240)
         --mdla-freq            Hint MDLA frequency (MHz) (default: 960)
         --mdla-wt-to-l1        Hint MDLA try to put weight into L1
         --mdla-wt-pruned       The weight of given model has been pruned
         --prefer-large-acc     Use large accumulator to improve accuracy
         --use-sw-dilated-conv  Use software dilated convolution
         --use-sw-deconv        Convert DeConvolution to Conv2Ds
         --req-per-ch-conv      Requant invalid per-channel convs
         --trim-io-alignment    Trim the model IO alignment

    vpu options:
         --dual-vpu             Use dual VPU

.. _ml_neuron-general-options:

General Options
^^^^^^^^^^^^^^^

``--exec / --input / --output``
   Enable execution mode and specify input and output files.

``--arch``
   Specify a list of targets which the model is compiled for.

``--platform``
   Hint platform preference for compilation.

``--opt``
   Specify which optimization level to use.

   .. code-block::

      -O0: No optimization
      -O1: Enable basic optimization for fast codegen
      -O2: Enable most optimizations (default)
      -O3: Enable -O2 with other optimizations that increase compilation time

``--opt-accuracy``
   Optimize for accuracy. This option tries to make the inference results similar to the results from the CPU. It may also cause performance drops.

   .. list-table::
      :widths: 15 35
      :header-rows: 1

      * - Layer
        - Description
      * - RSqrtLayer
        - If datatype is int16, convert to float16 (dequant -> rsqrt -> quant).
      * - | AvgPool2DLayer
        - Increase the cascade depth of Avgpool to improve accuracy.
      * - | Conv2DLayer
          | DepthwiseConv2DLayer
          | FullyConnectedLayer
          | GroupConv2DLayer
          | TransposeConv2DLayer
        - Set the bias of Conv2D to zero and add an additional ChannelWiseAdd layer to improve accuracy if the following conditions are true:

          * Output datatype is not Asymmetric
          * Input is floating-point
          * Filter is quantized.

``--opt-aggressive``
   Enable optimizations that may lose accuracy.

   .. list-table::
      :widths: 15 35
      :header-rows: 1

      * - Layer
        - Description
      * - QuantizeLayer + DequantizeLayer
        - Simplify to IdentityLayer.
      * - SoftmaxLayer
        - Adjust legalized op order to reduce inference time.

``--opt-bw``
   Optimize for bandwidth. Enable NDF agent (``--gno=NDF``) if ``--gno`` is not specified.

``--opt-footprint``
   Optimize for memory footprint. This option also disables some optimizations that improve inference time but lead to a larger memory footprint.

``--opt-size``
   Optimize for size, including code and static data. This option also disables some optimizations that may increase code or data size.

``--intval-color-fast``
   Disable exhaustive search in interval coloring to speed up compilation. This option is enabled automatically at optimization level ``-O2`` or lower, and can also be used with ``-O3``.

``--dla-file``
   Specify a filename for the output DLA file.

``--disallow-bridge``
   Report an error if bridging is needed. This is useful for detecting unaligned data types or data pitches across subgraph borders at an early stage.

``--avoid-reorder``
   Keep execution order during graph optimization, if possible. This option disables some optimizations that may change the order of operation execution.

``--relax-fp32``
   Hint the compiler to compute FP32 models using FP16 precision.

``--decompose-qlstmv2``
   Hint the compiler to decompose QLSTM V2 to multiple operations.

``--check-target-only``
   Check target support without compiling. Each OP is checked against the target list. If any target does not support the OP, a message is displayed.
   For example, with ``--arch=mdla1.5,vpu`` and ``--check-target-only``, a SOFTMAX op is reported as follows:

   .. code-block::

      OP[0]: SOFTMAX
       ├ MDLA: SoftMax is supported for MDLA 2.0 or newer.
       ├ VPU: unsupported data type: Float32

``--resize``
   Resize the inputs using the given new dimensions and run shape derivations throughout the model. This is useful for changing the dimensions of IO and the internal tensors of the model. Note that during shape derivations, the original attributes of each layer are not modified. Instead, the attributes might be read and then used to derive the new dimensions of the layer's output tensors.

``--int8-to-uint8``
   Convert data types from INT8 to UINT8. This option is required to run an asymmetric signed 8-bit model on hardware that does not support INT8 (e.g., MDLA 1.0 and MDLA 1.5).

``--sink-concat``
   Sink ConcatLayer, ReshapeLayer, and TransposeLayer (and DepthToSpaceLayer, on MDLA 3.0 only) when the op that follows is one of the following layers:

   * SingleOperandElementWise

     * AbsLayer, CeilLayer, ExpLayer, FloorLayer, LogLayer, NegLayer, RecipLayer, RoundLayer, RSqrtLayer, SqrtLayer, SquareLayer

   * ElementWiseBase when broadcast is possible

     * ElementWiseAddLayer, ElementWiseDivLayer, ElementWiseMaxLayer, ElementWiseMinLayer, ElementWiseMulLayer, ElementWiseRSubLayer, ElementWiseSubLayer, SquaredDifferenceLayer

   * ChannelWiseBase when sinkable op is the first input and the second input size is 1

     * ChannelWiseAddLayer, ChannelWiseMaxLayer, ChannelWiseMinLayer, ChannelWiseMulLayer, ChannelWiseRSubLayer, ChannelWiseSubLayer

   * ActivationBase

     * ClipLayer, HardSwishLayer, LeakyReluLayer, PReluLayer, ReLULayer, ReLU1Layer, ReLU6Layer, SigmoidLayer, TanhLayer

   * CastLayer
   * RequantizeLayer, QuantizeLayer, DequantizeLayer when there is no per-channel Quant

``--l1-size-kb``
   Provide the compiler with the L1 memory size. This value should not be larger than that of the real platform.

``--l2-size-kb``
   Provide the compiler with the L2 memory size. This value should not be larger than that of the real platform.
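The data-type change behind ``--int8-to-uint8`` can be illustrated with a short calculation. The sketch below assumes the usual affine quantization formula, ``real = scale * (q - zero_point)``, and shifts both the stored values and the zero point by 128. This is a common way to re-express asymmetric INT8 data as UINT8, not a description of what ``ncc-tflite`` does internally, and the function names and sample values are hypothetical.

.. code-block:: python

   def int8_to_uint8(values, zero_point):
       """Shift INT8 values and their zero point into the UINT8 range."""
       shifted = [v + 128 for v in values]
       return shifted, zero_point + 128


   def dequantize(values, scale, zero_point):
       """Affine dequantization: real = scale * (q - zero_point)."""
       return [scale * (v - zero_point) for v in values]


   # Hypothetical INT8 tensor with an asymmetric zero point.
   int8_values = [-128, -6, 0, 127]
   scale, int8_zero_point = 0.5, -6

   uint8_values, uint8_zero_point = int8_to_uint8(int8_values, int8_zero_point)
   print(uint8_values, uint8_zero_point)  # [0, 122, 128, 255] 122

   # The real values represented by both encodings are identical.
   print(dequantize(int8_values, scale, int8_zero_point))
   print(dequantize(uint8_values, scale, uint8_zero_point))

Because the value and the zero point move by the same offset, the scale is unchanged and the represented real numbers stay the same.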
``--suppress-input``
   Hint the compiler to suppress the input data conversion. Users have to pre-convert the input data into a platform-compatible format before inference.

``--suppress-output``
   Hint the compiler to suppress the output data conversion. Users have to convert the output data from the platform-generated format after inference.

``--extract-static-data``
   Extract static parameters into a separate data file. If two or more DLA files have the same static parameters, they can share the same data file instead of storing duplicate static parameters in each DLA file.

``--gen-debug-info``
   Generate operation and location info in the DLA file, for per-op profiling.

``--show-tflite``
   Show tensors and nodes in the TFLite model. For example:

   .. code-block::

      Tensors:
      [0]: MobilenetV2/Conv/Conv2D_Fold_bias
       ├ Type: kTfLiteInt32
       ├ AllocType: kTfLiteMmapRo
       ├ Shape: {32}
       ├ Scale: 0.000265382
       ├ ZeroPoint: 0
       └ Bytes: 128
      [1]: MobilenetV2/Conv/Relu6
       ├ Type: kTfLiteUInt8
       ├ AllocType: kTfLiteArenaRw
       ├ Shape: {1,112,112,32}
       ├ Scale: 0.0235285
       ├ ZeroPoint: 0
       └ Bytes: 401408
      [2]: MobilenetV2/Conv/weights_quant/FakeQuantWithMinMaxVars
       ├ Type: kTfLiteUInt8
       ├ AllocType: kTfLiteMmapRo
       ├ Shape: {32,3,3,3}
       ├ Scale: 0.0339689
       ├ ZeroPoint: 122
       └ Bytes: 864
      ...

``--show-io-info``
   Show input and output tensors of the TFLite model. For example:

   .. code-block::

      # of input tensors: 1
      [0]: input
       ├ Type: kTfLiteUInt8
       ├ AllocType: kTfLiteArenaRw
       ├ Shape: {1,299,299,3}
       ├ Scale: 0.00784314
       ├ ZeroPoint: 128
       └ Bytes: 268203

      # of output tensors: 1
      [0]: InceptionV3/Logits/Conv2d_1c_1x1/BiasAdd
       ├ Type: kTfLiteUInt8
       ├ AllocType: kTfLiteArenaRw
       ├ Shape: {1,1,1,1001}
       ├ Scale: 0.0392157
       ├ ZeroPoint: 128
       └ Bytes: 1001

``--show-l1-req``
   Show the minimum amount of L1 memory required to save all memory objects. This only shows the information and does not affect compilation. This option is effective only when global buffer allocation is active.
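The ``Bytes`` field reported by ``--show-tflite`` and ``--show-io-info`` follows from the ``Shape`` and ``Type`` fields: it is the number of elements multiplied by the element size. The following sketch reproduces the values from the example output above. The helper name and the lookup table are hypothetical and cover only the two tensor types that appear in those examples.

.. code-block:: python

   from math import prod

   # Element sizes in bytes for the tensor types shown in the example output.
   ELEMENT_SIZE = {"kTfLiteUInt8": 1, "kTfLiteInt32": 4}


   def tensor_bytes(shape_text, type_name):
       """Return the buffer size for a shape string such as '{1,112,112,32}'."""
       dims = [int(d) for d in shape_text.strip("{}").split(",")]
       return prod(dims) * ELEMENT_SIZE[type_name]


   print(tensor_bytes("{32}", "kTfLiteInt32"))            # 128
   print(tensor_bytes("{1,112,112,32}", "kTfLiteUInt8"))  # 401408
   print(tensor_bytes("{1,299,299,3}", "kTfLiteUInt8"))   # 268203

A check like this is a quick way to confirm that an input file prepared for ``-i`` has the size the model expects.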
``--show-exec-plan``
   :command:`ncc-tflite` supports heterogeneous compilation. It partitions the network automatically based on the ``--arch`` options provided and dispatches each sub-graph to its corresponding supported target. Use this option to check the execution plan table. For example:

   .. code-block::

      ExecutionStep[0]
       ├ StepId: 0
       ├ Target: MDLA_1_5
       └ Subgraph:
          ├ Conv2DLayer<0>
          ├ DepthwiseConv2DLayer<1>
          ├ Conv2DLayer<2>
          ├ Conv2DLayer<3>
          ├ DepthwiseConv2DLayer<4>
          ...
          ├ Conv2DLayer<61>
          ├ PoolingLayer<62>
          ├ Conv2DLayer<63>
          ├ ReshapeLayer<64>
          └ Output: OpResult (external)

``--show-memory-summary``
   Estimate the memory footprint of the given network. The following is an example of a DRAM/L1 (APU L1 memory)/L2 (APU L2 memory) breakdown. Each cell consists of two integers: X(Y). X is the physical buffer size of this entry. Y is the total size of tensors of this entry. Note that X <= Y since the same buffer may be reused for multiple tensors.

   * Input/Output corresponds to the buffer size used for the network's I/O activation.
   * Temporary corresponds to the working buffer size of the network's intermediate tensors (:command:`ncc-tflite` analyzes the graph dependencies and tries to minimize buffer usage).
   * Static corresponds to the buffer size for the network's weights.

   .. code-block::

      Planning memory according to the following settings:
        L1 Size(bytes) = 0
        L2 Size(bytes) = 0
      Buffer allocation summary:
        \           Unknown   L1        L2        DRAM
        Input       0(0)      0(0)      0(0)      200704(200704)
        Output      0(0)      0(0)      0(0)      1008(1008)
        Temporary   0(0)      0(0)      0(0)      1505280(81341200)
        Static      0(0)      0(0)      0(0)      3585076(3585076)

``--dla-metadata``
   Specify a list of key:file pairs as DLA metadata. Use this option to add additional information to a DLA file, such as the model name or quantization parameters. Applications can read the metadata using the :ref:`RuntimeAPI.h ` functions ``NeuronRuntime_getMetadataInfo`` and ``NeuronRuntime_getMetadata``. Note that adding metadata does not affect inference time.
   Example: Adding metadata to a DLA file

   .. code-block::

      $ ./ncc-tflite model.tflite -o model.dla --arch=mdla3.0 --dla-metadata quant:./quant1.bin,other:./misc.bin

   Example: Reading metadata from a DLA file

   .. code-block::

      // Get the size of the metadata
      size_t metaSize = 0;
      NeuronRuntime_getMetadataInfo(runtime, "quant", &metaSize);

      // Metadata in dla is copied to 'data'
      char* data = static_cast<char*>(malloc(sizeof(char) * metaSize));
      NeuronRuntime_getMetadata(runtime, "quant", data, metaSize);

``--show-builtin-ops``
   Show built-in operations supported by :command:`ncc-tflite`.

``--no-verify``
   Bypass TFLite model verification. Use this option when the given TFLite model cannot be run by the TFLite interpreter.

``--verbose``
   Enable verbose mode. Detailed progress is shown during compilation.

``--version``
   Print version information.

GNO Options
^^^^^^^^^^^

``--gno``
   Available graphite neuron optimizations: [NDF, SMP, BMP]

   * **NDF**: Enables Network Deep Fusion transformation. This is an optimization strategy for reducing DRAM access.
   * **SMP**: Enables Symmetric Multiprocessing transformation. This is an optimization strategy for executing the network in parallel on multiple DLA cores. The aim is to make graphs utilize the computation power of multiple cores more efficiently.
   * **BMP**: Enables Batch Multiprocessing transformation. This is an optimization strategy for executing each batch dimension of the network in parallel on multiple MDLA cores. The aim is to make graphs with multiple batches utilize the computation power of multiple cores more efficiently.

MDLA Options
^^^^^^^^^^^^

``--num-mdla``
   Hint the compiler to use the given number of MDLA cores. With a multi-core platform, the compiler tries to generate commands for parallel execution.

``--mdla-bw``
   Provide the compiler with the MDLA bandwidth.

``--mdla-freq``
   Provide the compiler with the MDLA frequency.

``--mdla-wt-pruned``
   Hint the compiler that the weights of the given model have been pruned.
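The options described so far can be combined into a single compile command. The following sketch assembles such a command as an argument list, using only options that appear in the help output above (``--arch``, ``--opt``, ``--num-mdla``, ``--gno``, ``--dla-file``, ``--dla-metadata``). The helper name, the file names, the chosen values, and the comma-joined form of the metadata list are assumptions for illustration. The command is only printed, not executed.

.. code-block:: python

   import shlex


   def build_ncc_command(model, dla_file, arch, opt_level, num_mdla, gno, metadata):
       """Assemble an ncc-tflite argument list (hypothetical helper)."""
       command = [
           "ncc-tflite", model,
           "--arch", ",".join(arch),
           "--opt", str(opt_level),
           "--num-mdla", str(num_mdla),
           f"--gno={gno}",
           "--dla-file", dla_file,
       ]
       if metadata:
           # Assumption: key:file pairs are joined with commas, without spaces.
           pairs = ",".join(f"{key}:{path}" for key, path in metadata.items())
           command += ["--dla-metadata", pairs]
       return command


   command = build_ncc_command(
       model="model.tflite",
       dla_file="model.dla",
       arch=["mdla3.0"],
       opt_level=3,
       num_mdla=2,
       gno="NDF",
       metadata={"quant": "./quant1.bin"},
   )
   print(shlex.join(command))

Keeping the command as a list makes it straightforward to pass to ``subprocess.run`` on a host where ``ncc-tflite`` is installed, and avoids shell quoting problems with file paths.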
``--mdla-wt-to-l1``
   Hint the MDLA to try to put weights into L1 memory.

``--prefer-large-acc``
   Hint the compiler to use a larger accumulator to improve accuracy. A higher value allows larger integer summation or multiplication, but a smaller value is ignored. Do not use this option if most of the results of summation or multiplication are smaller than 2^32.

``--fc-to-conv``
   Hint the compiler to convert Fully Connected (FC) to Conv2D.

``--use-sw-dilated-conv``
   Hint the compiler to use multiple non-dilated convolutions to simulate a dilated convolution. This option works only when the dilation rate is a multiple of the stride. This option increases the utilization rate of hardware with less internal buffer and allows dilation rates besides {1, 2, 4, 8}.

``--use-sw-deconv``
   Hint the compiler to convert deconvolution to Conv2Ds. This option increases the utilization rate of hardware but also the memory footprint.

``--req-per-ch-conv``
   Hint the compiler to re-quantize the per-channel quantized convolutions if they have unsupported output scales. Enabling this option might reduce accuracy, because the re-quantization chooses the maximal scale of ``input_scale * filter_scale`` as the new output scale.

``--trim-io-alignment``
   Hint the compiler to perform operations that could potentially reduce the required padding for inputs and outputs of the given network. NOTE: Enabling this option might introduce additional computation.

Option Effects
--------------

.. list-table::
   :widths: 25 10 10 10
   :header-rows: 1

   * - Option
     - Accuracy
     - Inference Time
     - Memory Footprint
   * - ``--opt-accuracy``
     - Might Increase
     - Might Increase
     -
   * - ``--opt-aggressive``
     - Might Decrease
     - Might Decrease
     -
   * - ``--opt-bw``
     -
     - Decrease
     - Decrease
   * - ``--sink-concat``
     -
     - Might Decrease
     -
   * - ``--fc-to-conv``
     -
     - Might Decrease
     - Might Decrease

Compile Option Examples
-----------------------

Beginners are advised to follow the flow chart below to optimize their models.

.. figure:: /_asset/sw_rity_ml-guide_neuron_sdk_compile_flow_chart.svg

The following table contains recommended compilation options for common scenarios. Users must adjust the number of MDLA cores and the L1 size based on their target device.

.. list-table::
   :widths: 10 20 20
   :header-rows: 1

   * - Scenario
     - Options
     - Description
   * - Default
     - ``--opt 3 --num-mdla 4 --l1-size-kb 6144``
     -
   * - AINR
     - ``--opt-bw --num-mdla 4 --l1-size-kb 6144 --opt-accuracy``
     - * AINR models have high-resolution dimensions. To reduce bandwidth and footprint, use ``--opt-bw``.
       * To increase the accuracy of specific operations, use ``--opt-accuracy``. For details, see :ref:`General Options <ml_neuron-general-options>`.
   * - Capture
     - ``--opt 3 --num-mdla 4 --l1-size-kb 6144 --opt-accuracy``
     - * Optimize utilization of multiple cores.
       * To increase the accuracy of specific operations, use ``--opt-accuracy``. For details, see :ref:`General Options <ml_neuron-general-options>`.
   * - NLP / ASR
     - ``--opt-bw --fc-to-conv --num-mdla 4 --l1-size-kb 6144 --opt-accuracy --decompose-qlstmv2``
     - * To reduce bandwidth and footprint, use ``--opt-bw``.
       * To increase the accuracy of specific operations, use ``--opt-accuracy``. For details, see :ref:`General Options <ml_neuron-general-options>`.
       * If a BMM structure exists, use ``--fc-to-conv`` to convert FC to Conv2D, which increases fusion opportunities and reduces footprint.
       * Use ``--decompose-qlstmv2`` to split quantized LSTM into operations that MDLA supports.

.. _ml_neuron-runtime:

Neuron Runtime (``neuronrt``)
-----------------------------

``neuronrt`` invokes the Neuron runtime, which can execute statically compiled networks (`.dla` files). ``neuronrt`` allows users to perform on-device inference.

Usage
^^^^^

Basic commands for using ``neuronrt`` to load a DLA file and run inference.

Example: single input/output:

.. prompt:: bash # auto

   # neuronrt -m hw \
       -a /usr/share/benchmark_dla/mobilenet_v2_1.0_224_quant.dla \
       -i input.bin \
       -o output.bin

Example: multiple inputs/outputs:

.. prompt:: bash # auto

   # # use "neuronrt -d" to show the index of each input/output id.
   # # use "-i" or "-o" to specify the input/output files in order.
   #
   # neuronrt -m hw \
       -a /usr/share/benchmark_dla/ssd_mobilenet_v1_coco_quantized.dla \
       -i input.bin \
       -o output_0.bin \
       -o output_1.bin \
       -o output_2.bin \
       -o output_3.bin \
       -o output_4.bin \
       -o output_5.bin \
       -o output_6.bin \
       -o output_7.bin \
       -o output_8.bin \
       -o output_9.bin \
       -o output_10.bin \
       -o output_11.bin

All options of ``neuronrt``:

.. code-block::

   Usage: neuronrt [OPTION...]

    common options:
     -m                Specify which device will be used to execute the DLA file.
                       can be: null/cmodel/hw, default is null. If 'cmodel' is
                       chosen, users need to further set CModel library in env.
     -a                Specify the ahead-of-time compiled network (.dla file)
     -d                Show I/O id-shape mapping table.
     -i                Specify an input bin file. If there are multiple inputs,
                       specify them one-by-one in order, like
                       -i input0.bin -i input1.bin.
     -o                Specify an output bin file. If there are multiple outputs,
                       specify them one-by-one in order, like
                       -o output0.bin -o output1.bin.
     -u                Use recurrent execution mode.
     -c                Repeat the inference times. It can be used for profiling.
     -b                Specify the boost value for Quality of Service. Range is
                       0 to 100.
     -p                Specify the priority for Quality of Service. The available
                       arguments are 'urgent', 'normal', and 'low'.
     -r                Specify the execution preference for Quality of Service.
                       The available arguments are 'performance', and 'power'.
     -t                Specify the deadline for Quality of Service in ms.
                       Suggested value: 1000/FPS.
     -e                ** This option takes no effect in Neuron 5.0. The
                       parallelism is fully controlled by compiler-time option.
                       To be removed in Neuron 6.0. **
                       Specify the strategy to execute commands on the MDLA
                       cores. The available arguments are 'auto', 'single', and
                       'dual'. Default is auto. If 'auto' is chosen, scheduler
                       decides the execution strategy. If 'single' is chosen,
                       all commands are forced to execute on single MDLA.
                       If 'dual' is chosen, commands are forced to execute on
                       dual MDLA.
     -v                Show the version of Neuron Runtime library
     --input-shapes    Specify a list of input dimensions (N-Dims). If there are
                       multiple inputs, specify them one-by-one in order, like
                       1x1080x1920x3,1x1080x1920x1.
     --output-shapes   Specify a list of output dimensions (N-Dims). If there are
                       multiple outputs, specify them one-by-one in order, like
                       1x360x640x3,1x360x640x1.
     --Xruntime        Pass options to the neuron runtime. Enclose option string
                       by single quotation.

    debug options:
     -s                Use symmetric 8-bit mode.

I/O ID-Shape Mapping Table
^^^^^^^^^^^^^^^^^^^^^^^^^^

If the ``-d`` option is specified, :command:`neuronrt` shows the I/O information of the `.dla` file specified by the ``-a`` option.

Example output:

.. code-block:: text

   Input :
       Handle = 1, <1 x 128 x 64 x 3>, size = 98304 bytes
       Handle = 0, <1 x 128 x 128 x 3>, size = 196608 bytes
   Output :
       Handle = 0, <1 x 128 x 192 x 5>, size = 491520 bytes

The row with *Handle = N* provides the I/O information for the input or output with handle *N* in the compiled network (handles are zero-based).

Let's analyze the I/O information of the input tensor in the second row of the example:

.. code-block:: text

   Handle = 1, <1 x 128 x 64 x 3>, size = 98304 bytes

The input tensor with handle=1 is the second input in the compiled network, and has shape <1 x 128 x 64 x 3> with a total data size of 98304 bytes. The example is a float32 network, so the data size is calculated as follows:

(1 x 128 x 64 x 3) x 4 (*4 bytes for float32*) = 98304

.. DISABLE DLA-MUXER PART:

   .. _ml_neuron-dla-packer:

   DLA Packer (``dla-packer``)
   ---------------------------

   ``dla-packer`` is a tool for packing multiple compiled networks (.dla files) into a single deep learning bundle (.dlb file). ``dla-packer`` also provides support for cross-DLA cooperation.

   * For information about using DLB files, see :doc:`Neuron DLA Muxer API `
   * For an example of using ``dla-packer``, see :ref:`Case Study: Multiple Resolutions `.

   Usage
   ^^^^^

   .. code-block::

      Pack DLA files into a bundled .dlb file.
      Usage: dla-packer [OPTION...] network1.dla, network2.dla, ...

            --share-tmp-buf          Share DRAM temporary buffers between DLAs in
                                     runtime.
            --ext-static-data        Specify the bundled bin file for external
                                     static data.
        -l, --compression-level      Specify the compression level (0~9) for all
                                     the bundled files.(0: NoCompression, 9:
                                     BestCompression) (default: 0)
        -o, --output dlb             Specify the output path of packed DLAs (.dlb)
                                     file
            --help                   Display this help and exit

   DLB Reader (``readdlb``)
   ------------------------

   ``readdlb`` is a tool for reading a deep learning bundle file (.dlb) and dumping detailed information.

   Usage
   ^^^^^

   .. code-block::

      Read information of a dlb file.
      Usage: readdlb [OPTION...] dlb

            --share-tmp-buf-alignment  Specify the alignment value for calculating
                                       required buffer size (unit: byte).
                                       (default: 4096)
            --help                     Display this help and exit