Neuron API Reference

Fence.h

struct FenceInfo
#include <Fence.h>

This struct is used to receive the fence file descriptor and the post-inference callback in fenced execution. The user should allocate this struct and pass its address into the fenced execution API; the fence FD and the callback will then be set by the runtime. After the fence is triggered, the caller can invoke the callback to retrieve the execution status and execution time.

Note

This struct is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Public Members

int64_t inputFenceFd

The file descriptor of the fence to be triggered before inference. Set this field to -1 if there is no input fence for the inference.

int64_t fenceFd

The file descriptor of the fence to be triggered at the end of inference.

void (*callback)(void *opaque)

The caller should invoke this callback after the fence is triggered to retrieve the execution status and time. The caller should pass the address of the original FenceInfo which possesses this callback as the first parameter 'opaque'.

uint32_t status

Execution status. This field is set after the callback is invoked.

uint32_t microseconds

Execution time (in microseconds). This field is set after the callback is invoked.

uint64_t __internal__[4]

Reserved for internal use. Do not access this data.

file Fence.h
#include <stdint.h>
#include <sys/cdefs.h>

Functions

int NeuronRuntime_isFenceSupported(void *runtime, uint8_t *supported)

Check whether the loaded model supports fenced execution. Call this function only after the runtime has been loaded with a model.

Note

This function is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Parameters
  • runtime – The address of the created neuron runtime instance.

  • supported – Non-zero value indicates that the model supports fenced execution.

Returns

A RuntimeAPI error code.

int NeuronRuntime_inferenceFenced(void *runtime, FenceInfo *fenceInfo)

Perform fenced inference. The call returns without waiting for inference to finish. The caller should prepare a FenceInfo structure and pass its address into this API. fenceFd in FenceInfo will be set, and the caller can be signaled when inference completes (or exits with an error) by waiting on the fence. Most importantly, after the fence is triggered, the caller MUST invoke the callback in fenceInfo so that Neuron can perform certain post-execution tasks. The final execution status and inference time can be retrieved from FenceInfo after the callback has executed.

Note

This function is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Parameters
  • runtime – The address of the created neuron runtime instance.

  • fenceInfo – A pointer to a FenceInfo struct, used to receive the fence file descriptor and the post-inference callback in fenced execution.

Returns

A RuntimeAPI error code.

Misc.h

file Misc.h
#include <sys/cdefs.h>
#include "Types.h"

Functions

int NeuronRuntime_getVersion(NeuronVersion *version)

Get the version of Neuron runtime library.

Note

Neuron runtime can only load DLA files generated by compiler with the same major version.

Parameters

version – the version of Neuron runtime library.

Returns

A RuntimeAPI error code.

RuntimeAPI.h

Note

This file provides backwards compatibility with NeuroPilot 4.x.

struct BufferAttribute
#include <RuntimeAPI.h>

BufferAttribute is used to inform the runtime whether this buffer is an ION buffer. If ionFd is -1, the buffer is a non-ION buffer. Otherwise, the buffer is an ION buffer and ionFd is its shared ION buffer file descriptor. Android device implementations may benefit from this information to eliminate unnecessary data copy.

Public Members

int ionFd

-1: Non-ION buffer. Otherwise: the shared ION buffer file descriptor.

struct EnvOptions

Public Members

uint32_t deviceKind

Device kind can be chosen from kEnvOptNullDevice, kEnvOptCModelDevice, or kEnvOptHardware.

For hardware development, use kEnvOptHardware.

MDLACoreMode MDLACoreOption

Set MDLA core option.

Warning

This option is no longer effective. It will be removed in Neuron 6.0.

uint8_t CPUThreadNum

Hint for CPU backends: the number of threads to use for execution.

bool suppressInputConversion

Set this to true to bypass preprocess and feed data in the format that the device demands.

bool suppressOutputConversion

Set this to true to bypass postprocess and retrieve raw device output.

file RuntimeAPI.h
#include "neuron/api/Types.h"
#include <stddef.h>
#include <stdint.h>
#include <sys/cdefs.h>

Neuron Runtime API

Neuron provides APIs to create a runtime environment, parse a compiled model file, and perform inference with a network.

Runtime users should include this header to use the Runtime API. Note that some APIs that set input and output info require the user to specify the handle of the input/output tensor to be set.

The user may:

1) Act as ANN or TFLite, which always knows the handle.

2) Run a precompiled network, having understood the model beforehand.

3) Run a precompiled network without knowing what the network looks like. In this case, the user cannot perform inference without first inspecting the network IO map info; otherwise, the user cannot even supply a valid input with a valid input shape. After checking the IO map, the user also acquires the handle and the corresponding shape of each tensor.

Defines

NON_ION_FD

Enums

enum MDLACoreMode

This option controls whether the underlying hardware should split and run a graph across homogeneous devices. Note that this does not control the heterogeneous parallelism in the Runtime software.

Warning

This option will be deprecated in Neuron 6.0.

Values:

enumerator Auto

Let the scheduler decide.

enumerator Single

Force single MDLA.

enumerator Dual

Force multi MDLA.

Functions

inline int IsNullDevice(const EnvOptions *options)
Parameters

options – The environment options for the Neuron Runtime.

Returns

1 if the user-specified EnvOptions use a NullDevice; otherwise, 0.

inline int IsHardware(const EnvOptions *options)
Parameters

options – The environment options for the Neuron Runtime.

Returns

1 if the user-specified EnvOptions use real hardware; otherwise, 0.

int NeuronRuntime_create(const EnvOptions *optionsToDeprecate, void **runtime)

Create a Neuron Runtime based on the setting specified in options. The address of the created instance will be passed back in *runtime.

Parameters
  • optionsToDeprecate – The environment options for the Neuron Runtime (To be deprecated).

  • runtime – Runtime provides API for applications to run a compiled network on specified input.

Returns

A RuntimeAPI error code.

int NeuronRuntime_create_with_options(const char *options, const EnvOptions *optionsToDeprecate, void **runtime)

Create a Neuron Runtime based on the setting specified in options. The address of the created instance will be passed back in *runtime.

Parameters
  • options – The environment options for the Neuron Runtime.

  • optionsToDeprecate – The environment options for the Neuron Runtime (To be deprecated).

  • runtime – Runtime provides API for applications to run a compiled network on specified input.

Returns

A RuntimeAPI error code.

int NeuronRuntime_loadNetworkFromFile(void *runtime, const char *pathToDlaFile)

Load the compiled network from dla file.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • pathToDlaFile – The dla file path.

Returns

A RuntimeAPI error code. 0 indicates the network was loaded successfully.

int NeuronRuntime_loadNetworkFromBuffer(void *runtime, const void *buffer, size_t size)

Load the compiled network from a memory buffer.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • buffer – The memory buffer.

  • size – The size of the buffer.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setInput(void *runtime, uint64_t handle, const void *buffer, size_t length, BufferAttribute attribute)

Set the memory buffer for the tensor which holds the specified input handle in the original network. If there are multiple inputs, each of them has to be set.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The input buffer.

  • length – The input buffer size.

  • attribute – The buffer attribute for setting ION.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setOffsetedInput(void *runtime, uint64_t handle, const void *buffer, size_t length, BufferAttribute attribute, size_t offset)

Set the memory buffer and offset for the tensor which holds the specified input handle in the original network. If there are multiple inputs, each of them has to be set.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The input buffer.

  • length – The input buffer size.

  • attribute – The buffer attribute for setting ION.

  • offset – The offset into the ION buffer; reading starts from the buffer start address + offset.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setSingleInput(void *runtime, const void *buffer, size_t length, BufferAttribute attribute)

If there is only one input, this function sets the buffer for that input automatically. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • buffer – The input buffer.

  • length – The input buffer size.

  • attribute – The buffer attribute for setting ION.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setInputShape(void *runtime, uint64_t handle, uint32_t *dims, uint32_t rank)

Set the shape of the input tensor which holds the specified input handle in the original network. If there are multiple inputs with dynamic shapes, each of them has to be set. This API is only used when the input has a dynamic shape; otherwise, an error code will be returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – An array of dimension sizes, one per dimension. For NHWC, dims[0] is N.

  • rank – The input rank. For example, the rank is 4 for NHWC.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setOutput(void *runtime, uint64_t handle, void *buffer, size_t length, BufferAttribute attribute)

Set the memory buffer for the tensor which holds the specified output handle in the original network. If there are multiple outputs, each of them has to be set.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The output buffer.

  • length – The output buffer size.

  • attribute – The buffer attribute for setting ION.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setOffsetedOutput(void *runtime, uint64_t handle, void *buffer, size_t length, BufferAttribute attribute, size_t offset)

Set the memory buffer and offset for the tensor which holds the specified output handle in the original network. If there are multiple outputs, each of them has to be set.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The output buffer.

  • length – The output buffer size.

  • attribute – The buffer attribute for setting ION.

  • offset – The offset into the ION buffer; writing starts from the buffer start address + offset.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setSingleOutput(void *runtime, void *buffer, size_t length, BufferAttribute attribute)

If there is only one output, this function sets the buffer for that output automatically. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • buffer – The output buffer.

  • length – The output buffer size.

  • attribute – The buffer attribute for setting ION.

Returns

A RuntimeAPI error code.

int NeuronRuntime_setQoSOption(void *runtime, const QoSOptions *qosOption)

Set the QoS configuration for Neuron Runtime. If qosOption.profiledQoSData is not nullptr, Neuron Runtime would use it as the profiled QoS data.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • qosOption – The option for QoS configuration.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getInputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle). Pass back the expected buffer size (in bytes) in *size for the tensor which holds the specified input handle.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getInputRank(void *runtime, uint64_t handle, uint32_t *rank)

Get the rank required by the input tensor (specified by handle). Pass back the expected rank in *rank for the tensor which holds the specified input handle.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • rank – The input rank.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getSingleInputSize(void *runtime, size_t *size)

If there is only one input, this function gets the physical size required by the input buffer and passes back the expected buffer size (in bytes) in *size. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • size – The input buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getInputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle), including hardware alignments. This function passes back the expected buffer size (in bytes) in *size for the tensor which holds the specified input handle. The value in *size has been aligned to the hardware-required size, and it can be used as the ION buffer size for the specified input when suppressInputConversion is enabled.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getSingleInputPaddedSize(void *runtime, size_t *size)

If there is only one input, this function passes back the expected size (in bytes) of its buffer in *size. The value in *size has been aligned to the hardware-required size, and it can be used as the ION buffer size for the input when suppressInputConversion is enabled. If there is more than one input, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • size – The input buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getInputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the input tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified input handle. The sizes in *dims have been aligned to the hardware-required sizes. When suppressInputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified input.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getSingleInputPaddedDimensions(void *runtime, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the only input. This function passes back the expected size (in pixels) of each dimension in *dims. The sizes in *dims have been aligned to the hardware-required sizes. When suppressInputConversion is enabled, the values in *dims are the required sizes of each dimension for the input. If there is more than one input, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • dims – The size (in pixels) of each dimension.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getOutputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle). This function passes back the expected buffer size (in bytes) in *size for the tensor which holds the specified output handle.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getSingleOutputSize(void *runtime, size_t *size)

Get the physical size required by the buffer of the only output. If there is only one output, this function passes back the expected size (in bytes) of its buffer in *size. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • size – The output buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getOutputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle), including hardware alignments. This function passes back the expected buffer size (in bytes) in *size for the tensor which holds the specified output handle. The value in *size has been aligned to the hardware-required size, and it can be used as the ION buffer size for the specified output when suppressOutputConversion is enabled.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getSingleOutputPaddedSize(void *runtime, size_t *size)

Get the physical size required by the buffer of the only output, including hardware alignments. If there is only one output, this function passes back the expected size (in bytes) of its buffer in *size. The value in *size has been aligned to the hardware-required size, and it can be used as the ION buffer size for the output when suppressOutputConversion is enabled. If there is more than one output, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • size – The output buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getOutputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the output tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified output handle. The sizes in *dims have been aligned to the hardware-required sizes. When suppressOutputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified output.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getSingleOutputPaddedDimensions(void *runtime, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the only output. If there is only one output, this function passes back the expected size (in pixels) of each dimension in *dims. The sizes in *dims have been aligned to the hardware-required sizes. When suppressOutputConversion is enabled, the values in *dims are the required sizes of each dimension for the output. If there is more than one output, NEURONRUNTIME_INCOMPLETE is returned.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • dims – The size (in pixels) of each dimension.

Returns

A RuntimeAPI error code.

int NeuronRuntime_getProfiledQoSData(void *runtime, ProfiledQoSData **profiledQoSData, uint8_t *execBoostValue)

Get the profiled QoS data and the executing boost value (the actual boost value during execution). If *profiledQoSData is nullptr, Neuron Runtime allocates *profiledQoSData. Otherwise, Neuron Runtime only updates its fields. *profiledQoSData is allocated as a smart pointer inside the Neuron Runtime instance, so its lifetime is the same as that of the runtime. The caller should be careful when using *profiledQoSData, and must never touch it after NeuronRuntime_release.

Note

This function is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Parameters
  • runtime – The address of the created neuron runtime instance.

  • profiledQoSData – The profiled QoS raw data.

  • execBoostValue – The executing boost value (the actual boost value set in the device) based on the scheduling policy.

Returns

A RuntimeAPI error code.

int NeuronRuntime_inference(void *runtime)

Do inference.

Parameters

runtime – The address of the created neuron runtime instance.

Returns

A RuntimeAPI error code.

void NeuronRuntime_release(void *runtime)

Release the runtime resource.

Parameters

runtime – The address of the created neuron runtime instance.

int NeuronRuntime_getVersion(NeuronVersion *version)

Get the version of Neuron runtime library.

Note

Neuron runtime can only load DLA files generated by compiler with the same major version.

Parameters

version – the version of Neuron runtime library.

Returns

A RuntimeAPI error code.

Variables

const unsigned char kEnvOptNullDevice = 1 << 0

For unsigned char deviceKind.

const unsigned char kEnvOptCModelDevice = 1 << 1
const unsigned char kEnvOptHardware = 1 << 2
const unsigned char kEnvOptPredictor = 1 << 3

RuntimeV2.h

Note

This file is for NeuroPilot 5.x.

struct AsyncInferenceRequest
#include <RuntimeV2.h>

AsyncInferenceRequest represents a single inference request to be enqueued into the Runtime. Note that all data pointed to by pointers in AsyncInferenceRequest must remain valid until the inference of that request is complete.

Public Members

IOBuffer *inputs

A pointer to the array of input buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getInputNumber().

IOBuffer *outputs

A pointer to the array of output buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getOutputNumber().

void (*finish_cb)(uint64_t job_id, void *opaque, int status)

A callback function specified by the user, which the runtime calls to notify that inference is complete. When it is called, the ID of the job that just finished and the opaque pointer from the original request are passed back in 'job_id' and 'opaque'. The execution status is given by 'status': a zero status indicates success; otherwise, the inference job has failed.

void *opaque

A pointer to an opaque data, which will be passed back when finish_cb is called.

struct IOBuffer
#include <RuntimeV2.h>

IOBuffer is a descriptor describing the buffer which will be used as an inference input or output. Users should zero the whole IOBuffer, then fill those fields with valid data.

Public Members

void *buffer
size_t length
int fd
int offset
uint32_t reserved1_should_be_init_zero
uint64_t reserved2_should_be_init_zero
uint64_t reserved3_should_be_init_zero
struct SyncInferenceRequest
#include <RuntimeV2.h>

SyncInferenceRequest represents a synchronous inference request to run in the Runtime. The call will block until the inference finishes.

Public Members

IOBuffer *inputs

A pointer to the array of input buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getInputNumber().

IOBuffer *outputs

A pointer to the array of output buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getOutputNumber().

file RuntimeV2.h
#include "Types.h"
#include <stddef.h>
#include <stdint.h>
#include <sys/cdefs.h>

RuntimeV2.

The NeuronRuntimeV2 API allows the user to create a NeuronRuntimeV2 instance from a specified DLA file. Users can enqueue asynchronous inference requests into the created runtime, or issue conventional synchronous requests.

Functions

int NeuronRuntimeV2_create(const char *pathToDlaFile, size_t nbThreads, void **runtime, size_t backlog = 2048u)

Create a NeuronRuntimeV2 from the specified DLA file. The runtime acts as a thread pool, waiting to accept AsyncInferenceRequest or SyncInferenceRequest on that DLA file. When the runtime receives a request, it enqueues the request into its backlog ring buffer, and the internal load balancer dispatches the request to an appropriate thread for execution. However, there is no guarantee on the order of completion of AsyncInferenceRequests; the user-specified callback should be aware of this. A SyncInferenceRequest, on the other hand, always blocks until the request finishes. The address of the created runtime instance will be passed back in *runtime.

Parameters
  • pathToDlaFile – The DLA file path.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_create_with_options(const char *pathToDlaFile, size_t nbThreads, void **runtime, size_t backlog, const char *options)

Like NeuronRuntimeV2_create(), but it takes an additional option string.

Parameters
  • pathToDlaFile – The DLA file path.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

  • options – A null-terminated C-string specifying runtime options.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_createFromBuffer(const void *buffer, size_t len, size_t nbThreads, void **runtime, size_t backlog = 2048u)

Like NeuronRuntimeV2_create(), but it creates the Runtime instance from a memory buffer containing the DLA data.

Parameters
  • buffer – The DLA data buffer.

  • len – The DLA data buffer size.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_createFromBuffer_with_options(const void *buffer, size_t len, size_t nbThreads, void **runtime, size_t backlog, const char *options)

Like NeuronRuntimeV2_createFromBuffer(), but it takes an additional option string.

Parameters
  • buffer – The DLA data buffer.

  • len – The DLA data buffer size.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

  • options – A null-terminated C-string specifying runtime options.

Returns

A RuntimeAPI error code.

void NeuronRuntimeV2_release(void *runtime)

Release the runtime. Calling this function will block until all requests finish.

Parameters

runtime – The address of the created NeuronRuntimeV2 instance.

int NeuronRuntimeV2_enqueue(void *runtime, AsyncInferenceRequest request, uint64_t *job_id)

Enqueue one AsyncInferenceRequest. If the backlog ring buffer is not full, this function returns immediately, and the runtime executes the request asynchronously. If the backlog is full (due to back pressure from execution), this call blocks until the backlog ring buffer frees at least one slot for the request. A unique ID for the enqueued request is returned in *job_id. The ID sequence starts from zero and increases with each received request; the 2^64 capacity for job IDs should be enough for any application.

Parameters
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • request – The asynchronous inference request.

  • job_id – The ID for this request is filled into *job_id. Later the ID will be passed back when the finish_cb is called.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_run(void *runtime, SyncInferenceRequest request)

Perform a synchronous inference request. The request will be also enqueued into the Runtime ring buffer as NeuronRuntimeV2_enqueue() does. However, the call will block until the request finishes.

Parameters
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • request – The synchronous inference request.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputNumber(void *runtime, size_t *size)

Get the number of inputs of the model in the runtime. The number of inputs is passed back in *size.

Parameters
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • size – The pointer to a size_t to store the passed back value.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputNumber(void *runtime, size_t *size)

Get the number of outputs of the model in the runtime. The number of outputs is passed back in *size.

Parameters
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • size – The pointer to a size_t to store the passed back value.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputRank(void *runtime, uint64_t handle, uint32_t *rank)

Get the rank required by the input tensor (specified by handle). Pass back the expected rank in *rank for the tensor which holds the specified input handle.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • rank – The input rank.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle). Pass back the expected buffer size (in bytes) in *size for the tensor which holds the specified input handle.

Parameters
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle). This function passes back the expected buffer size (in bytes) in *size for the tensor which holds the specified output handle.

Parameters
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle), including hardware alignments. This function passes back the expected buffer size (in bytes) in *size for the tensor which holds the specified input handle. The value in *size has been aligned to the hardware-required size, and it can be used as the ION buffer size for the specified input when suppressInputConversion is enabled.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the input tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified input handle. The sizes in *dims have been aligned to the hardware-required sizes. When suppressInputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified input.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle) with hardware alignments. This function passes back the expected buffer size (in bytes) in *size for the tensor which holds the specified output handle. The value in *size has been aligned to the hardware-required size, and it can be used as the ION buffer size for the specified output when suppressOutputConversion is enabled.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the output tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified output handle. The size of each dimension in *dims has been aligned to the hardware-required size. When suppressOutputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified output.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_setQoSOption(void *runtime, const QoSOptions *qosOption)

Set the QoS configuration for Neuron Runtime. If qosOption.profiledQoSData is not null, Neuron Runtime uses it to store the profiled QoS data.

Note

qosOption.profiledQoSData has no effect at all.

Note

Using this API while NeuronRuntimeV2 is running leads to undefined behavior. Namely, this API should be used only when all requests have finished and no new request is being issued.

Parameters
  • runtime – The address of the created neuron runtime instance.

  • qosOption – The option for QoS configuration.

Returns

A RuntimeAPI error code.

int NeuronRuntimeV2_getProfiledQoSData(void *runtime, ProfiledQoSData **profiledQoSData, uint8_t *execBoostValue)

Get the profiled QoS data and the executing boost value (the actual boost value during execution). If *profiledQoSData is nullptr, Neuron Runtime allocates *profiledQoSData; otherwise, Neuron Runtime only updates its fields. *profiledQoSData is allocated as a smart pointer inside the Neuron Runtime instance, so the lifetime of *profiledQoSData is the same as that of the Neuron Runtime. The caller should be careful with the usage of *profiledQoSData and must never touch it after NeuronRuntime_release.

Note

Only effective when NeuronRuntimeV2 has nbThreads = 1.

Note

Using this API while NeuronRuntimeV2 is running leads to undefined behavior. Namely, this API should be used only when all requests have finished and no new request is being issued.

Note

This function is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Parameters
  • runtime – The address of the created neuron runtime instance.

  • profiledQoSData – The profiled QoS raw data.

  • execBoostValue – The executing boost value (the actual boost value set in the device) based on the scheduling policy.

Returns

A RuntimeAPI error code.

Types.h

struct NeuronVersion
#include <Types.h>

The structure to represent the neuron version.

Public Members

uint8_t major
uint8_t minor
uint8_t patch
struct ProfiledQoSData
#include <Types.h>

Maintain the profiled QoS raw data.

Public Members

QoSData **qosData

Maintain profiled QoS raw data in a pointer of pointer.

This field could be nullptr if there is no previous profiled data.

uint32_t *numSubCmd

Number of sub-commands in *qosData.

This field could be nullptr if there is no previous profiled data.

uint32_t numSubgraph

Number of subgraphs.

This field should be zero if there is no previous profiled data.

struct QoSData
#include <Types.h>

Raw data for QoS configuration. All of those fields should be filled with the profiled data.

Note

This struct is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Public Members

uint64_t execTime

Profiled execution time: the profiled execution time (in usec).

uint32_t suggestedTime

Suggested time: the suggested time (in msec).

uint32_t bandwidth

Profiled bandwidth: the profiled bandwidth (in MB/s).

uint8_t boostValue

Profiled boost value: the profiled executing boost value (ranging from 0 to 100).

struct QoSOptions
#include <Types.h>

QoS Option for configuration.

Public Members

RuntimeAPIQoSPreference preference

Execution preference: NEURONRUNTIME_PREFER_PERFORMANCE, NEURONRUNTIME_PREFER_POWER, or NEURONRUNTIME_HINT_TURBO_BOOST.

RuntimeAPIQoSPriority priority

Task priority: NEURONRUNTIME_PRIORITY_HIGH, NEURONRUNTIME_PRIORITY_MED, or NEURONRUNTIME_PRIORITY_LOW.

uint8_t boostValue

Boost value hint: a hint for the device frequency, ranging from 0 (lowest) to 100 (highest). This value is the hint for the baseline boost value in the scheduler, which sets the executing boost value (the actual boost value set in the device) based on the scheduling policy. For inferences with preference set to NEURONRUNTIME_PREFER_PERFORMANCE, the scheduler guarantees that the executing boost value will not be lower than the boost value hint. On the other hand, for inferences with preference set to NEURONRUNTIME_PREFER_POWER, the scheduler tries to save power by configuring the executing boost value to a value no higher than the boost value hint.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

uint8_t maxBoostValue

Maximum boost value: reserved.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

uint8_t minBoostValue

Minimum boost value: reserved.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

uint16_t deadline

Deadline: the deadline for the inference (in msec). Setting any non-zero value notifies the scheduler that this inference is a real-time task. This field should be zero unless the inference is a real-time task.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

uint16_t abortTime

Abort time: the maximum inference time for the inference (in msec). If the inference does not complete before the abort time, the scheduler aborts it. This field should be zero unless you wish to abort the inference.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

int32_t delayedPowerOffTime

Delayed power off time: the delayed power off time after inference completion (in msec). The scheduler starts a timer for the interval defined by the delayed power off time once the inference completes. After the delayed power off time expires, if there are no other incoming inference requests, the underlying devices are powered off to save power. Set this field to NEURONRUNTIME_POWER_OFF_TIME_DEFAULT to use the scheduler's default power off policy.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

RuntimeAPIQoSPowerPolicy powerPolicy

Power policy: configure power policy for scheduler.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

RuntimeAPIQoSAppType applicationType

Application type: hint for the application type for the inference.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

ProfiledQoSData *profiledQoSData

Profiled QoS Data: pointer to the historical QoS data of previous inferences. If there is no profiled data, this field could be nullptr. For the details, please check the ProfiledQoSData part.

Note

This member is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

struct RuntimeAPIDimensions
#include <Types.h>

The aligned sizes of dimensions.

Public Members

uint32_t dimensions[RuntimeAPIDimIndex::DimensionSize]
file Types.h
#include <stddef.h>
#include <stdint.h>
#include <sys/cdefs.h>

Enums

enum RuntimeAPIDimIndex

Values:

enumerator N

Batch dimension index.

enumerator H

Height dimension index.

enumerator W

Width dimension index.

enumerator C

Channel dimension index.

enumerator Invalid
enumerator DimensionSize

Dimension size.

enum RuntimeAPIQoSPreference

Execution preference.

Note

This enum is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Values:

enumerator NEURONRUNTIME_PREFER_PERFORMANCE

Prefer performance.

enumerator NEURONRUNTIME_PREFER_POWER

Prefer low power.

enumerator NEURONRUNTIME_HINT_TURBO_BOOST

Hint for turbo boost mode. Only valid on certain platforms (e.g., DX-1). On platforms without turbo boost support, NEURONRUNTIME_HINT_TURBO_BOOST behaves identically to NEURONRUNTIME_PREFER_PERFORMANCE.

enum RuntimeAPIQoSPriority

Task priority.

Values:

enumerator NEURONRUNTIME_PRIORITY_LOW

Low priority.

enumerator NEURONRUNTIME_PRIORITY_MED

Medium priority.

enumerator NEURONRUNTIME_PRIORITY_HIGH

High priority.

enum RuntimeAPIQoSBoostValue

Special boost value hint.

Note

This enum is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Values:

enumerator NEURONRUNTIME_BOOSTVALUE_PROFILED

101: Hint to notify the scheduler to use the profiled boost value.

enumerator NEURONRUNTIME_BOOSTVALUE_MAX

100: Maximum boost value.

enumerator NEURONRUNTIME_BOOSTVALUE_MIN

0: Minimum boost value.

enum RuntimeAPIQoSDelayedPowerOffTime

Delayed power off time.

Note

This enum is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Values:

enumerator NEURONRUNTIME_POWER_OFF_TIME_DEFAULT

Default power off time.

enum RuntimeAPIQoSPowerPolicy

Power policy.

Note

This enum is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Values:

enumerator NEURONRUNTIME_POWER_POLICY_DEFAULT

Default policy.

enum RuntimeAPIQoSAppType

Application type.

Note

This enum is not supported on MediaTek TV platforms (MT99XX/MT96XX/MT76XX/MT58XX).

Values:

enumerator NEURONRUNTIME_APP_NORMAL

Normal type.

enum RuntimeAPIErrorCode

A Neuron Runtime API returns an error code to show the status of execution.

Values:

enumerator NEURONRUNTIME_NO_ERROR

0: The API completed successfully.

enumerator NEURONRUNTIME_OUT_OF_MEMORY

1: Not enough memory for the API.

enumerator NEURONRUNTIME_INCOMPLETE

2: Not in use.

enumerator NEURONRUNTIME_UNEXPECTED_NULL

3: A required pointer is null.

enumerator NEURONRUNTIME_BAD_DATA

4: Failed to load data or set input/output.

enumerator NEURONRUNTIME_BAD_STATE

5: Not in use.

enumerator NEURONRUNTIME_RUNTIME_ERROR

6: The hardware or simulator returned unexpectedly.