See the comment added in this commit for details. Currently unused.
Would save some IWRAM usage in the matrix implementation at the cost of
readability. If one operation on a matrix size is used, most other
operations will likely be used too, so in practice this may not change
IWRAM usage much. Only including the matrix sizes that are used in the
final binary would likely have a greater impact. FURTHER TESTING
REQUIRED.
This commit adds the ability to conditionally compile tests. To
implement this, common source files needed to be moved to a subdirectory
`common`.
Additionally, this commit splits each `mat` template instantiation into
a separate source file. This enables the linker to discard unused
instantiations. If each instantiation is placed in the same file and also
into the .iwram section, the linker will include every symbol in the
resulting binary, leading to an extremely high IWRAM usage. To introduce
this split, a new header file "mat_impl.hpp" was added containing
implementation details for the template instantiations. "mat_impl.hpp"
should ONLY be included by the source files containing explicit
instantiations of `mat`.
This was done because, with 6 bits of precision, the error accumulated
while computing a projection matrix could reach 0.078. Changing the
fractional precision to 8 bits reduces the effect of this error,
bringing it closer to 0.016. This does decrease the maximum
representable value from around 33,000,000 to around 8,000,000, but
that shouldn't be an issue.
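As a rough check of the figures above, the sketch below computes the
step size and the approximate largest magnitude for 6 and 8 fractional
bits. It assumes the value is stored in a signed 32-bit integer; the
names `resolution` and `max_magnitude` are made up for illustration and
are not part of the actual implementation.

```cpp
#include <cstdint>

// Assumption: fixed point values are stored in a signed 32-bit integer.

// Smallest representable step for a given number of fractional bits.
constexpr double resolution(int frac_bits) {
    return 1.0 / static_cast<double>(std::int64_t{1} << frac_bits);
}

// Approximate largest representable magnitude: 2^(31 - frac_bits).
constexpr std::int64_t max_magnitude(int frac_bits) {
    return std::int64_t{1} << (31 - frac_bits);
}

static_assert(max_magnitude(6) == 33554432, "around 33,000,000");
static_assert(max_magnitude(8) == 8388608, "around 8,000,000");
```

Each step is 1/64 with 6 bits and 1/256 with 8 bits, so the same number
of accumulated rounding steps gives an error that is four times smaller.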
We can't push/pop optimize options because they don't apply to inlined
functions. Function attributes also won't apply to inlined functions.
Because most (if not all) vector operations are inlined, neither of
these are appropriate options. However, GCC 8.1 introduces a new pragma,
unroll, that allows us to unroll specific loops. This pragma does apply
to inlined functions.
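A minimal sketch of how the pragma can be attached to a loop inside an
inlined function. The `vec` type and `add` function are hypothetical
stand-ins, not the project's vector implementation, and the unroll
count of 4 is an arbitrary choice.

```cpp
#include <cstddef>

// Hypothetical vector type, for illustration only.
template <typename T, std::size_t N>
struct vec {
    T data[N];
};

template <typename T, std::size_t N>
inline vec<T, N> add(const vec<T, N>& a, const vec<T, N>& b) {
    vec<T, N> out{};
    // Needs GCC 8.1 or newer. Compilers that do not know the pragma
    // ignore it (possibly with a warning), so the loop stays correct.
#pragma GCC unroll 4
    for (std::size_t i = 0; i < N; ++i) {
        out.data[i] = a.data[i] + b.data[i];
    }
    return out;
}
```

Because the pragma is attached to the loop itself rather than to the
function, it travels with the loop body wherever the function is
inlined.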
This timer is expected to be global, so this would make parallelizing
tests difficult. However, because the main target of this project (the
GBA) does not support parallelization, this is not a concern at the
moment.
This new framework does not automatically register test suites, but it
is much simpler to use. I might revisit the old approach later, but for
now this works and keeps things simple (KISS).
Fixed point multiplication used an ARM inline assembly routine. This was
fast but unfortunately caused some odd inlining problems when used from
Thumb-mode code. This commit replaces this assembly routine with a C++
implementation that performs as well as or better than the assembly
routine in most cases. The C++ implementation is slightly
slower when called from Thumb-mode code because GCC inlines the
operation instead of calling a standalone ARM-mode routine placed in
IWRAM. The performance tradeoff is acceptable though because of the
fixes, portability, and ARM-mode performance improvements it provides.
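For reference, a portable fixed point multiply can be sketched as a
widening multiply followed by a shift. This is not the code from the
commit: the 8 fractional bits, the signed 32-bit storage, and the names
`fixed_mul` and `to_fixed` are assumptions made for the example.

```cpp
#include <cstdint>

// Assumption: 8 fractional bits in a signed 32-bit value.
constexpr int kFracBits = 8;

// Widen to 64 bits so the intermediate product cannot overflow, then
// shift right to drop the extra fractional bits. The right shift of a
// negative value is arithmetic on GCC (and required to be in C++20).
constexpr std::int32_t fixed_mul(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a) * b) >> kFracBits);
}

// Helper that builds a raw fixed point value from an integer.
constexpr std::int32_t to_fixed(std::int32_t x) {
    return x * (1 << kFracBits);
}
```

Written this way the compiler is free to inline the operation in either
instruction set, which matches the tradeoff described above.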
Currently, some fixed point operations (notably multiplication) fail to
compile when used in Thumb-mode routines. This occurs because GCC
attempts to inline the operation into the Thumb-mode routine, but the
operation uses ARM-mode only instructions. This commit adds the ".arm"
directive into the inline assembly of the implementation, which informs
GCC that the assembly uses ARM-mode instructions and prevents inlining.
As a result, fixed point numbers can be used from both ARM-mode and
Thumb-mode code without issues! Usage in ARM-mode should still be
preferred for optimal performance though.
Caused issues with ODR violations. Now fixed point numbers should
only be used in ARM-mode. Attempting to use them in Thumb-mode will
cause a compilation failure. This commit also moves operator/ into IWRAM
on the GBA.
Before this commit, fixed point multiplication was implemented using an
assembly routine in a separate translation unit. This commit implements
this routine directly using inline assembly. By doing so, these
operations can be inlined when called from ARM code. Fixed point
division is implemented as well, along with various documentation and
style improvements.
This addition is useful when an explicit template instantiation of
operator<< is needed, for example, when logging from an ARM mode
function. Example usage:
    template mtl::log::stream_type& mtl::log::stream_type::operator<<
        <uint16_t>(uint16_t);

    ARM_MODE void foo(uint16_t x) {
        // Without the explicit template instantiation above, an ODR
        // violation would occur.
        mtl::log::debug << x;
    }
These macros are defined in target.hpp. This commit adds the macros:
NOINLINE - Never inline the function
ALWAYS_INLINE - Force the function to be inlined (also adds the inline
attribute to the function)
TARGET_ARM_MODE - Compile all future functions in ARM mode until
TARGET_END_MODE is reached. No-op if not compiling for ARM.
TARGET_THUMB_MODE - Compile all future functions in Thumb mode until
TARGET_END_MODE is reached. No-op if not compiling for ARM.
TARGET_END_MODE - Undo the last TARGET_*_MODE option.
ARM_MODE - Compile this function in ARM mode.
THUMB_MODE - Compile this function in Thumb mode.
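One possible way to define macros like these with GCC attributes and
pragmas is sketched below. The definitions are an assumption made for
illustration; the real target.hpp may define them differently. The
example functions (`slow_path`, `fast_path`, `in_arm_region`,
`single_arm`) are hypothetical.

```cpp
// Hypothetical definitions; the real target.hpp may differ.
#define NOINLINE __attribute__((noinline))
#define ALWAYS_INLINE inline __attribute__((always_inline))

#if defined(__arm__)
#define ARM_MODE __attribute__((target("arm")))
#define THUMB_MODE __attribute__((target("thumb")))
#define TARGET_ARM_MODE \
    _Pragma("GCC push_options") _Pragma("GCC target(\"arm\")")
#define TARGET_THUMB_MODE \
    _Pragma("GCC push_options") _Pragma("GCC target(\"thumb\")")
#define TARGET_END_MODE _Pragma("GCC pop_options")
#else
// No-ops when not compiling for ARM.
#define ARM_MODE
#define THUMB_MODE
#define TARGET_ARM_MODE
#define TARGET_THUMB_MODE
#define TARGET_END_MODE
#endif

NOINLINE int slow_path(int x) { return x + 1; }
ALWAYS_INLINE int fast_path(int x) { return x * 2; }

// Everything between the two markers is compiled in ARM mode on ARM.
TARGET_ARM_MODE
int in_arm_region(int x) { return fast_path(x) + slow_path(x); }
TARGET_END_MODE

// Only this one function is compiled in ARM mode on ARM.
ARM_MODE int single_arm(int x) { return x - 1; }
```

Using push_options and pop_options for the region macros means
TARGET_END_MODE restores whatever mode was active before, which is what
"undo the last TARGET_*_MODE option" suggests.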
Previously, each FSM class could only have one instance because of the
use of globals. This new implementation only uses memory allocated on
the stack, so multiple instances can be created at once. Dynamic
allocation is still unused. Additionally, this approach uses a more
logical separation between the FSM, states, and events.
If the buffer is cleared when flushed, the class does not function
correctly as a string builder. For example, if a string is built with a
newline inside, everything before the newline will be cleared and the
string will be incomplete. Clearing the buffer on flush only makes sense
for applications such as logging or writing to a file.
This commit refactors string_stream and string_streamx into a common
basic_string_stream template class. When the EXT template parameter is
true, the string_streamx formatting options are enabled; they are
disabled by default. These options are selected at compile time and do
not affect performance when they are disabled. Implementing the two
streams in this manner removes duplicated code.
This commit also adds the ENDL template parameter. When ENDL is set to
zero, no end-of-line character is printed when piping mtl::endl.
Otherwise, the character is printed. Defaults to '\n'. This allows
logging on the GBA to handle mtl::endl correctly and not print two
newlines on the MGBA emulator.
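The idea of selecting behaviour through EXT and ENDL at compile time can
be sketched roughly as follows. This is not the mtl implementation: the
`demo` namespace, the std::string buffer, and the `set_width` option are
assumptions chosen to keep the example short (it needs C++17 for
`if constexpr`).

```cpp
#include <cstddef>
#include <string>

namespace demo {

struct endl_t {};
constexpr endl_t endl{};

// EXT enables extended formatting; ENDL is the end-of-line character,
// where '\0' means "print nothing" for endl.
template <bool EXT = false, char ENDL = '\n'>
class basic_string_stream {
public:
    basic_string_stream& operator<<(const char* s) {
        buf_ += s;
        return *this;
    }

    basic_string_stream& operator<<(int v) {
        const std::string digits = std::to_string(v);
        if constexpr (EXT) {
            // Hypothetical extended option: left-pad to a minimum width.
            if (digits.size() < width_) {
                buf_.append(width_ - digits.size(), ' ');
            }
        }
        buf_ += digits;
        return *this;
    }

    basic_string_stream& operator<<(endl_t) {
        if constexpr (ENDL != '\0') {
            buf_ += ENDL;
        }
        return *this;
    }

    // Only usable on the extended stream; rejected at compile time
    // if called on a stream with EXT == false.
    void set_width(std::size_t w) {
        static_assert(EXT, "set_width requires EXT == true");
        width_ = w;
    }

    const std::string& str() const { return buf_; }

private:
    std::string buf_;
    std::size_t width_ = 0;
};

using string_stream = basic_string_stream<false>;
using string_streamx = basic_string_stream<true>;

}  // namespace demo
```

Because the branches are discarded at compile time, a stream with EXT
disabled contains no formatting code paths, and a stream with ENDL set
to zero contains no code for emitting the newline.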
Previously, a source file was not added to the target if a source of
the same name already existed. This was fragile because these two files
would be considered the same: src/foo.cpp and
src/armv4t/bar/baz/foo.cpp, even though they really shouldn't be. What
should happen instead is that the symbols of the architecture-specific
code are not overridden by the common implementation, regardless of
where the file is placed. This means that if the files src/foo.cpp and
src/armv4t/bar.cpp both contain implementations of the function foo, the
armv4t implementation will be exported even though it uses a different
filename from the common implementation. This commit implements this
behaviour by relying on the way symbols are naturally resolved. A
separate smaller library is built for each set of architecture-dependent
code. Afterwards the libraries are linked into one, with the
architecture-specific libraries linked first.
Both assembly macros failed when given large numbers ending in 9. For
example, udiv100000 of 3999999999 produced 40000 instead of 39999.
Similarly, udiv1000000000 of 3999999999 produced 4 instead of 3.
Neither of the previous implementations satisfied the correctness
conditions of the Granlund-Montgomery integer division algorithm. This
commit replaces these macros with the correct implementation generated
by clang for a constant integer division. I do not understand how this
implementation works. All other macros do satisfy the
Granlund-Montgomery conditions.
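For context, division by a constant can be replaced with a multiply and
a shift. The sketch below shows the general technique for 100000 in
C++. It is not the assembly from this commit: the constant is derived
here as ceil(2^39 / 3125) after shifting out the factor of 32, which is
one valid choice and may differ from what clang emits.

```cpp
#include <cstdint>

// 100000 == 32 * 3125. Shift out the factor of 32 first, then divide
// by 3125 using a multiply by ceil(2^39 / 3125) and a shift by 39.
constexpr std::uint64_t kShift = 39;
constexpr std::uint64_t kMagic =
    ((std::uint64_t{1} << kShift) + 3124) / 3125;

constexpr std::uint32_t udiv100000(std::uint32_t x) {
    // (x >> 5) is below 2^27, so the 64-bit product cannot overflow.
    return static_cast<std::uint32_t>(((x >> 5) * kMagic) >> kShift);
}

// The failing case from the commit message.
static_assert(udiv100000(3999999999u) == 39999, "must round down");
```

The result is exact here because the rounding error in kMagic, scaled
by the largest possible value of x >> 5, stays below 2^39. A constant
or shift that does not meet this kind of bound rounds up for some large
inputs, which is the sort of failure described above.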