This commit adds the ability to conditionally compile tests. To
implement this, common source files needed to be moved to a subdirectory
`common`.
Additionally, this commit splits each `mat` template instantiation into
a separate source file. This enables the linker to discard unused
instantiations. If every instantiation is placed in the same file and
also into the .iwram section, the linker will include every symbol in the
resulting binary, leading to extremely high IWRAM usage. To introduce
this split, a new header file "mat_impl.hpp" was added containing
implementation details for the template instantiations. "mat_impl.hpp"
should ONLY be included by the source files containing explicit
instantiations of `mat`.
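
The split described above could look roughly like the following sketch.
It is condensed into one listing so it compiles on its own; the comments
mark where each part would live. The member function `at`, the file
names "mat.hpp", "mat2x2.cpp" and "mat3x3.cpp", and the chosen sizes are
hypothetical, and the .iwram section attribute is left out.

```cpp
#include <cassert>
#include <cstddef>

// ---- mat.hpp (public header, hypothetical): declarations only ----
template <std::size_t R, std::size_t C>
struct mat {
    int data[R * C];
    int at(std::size_t r, std::size_t c) const;  // defined in mat_impl.hpp
};

// Explicit instantiation declarations: users of the header do not
// instantiate these implicitly, they link against the object files below.
extern template struct mat<2, 2>;
extern template struct mat<3, 3>;

// ---- mat_impl.hpp (private header): member definitions ----
template <std::size_t R, std::size_t C>
int mat<R, C>::at(std::size_t r, std::size_t c) const {
    assert(r < R && c < C);
    return data[r * C + c];
}

// ---- mat2x2.cpp (hypothetical): includes mat_impl.hpp ----
template struct mat<2, 2>;  // the only instantiation in this object file

// ---- mat3x3.cpp (hypothetical): includes mat_impl.hpp ----
template struct mat<3, 3>;  // the only instantiation in this object file
```

With one instantiation per object file, a program that only uses
`mat<2, 2>` never needs the object file that holds `mat<3, 3>`.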
This was done because, with 6 bits of precision, the error accumulated
while computing a projection matrix could reach 0.078. Changing the
fractional precision to 8 bits minimizes the effect of this error,
reducing it to around 0.016. This does decrease the maximum value from
around 33,000,000 to around 8,000,000, but that shouldn't be an issue.
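
The range and step size quoted above can be checked with a small sketch.
It assumes a signed 32-bit storage type, which is consistent with the
quoted maximum values but is not stated in this message; the helper
names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest representable step for a given number of fractional bits.
constexpr double resolution(int frac_bits) {
    assert(frac_bits > 0 && frac_bits < 31);
    return 1.0 / static_cast<double>(std::int64_t{1} << frac_bits);
}

// Largest whole number that fits, assuming signed 32-bit storage:
// one sign bit, frac_bits fractional bits, the rest integer bits.
constexpr std::int64_t max_integer(int frac_bits) {
    assert(frac_bits > 0 && frac_bits < 31);
    return (std::int64_t{1} << (31 - frac_bits)) - 1;
}

static_assert(max_integer(6) == 33554431, "around 33,000,000");
static_assert(max_integer(8) == 8388607, "around 8,000,000");
static_assert(resolution(6) == 0.015625, "1/64");
static_assert(resolution(8) == 0.00390625, "1/256");
```

Under these assumptions, the reported errors of 0.078 and 0.016
correspond to about five and about four steps respectively.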
This new framework does not automatically register test suites, but it
is much simpler to use. I might revisit the old approach later, but for
now this works, KISS.
Currently, some fixed point operations (notably multiplication) fail to
compile when used in Thumb-mode routines. This occurs because GCC
attempts to inline the operation into the Thumb-mode routine, but the
operation uses ARM-mode only instructions. This commit adds the ".arm"
directive into the inline assembly of the implementation, which informs
GCC that the assembly uses ARM-mode instructions and prevents inlining.
As a result, fixed point numbers can be used from both ARM-mode and
Thumb-mode code without issues! Usage in ARM-mode should still be
preferred for optimal performance though.
Caused issues with ODR violations. Now fixed point numbers should
only be used in ARM-mode. Attempting to use them in Thumb-mode will
cause a compilation failure. This commit also moves operator/ into IWRAM
on the GBA.
Before this commit, fixed point multiplication was implemented using an
assembly routine in a separate translation unit. This commit implements
this routine directly using inline assembly. By doing so, these
operations can be inlined when called from ARM code. Fixed point
division is implemented as well, along with various documentation and
style improvements.
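
For reference, a portable sketch of the arithmetic these routines
perform. This is not the inline assembly from the commit: it uses plain
64-bit intermediates, and the function names, the 32-bit storage type
and the 8 fractional bits are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed format: signed 32-bit storage with 8 fractional bits.
constexpr int FRAC_BITS = 8;

// Convert a whole number to the assumed fixed point format.
constexpr std::int32_t to_fixed(int value) {
    return value * (1 << FRAC_BITS);
}

// Multiply: widen to 64 bits so the product cannot overflow, then drop
// the extra fractional bits. The right shift of a negative value is
// arithmetic on GCC (and guaranteed to be since C++20).
constexpr std::int32_t fixed_mul(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a) * b) >> FRAC_BITS);
}

// Divide: scale the dividend up first so the quotient keeps its
// fractional bits.
constexpr std::int32_t fixed_div(std::int32_t a, std::int32_t b) {
    assert(b != 0);
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a) * (std::int64_t{1} << FRAC_BITS)) / b);
}
```

For example, 1.5 is stored as 384 and 2.5 as 640; `fixed_mul(384, 640)`
gives 960, which is 3.75 in this format.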
This commit refactors string_stream and string_streamx into a common
basic_string_stream template class. When the EXT template parameter is
true, the string_streamx formatting options are enabled; they default to
disabled.
These options are enabled at compile time, and do not affect performance
when they are disabled. By implementing the two streams in this manner,
duplicated code is removed.
This commit also adds the ENDL template parameter. When ENDL is set to
zero, no end-of-line character is printed when piping mtl::endl.
Otherwise, the character is printed. ENDL defaults to '\n'. This allows
logging on the GBA to handle mtl::endl correctly and not print two
newlines on the mGBA emulator.
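
A minimal sketch of how the two template parameters might be wired up.
Only the names basic_string_stream, string_stream, string_streamx, EXT,
ENDL and endl come from this message; the `sketch` namespace, the buffer
size parameter N, `set_width`, `view` and the padding behaviour are
assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace sketch {

struct endl_t {};
inline constexpr endl_t endl{};

template <std::size_t N, bool EXT = false, char ENDL = '\n'>
class basic_string_stream {
public:
    basic_string_stream& operator<<(char c) {
        if (size_ < N) buffer_[size_++] = c;  // silently drop on overflow
        return *this;
    }

    basic_string_stream& operator<<(std::string_view s) {
        if constexpr (EXT) {
            // Extended option (hypothetical): left-pad with spaces.
            for (std::size_t i = s.size(); i < width_; ++i) *this << ' ';
        }
        for (char c : s) *this << c;
        return *this;
    }

    basic_string_stream& operator<<(endl_t) {
        // With ENDL == 0 this branch is discarded at compile time.
        if constexpr (ENDL != 0) *this << ENDL;
        return *this;
    }

    void set_width(std::size_t width) {
        static_assert(EXT, "formatting options require EXT == true");
        assert(width <= N);
        width_ = width;
    }

    std::string_view view() const { return {buffer_, size_}; }

private:
    char buffer_[N]{};
    std::size_t size_ = 0;
    std::size_t width_ = 0;  // only read when EXT is true
};

template <std::size_t N> using string_stream = basic_string_stream<N>;
template <std::size_t N> using string_streamx = basic_string_stream<N, true>;

}  // namespace sketch
```

Because the checks use `if constexpr`, the disabled paths generate no
code, which matches the claim that disabled options cost nothing at run
time.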
This implementation also writes the digits from left to right instead of
right to left. Using this method we can write the string to the
beginning of the buffer and still avoid reversing the string. It also
has the benefit of being slightly faster than the previous
implementation. The function's signature changed as well because there
is no longer a reason to pass the buffer size or a pointer to output the
start of the string.
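
One way to write digits from left to right is sketched below. It is not
necessarily the method used in this commit: it finds the largest power
of ten that fits and then peels digits off the front. The function name
and signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes the decimal digits of `value` starting at out[0] and returns the
// number of characters written. No terminator is appended. The caller
// must provide room for at least 10 characters (the most a 32-bit
// unsigned value can need).
inline std::size_t write_decimal(std::uint32_t value, char* out) {
    assert(out != nullptr);

    // Largest power of ten that does not exceed value (1 for value == 0).
    std::uint32_t divisor = 1;
    while (value / divisor >= 10) divisor *= 10;

    // Emit the most significant digit first, then move right.
    std::size_t length = 0;
    while (divisor != 0) {
        out[length++] = static_cast<char>('0' + value / divisor);
        value %= divisor;
        divisor /= 10;
    }
    return length;
}
```

Since the first digit lands at the start of the buffer, the caller only
needs the returned length, not the buffer size or a pointer to where the
string begins.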
Now uses a specialized implementation instead of the append multiple
characters implementation. Useful for appending single characters to
string streams (e.g. a newline).
The CONFIGURE_DEPENDS option was added in CMake v3.12. It allows the
build system to automatically re-run CMake when the result of the glob
changes, solving the major issue with globbing. It may have a
performance impact, but this should be negligible compared to the time
spent building.
The istring and string_view operators have identical implementations. By
changing the istring operators to cast to string_view and use that
implementation instead, the number of redundant implementations is
reduced. This does incur a small performance penalty, around 15 cycles
when tested on the mGBA Game Boy Advance emulator (which emulates an
ARM7TDMI). Compared to the time the operations themselves take, the
difference is negligible. For example, an insertion with two 8 character
strings takes around 450 cycles.
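
The delegation could be sketched as follows, assuming the operators in
question are stream insertion operators. The `stream` and `istring`
types here are simplified, hypothetical stand-ins, not the library's
classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace sketch {

// Hypothetical fixed-capacity string that stores its characters inline.
template <std::size_t N>
class istring {
public:
    explicit istring(std::string_view s) {
        assert(s.size() <= N);
        for (char c : s) data_[size_++] = c;
    }
    std::string_view view() const { return {data_, size_}; }

private:
    char data_[N]{};
    std::size_t size_ = 0;
};

// Hypothetical minimal stream with a fixed 32 character buffer.
struct stream {
    char buffer[32]{};
    std::size_t size = 0;
    std::string_view view() const { return {buffer, size}; }
};

// The single real implementation.
inline stream& operator<<(stream& out, std::string_view s) {
    for (char c : s) {
        if (out.size < sizeof(out.buffer)) out.buffer[out.size++] = c;
    }
    return out;
}

// The istring overload only converts and forwards.
template <std::size_t N>
stream& operator<<(stream& out, const istring<N>& s) {
    return out << s.view();
}

}  // namespace sketch
```

The forwarding overload adds one conversion per call, which is where a
small fixed overhead like the one measured above would come from.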