Currently, some fixed point operations (notably multiplication) fail to
compile when used in Thumb-mode routines. This occurs because GCC
attempts to inline the operation into the Thumb-mode routine, but the
operation uses ARM-mode only instructions. This commit adds the ".arm"
directive into the inline assembly of the implementation, which informs
GCC that the assembly uses ARM-mode instructions and prevents inlining.
As a result, fixed point numbers can be used from both ARM-mode and
Thumb-mode code without issues! Usage in ARM-mode should still be
preferred for optimal performance though.
The previous approach caused ODR violations. Fixed point numbers should
now only be used in ARM-mode; attempting to use them in Thumb-mode will
cause a compilation failure. This commit also moves operator/ into IWRAM
on the GBA.
Before this commit, fixed point multiplication was implemented using an
assembly routine in a separate translation unit. This commit implements
this routine directly using inline assembly. By doing so, these
operations can be inlined when called from ARM code. Fixed point
division is implemented as well, along with various documentation and
style improvements.
This addition is useful when an explicit template instantiation of
operator<< is needed, for example, when logging from an ARM mode
function. Example usage:
    template mtl::log::stream_type&
    mtl::log::stream_type::operator<< <uint16_t>(uint16_t);

    ARM_MODE void foo(uint16_t x) {
        // Without the explicit template instantiation, an ODR
        // violation would occur.
        mtl::log::debug << x;
    }
This commit adds the following macros, defined in target.hpp:
NOINLINE - Never inline the function.
ALWAYS_INLINE - Force the function to be inlined (also adds the inline
  attribute to the function).
TARGET_ARM_MODE - Compile all following functions in ARM mode until
  TARGET_END_MODE is reached. No-op if not compiling for ARM.
TARGET_THUMB_MODE - Compile all following functions in Thumb mode until
  TARGET_END_MODE is reached. No-op if not compiling for ARM.
TARGET_END_MODE - Undo the last TARGET_*_MODE option.
ARM_MODE - Compile this function in ARM mode.
THUMB_MODE - Compile this function in Thumb mode.
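As a rough illustration only, macros like these could be built on GCC's
function attributes and option pragmas. The definitions below are an
assumption, not the contents of target.hpp, and the example functions
(slow_add, fast_add, arm_square, thumb_negate, arm_cube) are
hypothetical:

```cpp
#include <cassert>

// Hypothetical definitions; the real target.hpp may differ.
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#define NOINLINE
#define ALWAYS_INLINE inline
#endif

#if defined(__GNUC__) && !defined(__clang__) && defined(__arm__)
// Per-function instruction set selection.
#define ARM_MODE __attribute__((target("arm")))
#define THUMB_MODE __attribute__((target("thumb")))
// Region-based selection: save the current options, switch, restore.
#define TARGET_ARM_MODE \
    _Pragma("GCC push_options") _Pragma("GCC target(\"arm\")")
#define TARGET_THUMB_MODE \
    _Pragma("GCC push_options") _Pragma("GCC target(\"thumb\")")
#define TARGET_END_MODE _Pragma("GCC pop_options")
#else
// Not compiling for ARM with GCC: every mode macro is a no-op.
#define ARM_MODE
#define THUMB_MODE
#define TARGET_ARM_MODE
#define TARGET_THUMB_MODE
#define TARGET_END_MODE
#endif

NOINLINE int slow_add(int a, int b) { return a + b; }
ALWAYS_INLINE int fast_add(int a, int b) { return a + b; }

ARM_MODE int arm_square(int x) { return x * x; }
THUMB_MODE int thumb_negate(int x) { return -x; }

TARGET_ARM_MODE
int arm_cube(int x) { return x * x * x; }
TARGET_END_MODE
```

On non-ARM hosts the mode macros expand to nothing, so the same source
still builds for host-side tests.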
Previously, each FSM class could only have one instance because of the
use of globals. This new implementation only uses memory allocated on
the stack, so multiple instances can be created at once. Dynamic
allocation is still unused. Additionally, this approach uses a more
logical separation between the FSM, states, and events.
If the buffer is cleared when flushed, the class does not function
correctly as a string builder. For example, if a string is built with a
newline inside, everything before the newline will be cleared and the
string will be incomplete. Clearing the buffer on flush only makes sense
for applications such as logging or writing to a file.
This commit refactors string_stream and string_streamx into a common
basic_string_stream template class. When the EXT template parameter is
true, the string_streamx formatting options are enabled; they are
disabled by default.
These options are enabled at compile time, and do not affect performance
when they are disabled. By implementing the two streams in this manner,
duplicated code is removed.
This commit also adds the ENDL template parameter. When ENDL is set to
zero, no end-of-line character is printed when piping mtl::endl.
Otherwise, the ENDL character is printed. ENDL defaults to '\n'. This
allows logging on the GBA to handle mtl::endl correctly and not print
two newlines on the mGBA emulator.
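A minimal sketch of how the EXT and ENDL parameters could select
behaviour at compile time is shown below. Everything in the sketch
namespace is hypothetical (the capacity parameter N, the width() and
view() members, the endl_t tag); only the names basic_string_stream,
string_stream, string_streamx, EXT and ENDL come from this commit, and
the real class may be structured differently:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace sketch {

struct endl_t {};
inline constexpr endl_t endl{};

// N: buffer capacity (assumed), EXT: enable the formatting options,
// ENDL: character written for endl, where 0 means "write nothing".
template <std::size_t N, bool EXT = false, char ENDL = '\n'>
class basic_string_stream {
public:
    basic_string_stream& operator<<(std::string_view s) {
        if constexpr (EXT) {
            // Right-justify: pad on the left up to the requested width.
            // This branch is discarded entirely when EXT is false.
            for (std::size_t i = s.size(); i < width_; ++i) put(fill_);
        }
        for (char c : s) put(c);
        return *this;
    }

    basic_string_stream& operator<<(endl_t) {
        if constexpr (ENDL != '\0') put(ENDL);
        return *this;
    }

    // Only usable on the extended stream; calling it otherwise fails
    // to compile.
    void width(std::size_t w, char fill = ' ') {
        static_assert(EXT, "formatting options require EXT == true");
        width_ = w;
        fill_ = fill;
    }

    std::string_view view() const { return {buf_, len_}; }

private:
    // Silently drops characters once the buffer is full.
    void put(char c) { if (len_ < N) buf_[len_++] = c; }

    char buf_[N] = {};
    std::size_t len_ = 0;
    std::size_t width_ = 0;
    char fill_ = ' ';
};

template <std::size_t N>
using string_stream = basic_string_stream<N, false>;
template <std::size_t N>
using string_streamx = basic_string_stream<N, true>;

}  // namespace sketch
```

Because the padding branch sits behind if constexpr, the plain stream
contains no formatting code at all.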
Previously, a source was not added to the target if a source with the
same file name already existed. This was fragile because these files
would be considered the same: src/foo.cpp and
src/armv4t/bar/baz/foo.cpp, even though they should not be. Instead,
the symbols of the architecture-specific code should never be
overridden by the common implementation, regardless of where the file
is placed. This means that if the files src/foo.cpp and
src/armv4t/bar.cpp both contain implementations of the function foo,
the armv4t implementation will be exported even though it uses a
different filename from the common implementation. This commit
implements this behaviour by relying on the way symbols are naturally
resolved at link time. A separate, smaller library is built for each
set of architecture-dependent code. Afterwards, the libraries are
linked into one, with the architecture-specific libraries linked first.
Both assembly macros failed when given large numbers ending in 9. For
example, udiv100000 of 3999999999 produced 40000 instead of 39999.
Similarly, udiv1000000000 of 3999999999 produced 4 instead of 3.
Both of the previous implementations fail when checked against the
Granlund-Montgomery integer division algorithm. This commit replaces
these macros with the correct implementation generated by clang for a
constant integer division. I do not understand how this implementation
works. All other macros do pass the Granlund-Montgomery check.
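For reference, the general multiply-and-shift technique can be sketched
in portable C++. This is not the ARM assembly from the commit; the
function names and the constants below are an independent illustration,
derived as ceil(2^39 / 3125) and ceil(2^39 / 1953125) after shifting
out the power-of-two factor of each divisor:

```cpp
#include <cassert>
#include <cstdint>

// 100000 == 32 * 3125, so x / 100000 == (x >> 5) / 3125.
// The division by 3125 becomes a multiply by ceil(2^39 / 3125)
// followed by a right shift of 39.
constexpr std::uint32_t udiv100000_sketch(std::uint32_t x) {
    return static_cast<std::uint32_t>(
        ((x >> 5) * std::uint64_t{175921861}) >> 39);
}

// 1000000000 == 512 * 1953125, so x / 1000000000 == (x >> 9) / 1953125.
constexpr std::uint32_t udiv1000000000_sketch(std::uint32_t x) {
    return static_cast<std::uint32_t>(
        ((x >> 9) * std::uint64_t{281475}) >> 39);
}

// The inputs from this commit message that the old macros got wrong.
static_assert(udiv100000_sketch(3999999999u) == 39999, "udiv100000");
static_assert(udiv1000000000_sketch(3999999999u) == 3, "udiv1000000000");
```

The result is exact as long as the rounding error of the constant,
multiplied by the largest shifted input, stays below 2^39. That holds
for both constants above over the full 32-bit range.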
string_streamx is a string_stream with additional formatting options.
Currently the only extra option is the ability to expand strings to a
length using a fill character, along with left/right justification. More
options similar to std::stringstream may be added in the future. These
extra options do come at a performance cost, and string_stream should be
preferred unless the extra options are absolutely needed.
This implementation also writes the digits from left to right instead of
right to left. Using this method we can write the string to the
beginning of the buffer and still avoid reversing the string. It also
has the benefit of being slightly faster than the previous
implementation. The function's signature changed as well because there
is no longer a reason to pass the buffer size or a pointer to output the
start of the string.
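A minimal sketch of writing digits left to right, assuming a 32-bit
unsigned input, is given below. The function name write_uint_ltr and
its signature are hypothetical, and plain division stands in for
whatever constant-division helpers the real implementation uses:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Writes the decimal digits of value to out, most significant digit
// first, and returns the number of characters written. out must have
// room for 10 characters. No terminator is appended.
inline std::size_t write_uint_ltr(std::uint32_t value, char* out) {
    static constexpr std::uint32_t powers[] = {
        1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
        10000u,      1000u,      100u,      10u};
    std::size_t len = 0;
    for (std::uint32_t p : powers) {
        std::uint32_t digit = value / p;
        value -= digit * p;
        // Skip leading zeros, but keep zeros once a digit was written.
        if (digit != 0 || len != 0) {
            out[len++] = static_cast<char>('0' + digit);
        }
    }
    // The ones digit is always written, so 0 is rendered as "0".
    out[len++] = static_cast<char>('0' + value);
    return len;
}
```

Since the most significant digit lands at out[0], the caller needs
neither the buffer size nor a pointer to the start of the string.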
Now uses a specialized implementation instead of the append multiple
characters implementation. Useful for appending single characters to
string streams (e.g. a newline).
Can't use std::exception because it dynamically allocates memory. This
implementation doesn't allocate memory, but also doesn't allow leaving
an exception message.
The CONFIGURE_DEPENDS option was added in CMake v3.12. This option allows the
build system to automatically re-run CMake if the glob changes, solving the
major issue with globbing. May have a performance impact, but it should be
negligible compared to the time spent building.
The istring and string_view operators have identical implementations. By
changing the istring operators to cast to string_view and use that
implementation instead, the number of redundant implementations is
reduced. This does incur a small performance penalty, around 15 cycles
when tested on the mGBA Game Boy Advance emulator (which emulates an
ARM7TDMI). Compared to the time the operations themselves take, the
performance difference is negligible. For example, an insertion with two
8-character strings takes around 450 cycles.
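The delegation pattern can be sketched as follows. The istring class
here is a hypothetical stand-in (assumed to be a fixed-capacity string
with a conversion to std::string_view), not the library's actual type:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace sketch {

// Hypothetical fixed-capacity string holding at most N characters.
template <std::size_t N>
class istring {
public:
    explicit istring(std::string_view s) {
        // Copy as much of s as fits; extra characters are dropped.
        for (; len_ < N && len_ < s.size(); ++len_) buf_[len_] = s[len_];
    }

    operator std::string_view() const { return {buf_, len_}; }

private:
    char buf_[N] = {};
    std::size_t len_ = 0;
};

// The operators hold no logic of their own: they cast both operands
// to string_view and reuse that implementation.
template <std::size_t N, std::size_t M>
bool operator==(const istring<N>& a, const istring<M>& b) {
    return std::string_view(a) == std::string_view(b);
}

template <std::size_t N, std::size_t M>
bool operator<(const istring<N>& a, const istring<M>& b) {
    return std::string_view(a) < std::string_view(b);
}

}  // namespace sketch
```

The cost of this design is the extra conversion per call, which is the
small penalty described above.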