This breaks the shader on my Android device. When `atIndex` is out of
bounds, `atFrames[atIndex]` evaluates to 0 and so
`aniIndex / atFrames[atIndex]` results in division by 0. All of the
tiles where this division by 0 occurs are blank on my Android device.
That probably means the shader pipeline halts when division by 0 occurs.
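
One way to avoid the division is to bounds-check the index before the
lookup. A minimal sketch of that guard, written in C rather than GLSL
so it can be compiled on its own; the function name, the `at_count`
parameter and the fallback value of 0 are assumptions for illustration,
not the actual fix:

```c
/* Returns aniIndex / atFrames[atIndex], or 0 when the lookup would be
 * out of bounds or the divisor would be zero.
 * `at_count` (number of entries in atFrames) is a hypothetical
 * parameter; the fallback of 0 is an assumption. */
static int safe_frame_quotient(const int *atFrames, int at_count,
                               int atIndex, int aniIndex)
{
    if (atIndex < 0 || atIndex >= at_count)
        return 0;               /* out-of-bounds index: skip the lookup */

    int frames = atFrames[atIndex];
    if (frames == 0)
        return 0;               /* never divide by zero */

    return aniIndex / frames;
}
```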
Since portablegl.h defines OpenGL types and Khronos's OpenGL headers
also define OpenGL types, and we use both of these headers, this define
is necessary to make sure the Khronos headers don't try to define the
OpenGL types again.
PortableGL supports shaders, but requires them to be compiled
ahead-of-time to C or C++.
I'll write a compiler later to translate from GLSL to this format
automatically at build time.
This removes the need to have xxd installed and provides a portable way
to specify the name of the output array. (xxd has an `-n` option for
this, but it isn't present in older versions of xxd.) Choosing the name
explicitly helps reduce the possibility of symbol conflicts in libretro
builds. It also prevents portability issues: the name of xxd's output
array depends on the relative path to the input file, which can break
if Meson changes the structure of the build directory or if the user
sets the build directory to a different location.
Also, this custom executable declares the array as const so that it goes
into the read-only data section of the binary instead of the data
section.
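
The core of such a tool can be quite small. A rough sketch in C,
assuming a hypothetical helper `emit_c_array` that takes the array name
as an explicit argument and a `_len` suffix for the length constant;
the actual executable may be structured differently:

```c
#include <stdio.h>
#include <stddef.h>

/* Writes `data` as a const C array called `name`, followed by a length
 * constant called `<name>_len`. Declaring the array const lets the
 * toolchain place it in read-only data.
 * Assumes len > 0, since an empty initializer is not valid C before C23.
 * Returns 0 on success, -1 if the stream reported a write error. */
static int emit_c_array(FILE *out, const char *name,
                        const unsigned char *data, size_t len)
{
    fprintf(out, "const unsigned char %s[] = {", name);
    for (size_t i = 0; i < len; i++) {
        /* Start a new indented line every 12 bytes. */
        fprintf(out, "%s0x%02x,", (i % 12 == 0) ? "\n    " : " ", data[i]);
    }
    fprintf(out, "\n};\nconst unsigned int %s_len = %zu;\n", name, len);
    return ferror(out) ? -1 : 0;
}
```

Because the name is passed in by the build system, it no longer depends
on where the input file happens to live.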
Restore conditional that makes mediump declaration only apply to fragment shaders
This was erroneously removed to make the Lanczos3 shader link properly,
but it resulted in vertex shaders incorrectly defaulting to mediump.
Add highp precision declaration to Lanczos3 fragment shader so that it can link
This sets a version for macOS to use instead of 2.1. In addition,
most if not all Essentials games require a maximum texture size
of at least 16384x16384, which OpenGL 4.1 requires as a minimum.
The previous YIQ-based algorithm turned out to be both slow
and horribly inaccurate.
Another algorithm based on rotating the color value in the
RGB cube along the diagonal axis was also considered, which was
acceptable in terms of accuracy, and very fast.
In the end, I decided on an HSV-based one, because it is by far
the most accurate one, while still being a tad faster than the
YIQ solution.
Algorithm source: gamedev.stackexchange.com/a/59808/24839
A very simple GPU time benchmark when shifting a 2048^2 bitmap:

          YIQ rot    RGB rot    HSV shift
radeon    13.4 ms    2.8 ms     11.4 ms
intel     13.0 ms    6.0 ms     10.5 ms

radeon: HD 3650 mobility
intel: N3540 integrated (Baytrail)
However, hue shifting has never shown up as a bottleneck before,
so these numbers are mostly academic.
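
For reference, the HSV approach boils down to converting to HSV,
adding to the hue, and converting back. A CPU-side sketch in C using
the textbook conversion; this is not the shader code from the linked
answer, and the type and function names are made up for illustration:

```c
typedef struct { float r, g, b; } Rgb;   /* components in [0, 1] */

static float max3(float a, float b, float c)
{
    float m = a > b ? a : b;
    return m > c ? m : c;
}

static float min3(float a, float b, float c)
{
    float m = a < b ? a : b;
    return m < c ? m : c;
}

/* Shifts the hue of `c` by `shift`, given as a fraction of a full turn
 * (1.0 == 360 degrees). Assumes `shift` is of modest magnitude. */
static Rgb hue_shift_hsv(Rgb c, float shift)
{
    /* RGB -> HSV */
    float v = max3(c.r, c.g, c.b);
    float delta = v - min3(c.r, c.g, c.b);
    float s = (v > 0.0f) ? delta / v : 0.0f;
    float h = 0.0f;
    if (delta > 0.0f) {
        if (v == c.r)      h = (c.g - c.b) / delta;
        else if (v == c.g) h = 2.0f + (c.b - c.r) / delta;
        else               h = 4.0f + (c.r - c.g) / delta;
        h /= 6.0f;
    }

    /* Shift the hue and wrap it back into [0, 1). */
    h += shift;
    h -= (float)(int)h;
    if (h < 0.0f)  h += 1.0f;
    if (h >= 1.0f) h = 0.0f;

    /* HSV -> RGB */
    float h6 = h * 6.0f;
    int sector = (int)h6;                 /* 0..5 */
    float f = h6 - (float)sector;
    float p = v * (1.0f - s);
    float q = v * (1.0f - s * f);
    float t = v * (1.0f - s * (1.0f - f));
    switch (sector) {
    case 0:  return (Rgb){v, t, p};
    case 1:  return (Rgb){q, v, p};
    case 2:  return (Rgb){p, v, t};
    case 3:  return (Rgb){p, q, v};
    case 4:  return (Rgb){t, p, v};
    default: return (Rgb){v, p, q};
    }
}
```

Shifting pure red by a third of a turn gives (approximately) pure
green, and a shift of 0 leaves the color unchanged.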
Using the kitchen sink plane shader for viewport effects, even
if only a small part of them is active, incurs a great performance
loss on mobile. So split the rendering into multiple optional
passes, which additionally use the blending hardware for faster
mixing (lerping).
Also, don't mirror the PingPong textures if the viewport effect
covers the entire screen area anyway.
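
The reason blending can stand in for a lerp in the shader is that the
standard alpha blend equation is the same arithmetic. A small per-channel
sketch in C; the specific blend function shown is an assumption, and the
passes in this change may configure blending differently:

```c
/* What mix(dst, src, a) computes per channel in a shader. */
static float lerp(float dst, float src, float a)
{
    return dst + (src - dst) * a;
}

/* What the blending hardware computes per channel with
 * glBlendEquation(GL_FUNC_ADD) and
 * glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA):
 * result = src * srcAlpha + dst * (1 - srcAlpha). */
static float blend_src_alpha(float dst, float src, float src_alpha)
{
    return src * src_alpha + dst * (1.0f - src_alpha);
}
```

Both functions produce the same value up to rounding, so a pass that
only needs to mix a color over the framebuffer can leave that work to
the blend unit.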
Don't globally set float precision to mediump. Only fragment
shaders need that, and defining it for vertex shaders causes
tilemap cracks.
Also manually define low precision for variables that hold
color / alpha values.
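
In GLSL ES, vertex shaders already default to highp floats, while
fragment shaders have no default float precision at all. A sketch in C
of picking the preamble per shader type on the host side; the enum and
function names are hypothetical, and the real code may do this with a
preprocessor conditional inside the shader source instead:

```c
typedef enum { SHADER_VERTEX, SHADER_FRAGMENT } ShaderKind;

/* Returns the text to prepend to a shader's source. Only fragment
 * shaders get the mediump default; vertex shaders keep their built-in
 * highp default so positions are not quantized. */
static const char *precision_preamble(ShaderKind kind)
{
    return (kind == SHADER_FRAGMENT) ? "precision mediump float;\n" : "";
}
```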
Previously, we would just stuff the entire tilemap vertex data
four times into the buffers, with only the autotile vertices
offset according to the animation frame. This meant we could
prepare the buffers once, and then just bind a different offset
for each animation frame without any shader changes, but it also
led to a huge amount of data being duplicated (and blowing up
the buffer sizes).
The new method only requires one buffer, and instead animates by
recognizing vertices belonging to autotiles in a custom vertex
shader, which offsets them on the fly according to the animation
index.
With giant tilemaps, this method would turn out to be a little
less efficient, but considering the Tilemap is planned to be
rewritten to only hold the range of tiles visible on the screen
in its buffers, the cost of the on-the-fly offsetting will become
negligible, while at the same time the amount of data we have to
send to the GPU every time the tilemap is updated is greatly
reduced; so a net win in the end.
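
The per-vertex work described above is roughly the following, shown as
a C sketch rather than shader code. The atlas layout (autotiles in the
top-left region, frames laid out side by side along x) and all names
are assumptions for illustration:

```c
typedef struct { float x, y; } Vec2;

/* Offsets a texture coordinate by the current animation frame if it
 * falls inside the autotile region of the atlas; other vertices pass
 * through unchanged.
 * at_area_w / at_area_h: size of the (hypothetical) autotile region.
 * frame_w: horizontal distance between consecutive animation frames. */
static Vec2 animate_tex_coord(Vec2 tc, float at_area_w, float at_area_h,
                              float frame_w, int ani_index)
{
    if (tc.x <= at_area_w && tc.y <= at_area_h)
        tc.x += frame_w * (float)ani_index;
    return tc;
}
```

With this, advancing the animation only means changing one index
instead of binding a different copy of the vertex data.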
This implementation is also heaps better than the old
one as it doesn't use a (differently sized) aux texture,
meaning the Bitmap discards its old texture and acquires
one of the same size, making reuse through the TexPool a
lot more likely. It also saves on the aux texture blits
and binding switches.
As the setup / resource acquisition far outweighs the
actual rendering cost, operation time is relatively
constant no matter how many divisions are used.