adderd
|
Okie dokie... Well, I got Visual Studio 2005 set up and compiled verge. Then I used CodeAnalyst to profile it. I've also compiled verge on Linux and used gprof to profile it there. In both cases I enabled normal optimizations (O2).
This is the top of the data from Windows:
IP Funct PID %
"0x4be8a0","memset","","5036","3.28","5036","5036","0",
"0x4bd570","memcpy","","3164","2.06","3164","3164","0",
"0x459aa0","dd32_Blit_lucent","","3152","2.05","3152","3152","0",
"0x45a340","dd32_TBlitTile","","2738","1.78","2738","2738","0",
"0x43a730","VCCore::ProcessOperand","","2663","1.73","2663","2663","0",
"0x45cf40","dd32_AlphaBlit","","2254","1.47","2254","2254","0",
"0x457a70","dd32_HLine","","1911","1.24","1911","1911","0",
"0x4b8420","CheckBytes","","1820","1.19","1820","1820","0",
"0x43aae0","VCCore::ResolveOperand","","1804","1.17","1804","1804","0",
"0x45a200","dd32_BlitTile","","1761","1.15","1761","1761","0",
"0x4bd8e0","strncpy","","1751","1.14","1751","1751","0",
"0x4b6ec0","_heap_alloc_dbg","","1546","1.01","1546","1546","0",
"0x41a1c0","Chunk::GrabC","","1505","0.98","1505","1505","0",
"0x408f80","rawmem::resize","","1426","0.93","1426","1426","0",
"0x4b7ca0","_free_dbg_nolock","","1136","0.74","1136","1136","0",
"0x4b61b0","operator delete","","1104","0.72","1104","1104","0",
"0x4386d0","VCCore::HandleAssign","","888","0.58","888","888","0",
"0x4093b0","string::string","","780","0.51","780","780","0",
"0x426be0","std::vector<int_t *,std::allocator<int_t *> >::operator[]","","770","0.50","770","770","0",
"0x4b7c20","_free_dbg","","756","0.49","756","756","0",
"0x41a200","Chunk::GrabD","","669","0.44","669","669","0",
The percentages above look low because CodeAnalyst profiles the whole system, not just the application. Multiply the values above by roughly 3 to get the program's own percentages.
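As a quick illustration of that rescaling, here is a minimal Python sketch. The factor of 3 is only the rough estimate given above, not a measured value, and the helper name `to_program_pct` is made up for this example. The input numbers are copied from the first few CodeAnalyst rows.

```python
# Whole-system percentages copied from the CodeAnalyst rows above.
system_pct = {
    "memset": 3.28,
    "memcpy": 2.06,
    "dd32_Blit_lucent": 2.05,
    "dd32_TBlitTile": 1.78,
    "VCCore::ProcessOperand": 1.73,
}

# "Around 3" is the rough factor from the post, an assumption rather than a measurement.
SCALE = 3.0


def to_program_pct(pct, scale=SCALE):
    """Convert a whole-system percentage to an approximate per-program one."""
    return round(pct * scale, 2)


program_pct = {name: to_program_pct(p) for name, p in system_pct.items()}

for name, p in program_pct.items():
    print(f"{name:<24} ~{p:.1f}%")
```

With that factor, memset comes out at a little under 10% of the program's time, which is closer in scale to the numbers in the Linux profile.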
Linux:
% time   cumulative seconds   self seconds   calls   self s/call   total s/call   name
14.16 54.00 54.00 204788829 0.00 0.00 Chunk::GrabC()
12.94 103.34 49.34 77127574 0.00 0.00 VCCore::ProcessOperand()
7.25 130.98 27.64 976422 0.00 0.00 dd32_BlitTile(int, int, char*, image*)
6.66 156.39 25.41 49439747 0.00 0.00 VCCore::ResolveOperand()
6.28 180.36 23.97 21918139 0.00 0.00 dd32_HLine(int, int, int, int, image*)
5.66 201.94 21.58 78136841 0.00 0.00 Chunk::GrabD()
4.90 220.62 18.68 27870550 0.00 0.00 rawmem::resize(int, char const*)
4.42 237.47 16.85 597936 0.00 0.00 dd32_TBlitTile(int, int, char*, image*)
3.94 252.48 15.01 6391 0.00 0.00 dd32_AlphaBlit(int, int, image*, image*, image*)
2.89 263.51 11.03 6242 0.00 0.00 dd32_Blit_lucent(int, int, image*, image*)
2.36 272.53 9.02 9601475 0.00 0.00 VCCore::HandleAssign()
2.04 280.30 7.77 40927397 0.00 0.00 rawmem::destroy()
1.55 286.21 5.91 14803057 0.00 0.00 rawmem::become_string(char const*)
1.47 291.81 5.60 12425758 0.00 0.00 VCCore::ReadInt(int, int, int)
1.42 297.23 5.42 23512247 0.00 0.00 image::GetClip(int&, int&, int&, int&)
1.32 302.25 5.02 18583379 0.00 0.00 rawmem::get(int, unsigned int) const
1.30 307.22 4.97 79846031 0.00 0.00 rawmem::length() const
1.24 311.94 4.72 13067224 0.00 0.00 rawmem::rawmem(int, char const*)
1.07 316.01 4.07 14803595 0.00 0.00 rawmem::touch(unsigned int)
1.04 319.97 3.96 219308 0.00 0.00 VCCore::ExecuteBlock()
It's interesting how different the two profiles are, though you can see a general pattern: several functions (the dd32 blitters, VCCore::ProcessOperand, VCCore::ResolveOperand) rank near the top in both. On Windows, two C runtime functions (memset and memcpy) take up a lot of time, while in the Linux version Chunk::GrabC takes up a terribly large share: 14.16% of the run time, spread over roughly 205 million calls.
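The s/call columns in the gprof output all read 0.00 because of rounding, so it can help to divide self seconds by calls by hand. A minimal Python sketch, using a few rows copied from the Linux table (the helper `ns_per_call` is a made-up name for this example):

```python
# (self seconds, calls) copied from the gprof flat profile above.
gprof_rows = {
    "Chunk::GrabC": (54.00, 204788829),
    "VCCore::ProcessOperand": (49.34, 77127574),
    "dd32_BlitTile": (27.64, 976422),
    "dd32_HLine": (23.97, 21918139),
    "dd32_AlphaBlit": (15.01, 6391),
}


def ns_per_call(name):
    """Average self time per call, in nanoseconds."""
    self_seconds, calls = gprof_rows[name]
    return self_seconds / calls * 1e9


# Print cheapest-per-call first.
for name in sorted(gprof_rows, key=ns_per_call):
    print(f"{name:<24} {ns_per_call(name):>14,.0f} ns/call")
```

By this measure Chunk::GrabC costs only about a quarter of a microsecond per call and is expensive mainly because it is called so often, while dd32_AlphaBlit is called rarely but takes a bit over 2 milliseconds each time. Keep in mind that gprof's call instrumentation adds overhead to every call, so the figures for very small functions may be inflated.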
Well, sorry for the long, dry post. I can provide the whole files to anyone interested in helping to optimize verge's speed.
Posted on 2006-05-23 22:30:35
|