If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.
![]() |
|
|
#151 | |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Quote:
Indeed about the champion, I wonder if it were the previous one then, which had "working ATSP", but a bit worse perturbation. First I understood we have to choose it in tandem with Bernd, but I just realised we've been submitting code "out of sync". |
|
|
|
|
|
|
#152 |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
Each chooses what submission he wants to position as best, and we use those numbers. Are you saying your output is incorrect for the last submission?
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
#153 | |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Quote:
(BRB, I need check something outside...) |
|
|
|
|
|
|
#154 |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
That should have no impact given what we're reading from it...also, for smaller problems the result are accurate and good (somewhat similar with Bernd's latest, actually). Larger ATSPs are evil is a possible explanation...very scientific too!
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
#155 | |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Quote:
<edit: In the pdf, the largest ATSP instance I see are still marked as "[450, 500] nodes". Last edited by Naurava kulkuri; 29-Oct-2012 at 22:05. |
|
|
|
|
|
|
#156 |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
I really need to go to sleep already (past midnight here, I'll be on-line in about 9 hours, I believe), but if a decision regarding my solver should be done, it should be the latest one (submission 6) if I'm to assume the ATSP isn't broken on instances smaller than, say rbg403.atsp and rbg443.atsp (judging that TSPLib instances were part of the set when Bernd debugged his solver) or that is, if the solver isn't broken for node sizes below 500.
|
|
|
|
|
|
#157 | |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
Quote:
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
|
#158 |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Hmm, rather strange. I took the code I submitted, compiled on my work machine and the output looks all right (I can send the log if needed).
In any event, can I just write here you take what you consider to be my best submission and run it the way you and/or Bernd like to run? I'd hate to keep you waiting between this time zoney thing. Especially since I'm not sure which one is my best submission. If I'm forced to tell something, I remember submission 5 didn't have ATSP problems and its TSP performance was acceptable. <edit: And to add, I meant before rbg403.atsp and rbg443.atsp were included in the set "still functioning" (to clarify). Last edited by Naurava kulkuri; 30-Oct-2012 at 13:44. |
|
|
|
|
|
#159 |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
To add here, Bernd, if you want to ask about the code or approach (when it comes available), go ahead. I answer to the best of my ability either here or elsewhere.
Funny you happened to mention utility functions, as I've been viewing stuff like http://www.ceet.niu.edu/faculty/ghrayeb/IENG576s04/papers/Local%20Search/local%20search%20for%20tsp.pdf http://www.cleveralgorithms.com/nature-inspired/stochastic/guided_local_search.html ants and alike to supplement my code. But as said, my approach is currently really rather brute-forcing, albeit quite quickly. |
|
|
|
|
|
#160 |
|
Junior Member
Join Date: Sep 2012
Posts: 54
|
Thanks, Veikko!
We should exchange notes after all this is done. I'll be travelling the next few days (as you did last week). Don't know when I will be able to check my emails, though. In the worst case you guys won't hear from me until Sun 11/4. |
|
|
|
|
|
#161 |
|
Junior Member
Join Date: Sep 2012
Posts: 54
|
I have a little bit time now, so I'll just start...
My solution uses five different algorithms depending on the size of the data set and ATSP versus TSP (that's where the tweaking came in): 1. Floyd-Warshall with path construction 2. Nearest Neighbor matrix scanning 3. 2-opt 4. GA (Generational Algorithm) 5. TSP to ATSP to TSP converter and diving into more details: 1. Floyd-Warshall with path construction I took that AMP C++ sample, which uses Floyd to detect Transitive Closures and redesigned it for detecting Hamilton Cycles (TSP!) with path reconstruction: Transitive Closure Sample in C++ AMP, http://blogs.msdn.com/b/nativeconcur...-in-c-amp.aspx The standard Floyd algorithm doesn't reconstruct paths. But as always, Wikipedia was my friend: http://en.wikipedia.org/wiki/Floyd%E...reconstruction 2. Nearest Neighbor matrix scanning Once you have a Floyd optimized matrix the nearest-neighbor algorithm is fairly trivial: start with a random row, find the minimum column, mark the column as visited, make that column the next row, repeat until all columns are marked. 3. 2-opt Now, I implemented two different kinds of 2-opt algorithms. The first one is from a standard text book: I converted some Pascal code that was published in M.M.Syslo, N.Deo, J.S.Kowalik, Discrete Optimization Algorithms with Pascal Programs, Prentice-Hall, Englewood Cliffs 1983. The second one is pretty remarkable (and is perhaps the reason why I often managed to be a little bit faster). Some guys at the Technical University of Crete (TUC) wrote a paper and explained a really clever way of running multiple 2-opt swaps at the same time! Hardware Implementation of 2-Opt Local Search Algorithm for the Traveling Salesman Problem http://dl.acm.org/citation.cfm?id=1263915 http://ieeexplore.ieee.org/xpls/abs_...=4228483&tag=1 I called that algorithm 2-opt-tuc. It is not as precise as 2-opt-standard, but super fast (about 10x fater than 2-opt-standard for large data sets). Making 2-opt-tuc work with AMP C++ was a bit tricky. But I figured it out. 4. GA (Generational Algorithm) I added a Generational Algorithm solver in the last days. If there is any time left until the timeout is reached I run a Generational Algorithm based on this paper (which is otherwise completely useless and misleading): A highly-parallel TSP solver for a GPU computing platform http://dl.acm.org/citation.cfm?id=1945726 http://www.springerlink.com/content/...4/fulltext.pdf The GA part is not essential but it's better than doing nothing. 5. TSP to ATSP to TSP converter I started working on a TSP to ATSP to TSP converter at around Thursday after it became clear to me that my original idea was just not working any longer after I optimized too many other parts related to solving TSP. At Wikipedia I found this interesting section: http://en.wikipedia.org/wiki/Travell..._symmetric_TSP I thought: "So if you can convert an ATSP to a TSP matrix you can run any TSP solver on it? That would work for me!" I implemented it and it did work! Well, the solutions were all very good. But the paths were also all scrambled after the matrix transformation. I was looking at a situation like with Floyd with no path reconstruction. As far as I know nobody had come up with a way to reconstruct paths from converted ATSP to TSP matrices. But that's what I tried to do in the remaining time mostly on Sat and Sun. My algorithm seems to work for small data sets but miserably fails for bigger ones. I had reached my limits. The rest was tweaking. I had cpu and gpu versions of Floyd, 2-opt-standard, and 2-opt-tuc in my tool chest, that I mixed up freely to keep up with your solution. Some of the cpu versions are faster than my gpu solutions. If I come out as the winner of this contest, it will be 2-opt-tuc that saved my butt. If you will, Greece has bailed me out! Cheers, - Bernd Last edited by Bernd Paradies; 30-Oct-2012 at 20:42. |
|
|
|
|
|
#162 |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
This will be long and I'm rather sleepy already, so please, do bear with me.
To make it short, a rather detailed description on what was the baseline for “my” solver, please see at Kamil Rocki’s page at http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/projects.html and a somewhat shorter introduction at http://gpuscience.com/articles/logo-gpu-accelerated-travelling-salesman-problem-tsp-solver/. I did of course ask for Kamil’s permission to make a port of his solver and ended up with something that wasn’t quite a straight port. My original contest submission notes and the disclaimer at \headers\opt2kernel.h (see when the code becomes available) have also appropriate texts and links, but perhaps a bit more explanation is in order before I explain what more there is in my code. When I learned about the competition on C++ AMP blog, I had just recently decided to teach myself GPU programming and use C++ AMP as a vehicle to do it. At the same time, it was also a good opportunity to check the new things coming in the new C++ standard – as my serious C++ programming dates to C++98 and earlier. In any event, besides writing an algorithm to solve a bitonic TSP at a university algorithms course sufficiently long time ago so that I really didn’t recall any specifics, I made a few baseline searches on the problem domain. The main constraints in my search were that the approaches I commit myself to needs to be something I understand (sufficiently so) and the approaches would be appropriate to the context of this contest, i.e. problem domain size and symmetricy. Some more constraints were that I should time to implement the code even as we were having a baby during the contest and there’s regular, daily work to do. I found out quite a few articles on ant colonies, genes and even particle swarms and so on (you catch the drift, I presume), but I had to discard these choices as most of the time I couldn’t understand what would make them tick and what would be the strategy to make them really fast (just using a GPU won’t cut it and particle swarms seemed just plain inefficient). The different opt approaches seemed to be a common theme and then iterated local search another, both of which I could understand and feel they are “tweakable”, and most of all, contained tight loops I could see how to turn into GPU code. Yeah! Something that must really eat cycles on the CPU and is an essential component! It was then when I found Kamil Rocki’s and Reiji Suda’s poster draft being showcased on a HPC conference promising insight on the very issue I was contemplating to take on. The trouble was, back then, it was very much a preliminary poster draft and it even cost money to download it! So, being a bit cautious as research papers are always too good to be true (I think we agree on this somewhat), I sent e-mail to Kamil and asked more about the draft before I’d pay for it. He replied and we exchanged a few thoughts and he asked if I liked to see the then code in progress. With a bit of hesitation, I wrote that sure, I just write it then to the submission notes and to the code and when I probably get disqualified, I’ve got what I was after the most anyway: experience and knowledge. Now, it turns out, this “brute-forcing” really gave a good run for your algorithm! Now to the differences in my solver to the one, Logo, Kamil wrote. You need to read Kamil’s material to understand how the core ideas work. Logo is written in CUDA having two kernels, one for small problems and one for potentially huge ones. I ported only the one for small kernels, which is at most for 4096 cities when using integer coordinates and cost matrix calculated on the fly on the GPU. I contemplated of transforming the distance matrix in symmetric case to coordinates and then using a straight port, but then for various reasons I decided against it (amongst them that I have an AMD Radeon HD 7850, so difficult to run the original code and weed out the lingering bugs in Logo code or mine and that I had to already use time to learn both C++ AMP and CUDA). Currently just I copy whole NxN matrices to GPU even in symmetric cases. I’m very much aware data transfer costs are considerable in this approach. But anyway. Then I also modified the “small kernel” so that it would accommodate different sizes of problems better (the Logo version is/was -- I used older code than is currently available -- a bit rigid in this regard) cost matrix and route data within GPU memory limits and changed the indexing scheme used to support this case. This meant forgoing significantly computing power in form of not using tile_static. There were other issues too, like C++ AMP and CUDA handle interlocked exchanges or other more subtle things, but the main gist on the GPU was as just described. Then on the CPU side of things I added support for the different problem sizes in \headers\kernel_context.h (which is a bit messy currently). I also implemented a simple FNV-1a hasher to preclude routes from re-computation, which would happen easily especially on smaller problem domains. You see in the code some traces of other ideas too, like using multiple GPUs or CPUs in parallel. One was to run ant colonies on the CPU and 2-opt on the GPU, but I run out of time. I also experimented shortly with the idea of using std: You may be aware that 2-opt won’t work on asymmetric cases straight away. For a moment I was a bit loss how to solve that one. As it was, I was brutally short of hours during the whole project (one has to prioritize and family is always first) and then it dawned to me that I could just do the usual 2-opt and check that also the “reverse” shortens the route. That way, on average, the change should work both ways. Depending on the “asymmetricy degree” (for the lack of better term, but I read domination numbers and stuff regarding this One of the difficult issues was finding a perturbation that did help out from local optima, but not too far out. As you may know already and I discovered from the research papers during this contest, good routes usually come near each other. Ants or genes (that is, find routes and crunch them straight with 2-opts) would have been one way to handle this, but time budget was there again. I did add some papers and notes on how to do a “perfect perturbation”, but that was the furthest I could get with my skills and knowledge. As to your idea on ATSP to TSP transformation, I thought about that too and for the reasons you discovered, decided against trying to implement it. I also thought about running Floyd-Warshall/Dijikstra from multiple nodes, but didn’t quite reach the conclusion how I could make it run fast with my already fast 2-opt and especially within my time budget (which was essentially thinking stuff during going work and coming back and frantically cranking code for an hour or two during an evening/midnight). That was it, I think. I urge you to check Kamil’s papers to get a better idea. The GPU science link gives you also some numbers, but here’s something Kamil told me: "For example, in pla7397 using MF heuristic I get an initial tour of 115.88% and after one 2-opt run (which takes 0.87s) it's 106.33% (see the paper)" (this made me suspect, by the way, that I miss some 2-opt exchanges, since my code obviously won't converge that fast, or then MF just gets into a superb starting route). At one point I also implemented a NN heuristic. I avoided looking at how it was done in Logo solver and ended up with something a bit different. K-d trees, Voronoi diagrams, don’t look bits and all the jazz were also on the table, but I didn’t quite have the time to study and implement them either. So, essentially, my solver is pretty much brute-forcing. Some of my code is better (maybe even good), some is crummier. I hope most of it is clear enough to follow through. As I wrote in my original submission notes, someone ought to lure Kamil from academia to earn big bucks and and do the same what he's doing now, good research. He is very smart (obviously) and a forthcoming and courteous man. Last edited by Naurava kulkuri; 30-Oct-2012 at 21:56. |
|
|
|
|
|
#163 |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
And to add on tweaking and stuff, I thought about doing the warm-ups and stuff before starting calculate and hope they would be excluded from timing, but then I didn't bother. As an added note, the timer I use is made with C++11, which on Windows shows longer times. The header has some MS Connect links in this regard.
|
|
|
|
|
|
#164 |
|
Junior Member
Join Date: Sep 2012
Posts: 54
|
Kamil Rocki? He is the guy I was mentioning, who "borrowed" in my opinion the indexing scheme from Noriyuki Fujimoto and Shigeyoshi Tsutsui's paper: Fast QAP Solving by ACO with 2-opt Local Search on a GPU - see figure #5.
Rocki's source code has a bug. The Greeks have the same, but the impact is less. The bug is that Rocki and the Greeks intuitively reverse the nodes between the candidates. Instead the reversal has to be done inside out. Here is a quick test that Rocki fails: in the testing function he calculates a delta. If the delta is < 0 then the tour will be reduced. Now, if you rearrange the nodes he does then the delta is not showing up as the gain. In other words: (oldTourLength + delta) != newTourLength; (delta is < 0, so newTourLength should be smaller than oldTourLength). The Pascal code I took does it right. I was first also doing it like Rocki and the Greeks in my gpu version of 2-opt. But then I realized the bug. The correct 2-opt algorithm should always yield oldTourLength + delta == newTourLength; By doing so you can save a lot of time, because you don't have to recalculate the new tour length. newTourLength = oldTourLength + delta; instead of: newTourLength = getTourLength(tour); At least that's what I found out. Gotta run to the airport! Will check in a few days. Thanks for sharing the details on your solution! Cheers, - Bernd |
|
|
|
|
|
#165 |
|
Junior Member
Join Date: Sep 2012
Posts: 54
|
(While my wife lets the dog out...)
Two more things: - Rocki's source code is GPL licensed, which made me nervous, because my code had to be MIT. - the net result of Rocki's way of doing 2-opt is that his chains are not as optimal as the ones that I calculated with newTourLength = oldTourLength + delta. My wife is getting the car - gotta run! Cheers, - Bernd |
|
|
|
|
|
#166 | ||
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Quote:
Maybe there's even a note left in my code, I can't remember now. For what's worth it, I e-mailed to Kamil on this regard and he replied that in the Logo code there is a function to calculate the swap effect without going through the whole route. I didn't make use of it. I had decided that in this second, extended round, I wouldn't even take a peek on the Logo solver in order not to take anything from there (even accidentally, to have a fairer fight and to have a feeling I'd accomplished something by myself Quote:
I came too late to this idea, but last week (or I think so), I discovered Google's JavaScript solver does something like that. See at https://groups.google.com/forum/?fro...er/pplU8TNr8sY and the actual implementation at http://code.google.com/p/google-maps...BpTspSolver.js. Last edited by Naurava kulkuri; 31-Oct-2012 at 07:04. |
||
|
|
|
|
|
#167 | |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Quote:
I'd like to win too, of course, but being this close and finishing on second place in a competition like this against a competitor like you wouldn't be a shame either. This wasn't an easy compeition, I believe, as the early choices were the most crucial ones and there were a lot of technicalities along the way. Though this was the first competition of this sort I've ever took part in. When you wrote about using "optimizers" and a bit later when it became clear our performance signatures are a close match and I wrote we can "target each other's solutions", I was rather sure this could end up either way as our performance part (for the lack of better term) is basically the same. Judging from what you wrote, your solution feels more elaborate. I went for a simpler route and just preclude already known paths and tuned perturbation. But that we both know already. Last edited by Naurava kulkuri; 31-Oct-2012 at 16:45. |
|
|
|
|
|
|
#168 |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
Apologies gentlemen, my life has been placed on a plane for the past day, so I was a bit detached from the world of the internetz. I see you had a very good exchange about your respective solutions.
Veikko: curious, is your work machine by chance Windows 7? Also, what driver revision are you using? The "breakage" has been verified on a Cayman powered rig as well (still Win8 with latest drivers though) - as has the correct behaviour for the prior submissions. So it's either a case of something quirky happening between the Win8 infrastructure and the drivers and the hardware, or it's a case of faster hardware breaking something iterative (f.e. you iterate past a critical point and instead of continuing to go in the direction of an optimum, you shoot off; if stopping conditions aren't robust enough, this can happen (has happened to me)). Note that there has been no change in the testing set whatsoever. On a more general scope, we are compiling the final grades et al. and will hopefully be done by Sunday. Since you gentlemen seemed to both agree on there being an arbitrarily set time limit that you must abide, we shall take a final round of measurements for the final grade, under imposed time constraints. Cheers!
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
#169 | |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Quote:
The stopping condition is in every case the number of cities in route and iterating through that. The only reason I can think of is that the route cost gets something weird and consequently the tour won't get altered -- and cost changed. If you want to get to bottom of this, you can e-mail me if you have further information, so that I can run furter checks. Also, there's something I can say you can change in the code and see if it has an impact. It's a bit strange, though, since ATSP uses the same routine as TSP (there are only a few places they differ) and likely the TSP case should be broken too -- that'd narrow the cases considerably. Did Bernd's code get broken by a similar, yet undetermined reason or because of last momement tweaks? |
|
|
|
|
|
|
#170 |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
I too may be somewhat disconnected from the Internet until next week or so.
|
|
|
|
|
|
#171 |
|
Junior Member
Join Date: Sep 2012
Posts: 54
|
Gentlemen, it looks like I have a stable Internet connection. I am now reading through the emails I missed...
Hey Veikko, I am kind of in your neighborhood: Hamburg, Germany. My best friend is turning 50 and I am visiting him and his family for a few days. Alex, where do you live? - Bernd |
|
|
|
|
|
#172 |
|
Junior Member
Join Date: Sep 2012
Posts: 54
|
@Veikko: I honestly don't know who is ahead at this point. I may have a more elaborate setup for TSP. But I think you came up with a much more robust solution for solving ATSP. Yes, I ran into the same thing: 2-opt can't be used for ATSP. Not sure whether I understand how you got 2-opt to work with ATSP. Sounds fascinating!
Same here: I won't feel bad if I make second place. It was a tough fight and I think we both gave our best. Let's see and wait until the verdict comes in. Cheers, - Bernd |
|
|
|
|
|
#173 | |
|
Junior Member
Join Date: Aug 2012
Posts: 67
|
Quote:
Otherwise it doesn't look like I'd have friends having their fifties, but some stag parties (bachelor parties) very soon. |
|
|
|
|
|
|
#174 |
|
Junior Member
Join Date: Sep 2012
Posts: 54
|
Ha, that's funny! My last name is pretty rare in German. I decided to extend my stay here in Hamburg for another week. My company has an office here.
My Internet connection is great, too. GMT+1 is not that bad either. Cheers, - Bernd |
|
|
|
|
|
#175 |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
Sorry to keep you gentlemen in suspense, still working on it! If you will stick around (please do!), you'll notice that nothing is precisely on time in the land of B3D
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|